pdftools

Mining Text from PDF Files, Part 3: PDF with an Image

Intro

I wanted to find out how to mine text from PDF files with R. I’m experimenting with different formats; the previous posts covered plain text and tables. In this last one I will extract text from an image embedded in a PDF file. I’m assuming you’re using RStudio as your IDE (Integrated Development Environment), though I’m sure most of this can be done in another environment as well.

Mining Text from PDF Files, Part 1: PDF with Text

Intro

I wanted to find out how to mine text from PDF files with R. I’m experimenting with different formats, each of which will get its own blog post. This first one is about PDF files that contain only text. The second will cover extracting text from tables, and in the third I will extract text from a picture inside a PDF file. I’m assuming you’re using RStudio as your IDE (Integrated Development Environment).