writexl

Anonymizing Data and Creating Fake Data

Intro

A colleague of mine asked whether I had a way to anonymize distribution data that we can get from Teosto’s web service. The data contains a lot of sensitive information, so if we want to demonstrate it to a customer or a stakeholder, we first need to protect the privacy of everyone involved. Also, not long ago, I happened to see a blog post by Khuyen Tran (Data Science Simplified) about creating fake data in Python.

Mining Text from PDF Files, Part 3: PDF with an Image

Intro

I wanted to find out how to mine text from PDF files with R. I’m experimenting with different formats; the previous posts covered plain text and tables. In this last one, I will extract text from an image inside a PDF file. I’m assuming you’re using RStudio as your IDE (Integrated Development Environment). I’m sure most of this can be done using something else as well.

Mining Text from PDF Files, Part 2: PDF with Tables

Intro

I wanted to find out how to mine text from PDF files with R. Last week I tried to extract text from a PDF file containing only text. This week I will try extracting text from a PDF file with a table. Next week, I will try extracting text from a picture inside a PDF file. I’m assuming you’re using RStudio as your IDE (Integrated Development Environment). I’m sure most of this can be done using something else as well.

Mining Text from PDF Files, Part 1: PDF with Text

Intro

I wanted to find out how to mine text from PDF files with R. I’m experimenting with different formats, and each will get its own blog post. This first one is about PDF files that contain only text. The second one will be about extracting text from tables, and in the third one I will extract text from a picture inside a PDF file. I’m assuming you’re using RStudio as your IDE (Integrated Development Environment).