How to [insert topic here] with R: Using the OpenAI API

Intro The previous blog post came out over a year ago! I’m sorry I haven’t written here more often, but that’s about to change. While I haven’t been as active on this site, I’ve been more active on LinkedIn and GitHub. That has now sparked the need to write in a longer format again. Anyway, let’s get on with it! I promised in the last post to write about APIs next, didn’t I?

Anonymizing Data and Creating Fake Data - Localize It!

Intro The previous blog post was all about creating fake data and anonymizing existing data. In it I mentioned a fun alternative to charlatan for creating fake names of people for the Finnish market. Let’s get to it! First, let’s see where the data can be found. Then we’ll put together a simple random Finnish name generator. Finally, we can use it to anonymize the names in the data set we used last time.

Anonymizing Data and Creating Fake Data

Intro A colleague of mine asked whether I had a way to anonymize distribution data that we can get from Teosto’s web service. Since the data contains a lot of sensitive information, something needs to be done in order to protect the privacy of everyone involved, if we want to demonstrate it to a customer or a stakeholder. Also, not that long ago, I happened to see this blog post by Khuyen Tran (Data Science Simplified) about creating fake data in Python.

Mining Text from PDF Files, Part 3: PDF with an Image

Intro I wanted to find out how to mine text from PDF files with R. I’m experimenting with different formats, the previous ones having been text and tables. In this last one I will extract text from an image inside a PDF file. I’m assuming you’re using RStudio as your IDE (Integrated Development Environment). I’m sure most of this can be done using something else as well.

Mining Text from PDF Files, Part 2: PDF with Tables

Intro I wanted to find out how to mine text from PDF files with R. Last week I tried to extract text from a PDF file with just text in it. This week I will try extracting text from a PDF file with a table. Next week, I will try extracting it from a picture inside a PDF file. I’m assuming you’re using RStudio as your IDE (Integrated Development Environment). I’m sure most of this can be done using something else as well.

Mining Text from PDF Files, Part 1: PDF with Text

Intro I wanted to find out how to mine text from PDF files with R. I’m experimenting with different formats, which will each have their own blog post. This first one is about PDF files with just text in them. The second one will be about extracting text in tables and in the third one I will extract text that’s in a picture inside a PDF file. I’m assuming you’re using RStudio as your IDE (Integrated Development Environment).