Anonymizing Data and Creating Fake Data - Localize It!

Intro

The previous blog post was all about creating fake data and anonymizing existing data. In it, I mentioned a fun alternative to charlatan for generating fake personal names for the Finnish market. Let’s get to it! First, we’ll see where the name data can be found. Then we’ll put together a simple random Finnish name generator. Finally, we’ll use it to anonymize the names in the data set we used last time.
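To give a rough idea of where this is heading, here is a minimal Python sketch of such a generator. It is only an illustration under stated assumptions: the short name lists are placeholders typed in by hand, not the real name data, and the function names `random_finnish_name` and `anonymize_names` are hypothetical, not part of charlatan or any other library.

```python
import random

# Hypothetical sample lists (assumption): a real generator would load
# the full name data instead of these few hand-picked examples.
FIRST_NAMES = ["Aino", "Eino", "Helmi", "Onni", "Venla", "Väinö"]
LAST_NAMES = ["Korhonen", "Virtanen", "Mäkinen", "Nieminen", "Mäkelä", "Hämäläinen"]


def random_finnish_name(rng=random):
    """Return a 'First Last' name drawn uniformly from the sample lists."""
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"


def anonymize_names(names, seed=None):
    """Replace each distinct real name with a fake one, consistently.

    The same input name always maps to the same fake name within one call,
    so repeated occurrences stay linked after anonymization.
    """
    rng = random.Random(seed)
    mapping = {}
    for name in names:
        if name not in mapping:
            mapping[name] = random_finnish_name(rng)
    return [mapping[name] for name in names]
```

Calling `anonymize_names(["A", "B", "A"], seed=1)` returns three fake names in which the first and third are identical. Note that with lists this small, two different real names can end up with the same fake name; a larger name pool makes that less likely.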

Intro

A colleague of mine asked whether I had a way to anonymize the distribution data that we can get from Teosto’s web service. Since the data contains a lot of sensitive information, something needs to be done to protect the privacy of everyone involved if we want to demonstrate it to a customer or a stakeholder. Also, not that long ago, I happened to see a blog post by Khuyen Tran (Data Science Simplified) about creating fake data in Python.