Generate Realistic Test Data

Test environments should match production environments as closely as possible. So if you’re developing something that will interact with millions of records in production, your test environment should contain millions of records. Doing so will help you spot portions of your application that do not scale well, like loops that sprint through lists of one thousand but crawl through lists of one hundred thousand.
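To make the scaling point concrete, here is a minimal Python sketch. The function names are hypothetical and not taken from any particular application. Both functions find the IDs two lists have in common. The first one does a linear scan of a list inside its loop, so making the input five times larger makes its work roughly twenty-five times larger. The second builds a set once, so its work grows roughly in proportion to the input. With small test data, the two are hard to tell apart.

```python
import time


def common_ids_slow(a, b):
    # `x in b` scans list b from the start on every iteration,
    # so total work grows with len(a) * len(b): quadratic.
    return [x for x in a if x in b]


def common_ids_fast(a, b):
    # Building a set once makes each membership test O(1) on average,
    # so total work grows roughly with len(a) + len(b).
    b_set = set(b)
    return [x for x in a if x in b_set]


def time_it(fn, n):
    # Disjoint ranges force the slow version to scan all of b for every x.
    a, b = list(range(n)), list(range(n, 2 * n))
    start = time.perf_counter()
    fn(a, b)
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (1_000, 5_000):
        print(f"n={n:>5}: slow {time_it(common_ids_slow, n):.4f}s, "
              f"fast {time_it(common_ids_fast, n):.4f}s")
```

Exact timings depend on the machine. The shape of the growth is what a production-sized test data set would reveal.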

If you’re starting from scratch without seed data, populating your test environment with dummy data can be a pain. Data sets that match your model’s schema do not always exist, and if they do, they may not be large enough. Thankfully, the Faker API can help.

The Faker API is a microservice built by marak, hosted on hook.io, and capable of generating “massive amounts of data”. It provides sixteen categories of API methods, ranging from addresses and persons to images and lorem ipsum. The API also supports multiple locales, including Spanish, Russian, and Simplified Chinese.

To demonstrate how the Faker API can be used, we’ve built easyfake. Fill out the form with the field names of your table and their corresponding “Faker types”, enter how many records you want, then click Generate to download your sample data in CSV format. We also output your Faker configuration as JSON for JSON Schema Faker (jsf), so you can copy it and reuse it with jsf later.
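easyfake’s internals aren’t described here, so the following is only a Python sketch of the same idea under several assumptions. Each column is a (field name, Faker type) pair. A tiny local `GENERATORS` table (hypothetical) stands in for the Faker API. The schema builder assumes json-schema-faker’s convention of a `faker` keyword on each property and is not easyfake’s actual output format. Type names such as `name.firstName` follow faker.js’s “category.method” naming.

```python
import csv
import io
import json
import random
import string

# Hypothetical stand-ins for a few Faker types. The real Faker API (or jsf
# with faker.js) would supply far richer, locale-aware generators.
GENERATORS = {
    "name.firstName": lambda rng: rng.choice(["Ada", "Grace", "Linus", "Margaret"]),
    "internet.email": lambda rng: "".join(rng.choices(string.ascii_lowercase, k=8)) + "@example.com",
    "address.city": lambda rng: rng.choice(["Lisbon", "Osaka", "Toronto", "Nairobi"]),
}


def generate_csv(fields, count, seed=0):
    """Return CSV text with a header row plus `count` fake records."""
    rng = random.Random(seed)  # seeded so test runs are reproducible
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([name for name, _ in fields])
    for _ in range(count):
        writer.writerow([GENERATORS[ftype](rng) for _, ftype in fields])
    return out.getvalue()


def build_jsf_schema(fields, count):
    """Build an (assumed) jsf-style schema: an array of `count` objects
    whose string properties each name a Faker method via `faker`."""
    return {
        "type": "array",
        "minItems": count,
        "maxItems": count,
        "items": {
            "type": "object",
            "properties": {name: {"type": "string", "faker": ftype} for name, ftype in fields},
            "required": [name for name, _ in fields],
        },
    }


if __name__ == "__main__":
    fields = [("first_name", "name.firstName"), ("email", "internet.email"), ("city", "address.city")]
    print(generate_csv(fields, 3))
    print(json.dumps(build_jsf_schema(fields, 3), indent=2))
```

Keeping the field list as the single source of truth means the CSV header and the JSON configuration cannot drift apart.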

After you give it a try, let us know how it goes! We’d love feedback.