Generate Realistic Test Data

Test environments should match production as closely as possible. If you’re developing something that will interact with millions of records in production, your test environment should contain millions of records too. Doing so helps you spot parts of your application that do not scale well, such as loops that sprint through lists of one thousand items but crawl through lists of one hundred thousand.
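As a rough illustration of that scaling problem, here is a minimal sketch of a duplicate check written two ways. The function names are hypothetical and chosen only for this example. The nested-loop version makes about n*(n-1)/2 comparisons when there are no duplicates, so 100 times the records costs roughly 10,000 times the work.

```python
from typing import Hashable, Sequence


def has_duplicates_quadratic(items: Sequence[Hashable]) -> bool:
    """Compare every pair of items: fine for small lists, slow for large ones."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items: Sequence[Hashable]) -> bool:
    """Remember seen values in a set: roughly one lookup per item."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


def worst_case_pairs(n: int) -> int:
    """Comparisons the nested-loop version makes when no duplicates exist."""
    return n * (n - 1) // 2


print(worst_case_pairs(1_000))    # 499500
print(worst_case_pairs(100_000))  # 4999950000
```

Both functions give the same answer on a small fixture, which is why a problem like this tends to surface only when the test data is production-sized.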

If you’re starting from scratch without seed data, populating your test environment with dummy data can be a pain. Data sets that match your model’s schema do not always exist, and when they do, they may not be large enough. Thankfully, the Faker API exists.

The Faker API is a microservice built by marak and hosted on hook.io. It is capable of generating “massive amounts of data” and provides sixteen categories of API methods, ranging from addresses and persons to images and lorem ipsum. The API also supports multiple locales, including Spanish, Russian, and Simplified Chinese.

To demonstrate how the Faker API can be used, we’ve built easyfake. Simply fill out the form with the field names of your table and their corresponding “Faker types”, enter how many records you want, then click Generate to download your sample data in CSV format. We also output your faker configuration in JSON format using JSON Schema Faker (jsf). You can copy this configuration and use it in the future with jsf.
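To make the field-name-to-type idea concrete, here is a minimal sketch of the same workflow using only the Python standard library. It is not how easyfake or the Faker API is implemented. The type names in `GENERATORS` and the `generate_csv` function are assumptions made up for this illustration, and the generated values are far less realistic than what Faker produces.

```python
import csv
import io
import random
import string
import uuid
from typing import Callable, Dict

# Hypothetical "types" for illustration only; these are not Faker API methods.
GENERATORS: Dict[str, Callable[[random.Random], str]] = {
    "uuid": lambda rng: str(uuid.UUID(int=rng.getrandbits(128), version=4)),
    "word": lambda rng: "".join(rng.choice(string.ascii_lowercase) for _ in range(8)),
    "integer": lambda rng: str(rng.randint(0, 9999)),
}


def generate_csv(fields: Dict[str, str], count: int, seed: int = 0) -> str:
    """Return CSV text with a header row and `count` generated records.

    `fields` maps each column name to a key of GENERATORS.
    A fixed seed makes the output reproducible between runs.
    """
    rng = random.Random(seed)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(list(fields.keys()))
    for _ in range(count):
        writer.writerow([GENERATORS[kind](rng) for kind in fields.values()])
    return buffer.getvalue()


sample = generate_csv({"id": "uuid", "name": "word", "age": "integer"}, count=3)
print(sample)
```

Seeding the generator is a deliberate choice here: reproducible test data makes a failing test easier to rerun, at the cost of seeing the same values every time.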

After you give it a try, let us know how it goes! We’d love feedback.

By Ian Harris

1 comment so far

Ian Harris

Posted on 10:46 AM - October 1, 2018

We’ve noticed there are some limitations to the randomness of data produced by faker.js/jsf, at least as implemented in easyfake. If you’re generating upwards of a million records using easyfake, note that faker types like uuid may have the occasional duplicate.
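If a column must be unique, it may be worth checking the downloaded file before loading it. The following is a minimal sketch, assuming the CSV has a header row. The function names and the `id` column are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable


def find_duplicates(values: Iterable[str]) -> Dict[str, int]:
    """Return each value that appears more than once, with its count."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n > 1}


def duplicate_values_in_csv(csv_text: str, column: str) -> Dict[str, int]:
    """Check one named column of CSV text (with a header row) for repeats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return find_duplicates(row[column] for row in reader)


sample = "id,name\na1,ann\nb2,bob\na1,cat\n"
print(duplicate_values_in_csv(sample, "id"))  # {'a1': 2}
```

For very large files, the same check could read the file row by row instead of holding the text in memory, but the counter still needs to keep one entry per distinct value.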
