Modern software systems often rely on large databases. When an issue arises in a customer installation of such a system, the software developer attempts to replicate the issue in a test environment as a first step toward fixing it. To do so, the developer needs a dataset that is representative of the customer's production database. In many cases, the production database is too large to copy to the test environment. Further, many production databases are used in industries where strict legal requirements prevent the sharing of data.
Consequently, the software developer must generate a synthetic dataset that is representative of the production database in which the issue occurred. This dataset must be generated from a limited set of known database artifacts relating to the production database.