The ability to rapidly profile and build large datasets, both for analysis and for testing (e.g., scale, performance, and functional testing), may be beneficial to delivering stable enterprise products quickly. It may be necessary to automate the generation of very large datasets, since real-world datasets of that size may not be available. To obtain realistic measurements from testing, the randomly generated data may be required to share many characteristics of real-world data, for example, realistic numbers of resources of each type, realistic linkages between those resources, realistic values for numeric fields, and realistic sizes and content for string fields.
To this end, data generators typically have a great deal of domain-specific knowledge built in. For instance, a data generator may know the different types of resources that need to be created and how to generate reasonable values for each of the fields that describe each resource type. Generally, this means that different data generators may be needed for different applications, and those generators may require updates whenever the “shape” of an application's data changes (e.g., new fields are added, new resource types are added, or the expected values for a field change).
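The kind of built-in domain knowledge described above can be illustrated with a minimal sketch. It assumes a hypothetical defect-tracking application with two hypothetical resource types, `Project` and `Defect`. The count ranges, severity weights, and vocabulary are illustrative assumptions and are not taken from any real application. The resource types, their fields, the linkage between them, and the value distributions are all hard-coded in the generator.

```python
import random
from dataclasses import dataclass

# Hypothetical resource types for an illustrative defect-tracking application.
@dataclass
class Project:
    uri: str
    title: str

@dataclass
class Defect:
    uri: str
    project: str   # linkage: URI of the owning Project
    severity: int  # 1 (critical) .. 4 (low)
    summary: str

# Domain knowledge hard-coded into the generator (all values are assumptions).
DEFECTS_PER_PROJECT = (50, 200)              # inclusive range of defects per project
SEVERITY_WEIGHTS = [0.05, 0.15, 0.50, 0.30]  # skewed toward medium/low severity
WORDS = ["login", "crash", "timeout", "button", "report", "query", "cache", "sync"]

def generate(num_projects: int, seed: int = 0) -> tuple[list[Project], list[Defect]]:
    """Generate linked Project and Defect records with plausible values."""
    rng = random.Random(seed)  # seeded so that datasets are reproducible
    projects: list[Project] = []
    defects: list[Defect] = []
    for p in range(num_projects):
        proj = Project(uri=f"urn:project:{p}", title=f"Project {p}")
        projects.append(proj)
        # Realistic number of child resources for each parent resource.
        for d in range(rng.randint(*DEFECTS_PER_PROJECT)):
            defects.append(Defect(
                uri=f"urn:defect:{p}-{d}",
                project=proj.uri,
                # Realistic numeric values drawn from a weighted distribution.
                severity=rng.choices([1, 2, 3, 4], weights=SEVERITY_WEIGHTS)[0],
                # Realistic string size and content: 3 to 8 domain words.
                summary=" ".join(rng.choices(WORDS, k=rng.randint(3, 8))),
            ))
    return projects, defects

if __name__ == "__main__":
    projects, defects = generate(num_projects=3, seed=42)
    print(len(projects), "projects,", len(defects), "defects")
```

Adding a field to `Defect`, introducing a new resource type, or changing the expected severity distribution would each require editing this code. That is the maintenance burden that arises when the shape of the application's data changes.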
Other generators may allow more flexibility but may require users to learn a new language for specifying data generation rules. With such generators, the shape of the data may need to be described by a set of data generation rules, and those rules may also require updates whenever the shape of the desired dataset changes. Writing the rules may further require the user to have in-depth knowledge of the underlying Resource Description Framework (RDF) representations created by the application the user wishes to test.