We are currently in the era of big data. Many industries, such as transportation and electric power, hold huge volumes of data, and more and more applications are being developed to work with such industrial data. During development, a large volume of testing data is needed to perform functional tests on these applications. In practice, however, users can typically provide only a small amount of real sample data, so developers often lack real data.
Existing approaches for generating testing data typically draw values at random according to requirements such as value range and data type. However, these approaches may consider only factors such as even data distribution and comprehensive coverage, and may not reflect the complicated correlations or patterns present in real physical data. For example, testing data may include a staff number field and an age field, where the staff number must be a unique integer and the age must be an integer between 20 and 60. When generating 1000 records, the staff number may be randomly generated in the range 1 to 1000 and the age randomly generated in the range 20 to 60. Such a data generation method, however, does not support generating data with complicated patterns or correlations. If new testing data is likewise generated at random, it may be unrealistic and therefore unsuitable for testing an application.
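The random approach described above can be illustrated with a minimal sketch. The function name `generate_random_records` and the field names `staff_no` and `age` are hypothetical and chosen only for illustration; the sketch assumes staff numbers are the integers 1 to n assigned in random order and ages are drawn uniformly from 20 to 60 inclusive.

```python
import random


def generate_random_records(n=1000, age_min=20, age_max=60, seed=None):
    """Hypothetical naive generator: each field is drawn independently
    from its allowed value range, with no regard to other fields."""
    rng = random.Random(seed)
    # Unique staff numbers: a random permutation of the integers 1..n.
    staff_numbers = rng.sample(range(1, n + 1), n)
    records = []
    for staff_no in staff_numbers:
        # Age is drawn uniformly from [age_min, age_max], independently
        # of the staff number and of every other record.
        age = rng.randint(age_min, age_max)
        records.append({"staff_no": staff_no, "age": age})
    return records


if __name__ == "__main__":
    records = generate_random_records(seed=42)
    print(len(records), records[:3])
```

Every record satisfies the stated constraints (unique staff number, age in range), but because each age is drawn without reference to any other field, any relationship present in real data between the staff number and the age is not reproduced.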