1. Field of the Invention
Embodiments of the invention described herein pertain to the field of computer systems. More particularly, but not by way of limitation, one or more embodiments of the invention enable a data generator apparatus for testing data dependent applications, verifying schemas and sizing systems.
2. Description of the Related Art
Data dependent applications require data to operate on. The data cannot be completely random; rather, it must be valid as expected by the business logic of the data dependent application. The simplest method for testing a data dependent application such as a database application is to manually enter data into a database and test the application with that data. Manually entering data is a laborious process when attempting to provide enough data to fully test a complex enterprise level database application. For complex applications that involve hundreds of tables and fields and millions of records, it is not practical to manually populate a database with test data due to the sheer magnitude of labor required. In addition, the accuracy of manual data entry is a concern. For example, if 40 percent of the records in a table are to use a post office box, it is difficult to ensure that manual entry of the data will achieve this percentage. To avoid manual typing of data, data generators have been employed to seed a database with data.
Data generators are utilized to automate the process of filling data structures such as databases with data. Among other purposes, data generators are used for testing data dependent applications. Use of a data generator greatly increases the amount of data that may be generated in a given amount of time compared with manual entry. There are currently no known data generators that provide a full range of data generation for complex database schema verification, system hardware sizing and functional test of data dependent applications.
Regardless of the method used to populate a database, i.e., whether manual or automated, a data model design, or schema, cannot be considered valid without populating and testing the schema with valid data. Similarly, a data dependent application cannot be expected to work properly unless it is tested on data that is representative of the data it will process once deployed. There are no known data generators that randomize fields while keeping the characteristics of the fields within the profile of an operational database without directly copying the data in the operational database.
Likewise, system hardware procurement requires accurate system sizing to determine the amount of hardware required to handle a particular database application. Current system sizing estimates are generally rudimentary because they do not use data that is representative of the data that will eventually inhabit the database. Since only poor estimates of ultimate system performance are possible, hardware purchases are generally larger than needed in order to compensate for the uncertainty in those estimates. No known data generator addresses all of the verification, sizing and testing issues described herein.
There are no known data generators that take a holistic approach to a database as a whole and populate the database in an intelligent manner. Known data generators instead take an individual table approach to the data generation process. For example, there are no known data generators that can handle complex schemas and support average, maximum and/or fixed percentages of values or value types, including qualifiers and multi-valued fields, as observed or profiled in an operational database. Current data generators fail to mimic the values, sizes, percentages of values and complex data structures found in an operational database and hence cannot be used to validate a schema, calculate system hardware requirements or fully test data dependent applications.
DTM Data Generator is one data generator that allows for generating values for a database, but it does not allow for entry of cardinality, fill rate, or maximum, nominal and average values, or for complex data types such as qualifiers. In addition, the tool does not allow for profiling information to mimic the size and format of data in an operational database. The tool is silent on the use of qualifiers and instead uses an external database verbatim for filling fields. Furthermore, there is no disclosure of filling multi-valued fields that hold more than one value for a particular field.
EMS Data Generator and GSApps Data Generation Tool have similar limitations and do not address all facets of schema verification, data dependent application testing and system sizing. For example, neither tool allows for specification of maximum, nominal and average values for data fields or for the filling of complex data types such as qualifiers. These tools are further examples of tools designed to place raw data into a database without regard to the full range of schema validation, data dependent application testing or system hardware sizing.
For at least the limitations described above, there is a need for a data generator for database schema verification, system sizing and functional test of data dependent applications.