The present invention relates to electronic data processing, and more specifically concerns automated testing of database management systems.
Relational database management systems (DBMS), such as Microsoft SQL Server, interpret statements written in a database query language such as Structured Query Language (SQL) to create and manage database objects, to insert and update data, and to perform complex, multilevel queries against huge amounts of data. Testing these systems is recognized throughout the industry as a technical challenge of the first magnitude. SQL and similar database-system interpreters are highly complex. For example, they offer sophisticated optimization techniques and execution planning for queries input on the fly; opportunities for arcane design problems are ubiquitous. At the same time, the state space to be tested is gigantic. For a one-gigabyte database, the possible combinations of database configuration and SQL statement to be executed exceeds 10.sup.2,000,000,000.
Libraries of test scripts for relational database systems typically contain thousands or tens of thousands of sample statements which are applied to a test database for comparison of their results with known correct data. Existing libraries are known to be inadequate; most commercial database systems produce a constant and substantial stream of reported bugs. However, the amount of work required to generate larger libraries quickly becomes prohibitive. At the rate of a half hour per hand-written test statement, even a small library consumes more time than does the design of the system that it tests.
In the past, developers have employed some stochastic testing at the language level to accelerate database testing. For example, a test-case generator may choose a random mix of hand-generated fixed scripts. Choosing random parameter values in fixed scripts increases the effective number of test cases. These methods still require painstaking human composition and verification of long, multilevel queries. Automated generation of very simple queries considerably speeds up the generation of test cases, but eliminates the more complex test cases where subtle errors lurk.
In addition, conventional test systems are effectively limited to a fixed database, or to simple variations on fixed data. In order to construct statements that actually execute properly against the target database, the test system must be internally coded to produce only those statements that match the semantics of the database, the names of the database tables and their columns, the particular data types of each column, and so forth. However, testing on only one set of data obviously restricts the range of the tests that can be performed and thus the errors that will be uncovered. In addition, the use of fixed data for many tests does not permit slanting test runs toward certain kinds of applications, or focusing on the kinds of data or database structures that have been found to produce errors.
Also, the length and intricacy of test statements, although desirable for teasing out subtle errors, works against the isolation of bugs which cause those errors. Short, simple statements that produce errors are much more useful for tracking the errors down to particular parts of the DBMS under test.
The prior art is this field has not satisfied a longstanding need for very fast generation and manipulation of complex test statements for sophisticated database systems.