The present invention relates to electronic data processing, and more particularly concerns testing the validity of execution plans produced by database search engines for retrieving data from databases in response to user queries.
Many database systems receive user queries written in a non-procedural language such as SQL (structured query language) or QBE (query by example). This class of language allows users to formulate queries against a database in terms characteristics of the desired data, rather than specifying the operations required to extract the desired data from the database. Such database systems contain a search engine that converts the non-procedural query into an execution plan or tree having a sequence of detailed operations that will obtain the requested data. Execution plans for a given query are seldom unique. That is, there is usually a numberxe2x80x94frequently a very large numberxe2x80x94of different execution plans, each having different operations and/or different orders of operations, that all generate the same set of result data. Not all plans, however, are equally desirable. Some have a lowerxe2x80x94sometimes very much lowerxe2x80x94execution cost than others. Cost is normally expressed in arbitrary units representing computer time and resources required to carry out all the operations of an execution plan. Search engines of this type almost always contain an optimizer that attempts to produce a plan having a low estimated cost for obtaining the data. Although search engines and optimizers involve a high degree of expertise to design, many of them are available from a number of different sources.
There are situations where it is desirable to obtain information about execution plans in addition to the one chosen by the optimizer for execution. In a product development setting, for example, the ability to generate and test a large number of candidate plans for the same query is useful in designing, tuning, and checking the large number of components in a search engine, especially in its optimizer subsystem. Some of the purposes for testing multiple plans are:
Validating candidate plans. Do all candidates correctly compute the result of the query from which they were derived?
Stressing the execution engine. Is the engine actually capable of carrying out the plans considered by the optimizer, even those that are only seldom chosen as optimal?
Determining cost accuracy. Is the estimation of the costs of all plans accurate, compared to the actual costs of their execution?
Checking optimality of a selected plan. Is the plan that the optimizer actually selected in fact the one with the best execution values?
Finding cost distributions. What is the range of costs over many enumerated and uniformly sampled alternative plans for the same query? Such knowledge can be employed in the design of new optimization algorithms and techniques.
Validating, manipulating, and obtaining information concerning the performance of multiple execution plans for the same query can be valuable in other environments as well. In this context, the term xe2x80x9cvalidationxe2x80x9d is taken as a generic term to include other operations and manipulations involving multiple alternative plans for a database query, and also to include determinations of absolute or relative information concerning such plans.
Validating query plans is extremely valuable during development and testing of a query processor, and it useful even in a regular operating environment. However, the total number of possible alternative plans that can be developed to satisfy commonly encountered queries quickly becomes gigantic. Even the large, fast machines found in development laboratories cannot abide validating such huge numbers of alternative plans from a single query. For these reasons, checking alternative execution plans in database systems has been limited to selecting a relatively small number of plans more or less by hand, and running them through a validation process in the same manner that a single plan would be validated. Much more seriously, even the small number of tested plans in previous systems tend to be distributed in a non-random manner. That is, the plans selected for testing are clumped around certain strategies, and do not test a sample that is widely distributed among all the possible strategies and variations. The sample space is not uniform.
Even where a smaller number of possible alternative plans allows all of them to be validated, conventional methods provide no technique for listing these alternatives in any organized manner. If the alternatives cannot be organized in some way, a test program has no way to ensure that each of them is selected (and selected only once) for testing, and thus no way to guarantee that the test is exhaustive.
Database-system technology thus requires a way to validate execution plans from a single query in a manner that can be sufficiently random or exhaustive.
The present invention permits the validation of large numbers of alternative execution plans for a database query with a process that organizes the components of such plans efficiently. The invention allows sampling a random subset of the alternative plans, rather than a subset confined to a relatively small part of the space of all possible plans. Where time is available for a test of every plan, the invention can provide an exhaustive list of all possible alternative plans for a given query.
The invention achieves these and other advantages by building groups of operators representing alternative plans for a query and that have unique identifiers or ranks. Execution trees for alternative plans can then be quickly assembled by unranking them to assemble different operators from the groups. The execution trees are then tested, analyzed, or otherwise manipulated. If desired, alternatives can be specified for producing only certain plans, for covering a particular range of plans, or for other purposes.