In the latter half of the twentieth century, there began a phenomenon known as the information revolution. While the information revolution is a historical development broader in scope than any one event or machine, no single device has come to represent the information revolution more than the digital electronic computer. The development of computer systems has surely been a revolution. Each year, computer systems grow faster, store more data, and provide more applications to their users.
A modern computer system typically comprises hardware in the form of one or more central processing units (CPU) for processing instructions, memory for storing instructions and other data, and other supporting hardware necessary to transfer information, communicate with the external world, and so forth. From the standpoint of the computer's hardware, most systems operate in fundamentally the same manner. Processors are capable of performing a limited set of very simple operations, such as arithmetic, logical comparisons, and movement of data from one location to another. But each operation is performed very quickly. Programs which direct a computer to perform massive numbers of these simple operations give the illusion that the computer is doing something sophisticated. What is perceived by the user as a new or improved capability of a computer system is made possible by performing essentially the same set of very simple operations, but doing it much faster. Therefore continuing improvements to computer systems require that these systems be made ever faster.
The overall speed at which a computer system performs day-to-day tasks (also called “throughput”) can be increased by making various improvements to the computer's hardware design, which in one way or another increase the average number of simple operations performed per unit of time. The overall speed of the system can also be increased by making algorithmic improvements to the system design, and particularly, to the design of software executing on the system. Unlike most hardware improvements, many algorithmic improvements to software increase the throughput not by increasing the average number of operations executed per unit time, but by reducing the total number of operations which must be executed to perform a given task.
Complex systems may be used to support a variety of applications, but one common use is the maintenance of large databases, from which information may be obtained. Large databases usually support some form of database query for obtaining information which is extracted from selected database fields and records. Such queries can consume significant system resources, particularly processor resources, and the speed at which queries are performed can have a substantial influence on the overall system throughput.
Conceptually, a database may be viewed as one or more tables of information, each table having a large number of entries (analogous to rows of a table), each entry having multiple respective data fields (analogous to columns of the table). The function of a database query is to find all rows, for which the data in the columns of the row matches some set of parameters defined by the query. A query may be as simple as matching a single column field to a specified value, but is often far more complex, involving multiple field values and logical conditions. A query may also involve multiple tables (referred to as a “join” query), in which the query finds all sets of N rows, one row from each respective one of N tables joined by the query, where the data from the columns of the N rows matches some set of query parameters.
Execution of a query involves retrieving and examining records in the database according to some search strategy. For any given logical query, not all search strategies are equal. Various factors may affect the choice of optimum search strategy. One of the factors affecting choice of optimum search strategy is the sequential order in which multiple conditions joined by a logical operator, such as AND or OR, are evaluated. The sequential order of evaluation is significant because the first evaluated condition is evaluated with respect to all the entries in a database table, but a later evaluated condition need only be evaluated with respect to some subset of records which were not eliminated from the determination earlier. Therefore, as a general rule, it is desirable to evaluate those conditions which are most selective (i.e., eliminate the largest number of records from further consideration) first, and to evaluate conditions which are less selective later.
Other factors can also affect the choice of optimum execution strategy. For example, certain auxiliary database structures may, if appropriately used, provide shortcuts for evaluating a query. One well known type of auxiliary database structure is an index. An index is conceptually a sorting of entries in a database table according to the value of one or more corresponding fields (columns). For example, if the database table contains entries about people, one of the fields may contain a birthdate, and a corresponding index contains a sorting of the records by birthdate. If a query requests the records of all persons born before a particular date, the sorted index is used to find the responsive entries, without the need to examine each and every entry to determine whether there is a match. A well-designed database typically contains a respective index for each field having an ordered value which is likely to be used in queries.
Another type of auxiliary database record used in some databases is a materialized query table. A materialized query table is conceptually a database subset data structure defined by a query, the subset containing all entries from a database table (or sets of entries from multiple database tables) which satisfy the defining query. Where a query includes the conditions of the query defining the materialized query table or logically requires a subset of the entries in the materialized query table, it may be advantageous to refer to the materialized query table in determining those entries which satisfy the query.
Some databases also employ partitioned tables, which can be used to advantage in evaluating certain queries. A partitioned table is simply a portion of a larger conceptual database table. For any of various reasons, the number of entries in the database table may be so large that it is deemed desirable to divide the table into multiple “partitioned” tables, each entry being allocated to a respective one of the partitioned tables. If the logical conditions of a query are such that it can be known that all entries satisfying the query will be contained in one of the partitioned tables, then it is not necessary to examine entries in the other partitioned tables, resulting in a considerable savings at query execution time.
To support database queries, large databases typically include a query engine which executes the queries according to some automatically selected search strategy, using the known characteristics of the database and other factors. Some large database applications further have query optimizers which construct search strategies, and save the query and its corresponding search strategy for reuse. These strategies may include, among other things, whether an auxiliary data structure such as an index or materialized query table will be used. Because it is anticipated that a search strategy constructed by a query optimizer may be re-used many times, the query optimizer may be justified in using very complex and sophisticated technique to determine an optimum query strategy. These techniques themselves may involve considerable overhead, much like compiling and optimizing an executable program involves overhead, but the overhead of optimizing a query can be outweighed by the improved efficiency of a query which is executed multiple times.
A re-usable query may include one or more imported variable values (“host variables”) in its logical conditions. I.e., where the query compares a field of each database record to some value, the query may be written so that the compared-to value is a variable, to be provided (“imported”) when the query is invoked for execution. In this manner, the same query can be re-used for multiple possible compared-to values, without the necessity of writing and maintaining separate queries for all possible compared-to values.
A query optimizer or similar function may generate and save a search strategy for a query based on certain assumptions about the use of auxiliary data structures or the number of entries eliminated from consideration by certain logical conditions. Since the range of entries eliminated by the indexed value will depend on the value of the variable to which it is compared, this range could vary considerably for different executions of the same query if the query contains imported variables. A query strategy which is optimized using one variable value will not necessarily be optimal using a different query strategy.
This problem can be particularly acute if a search strategy optimized for a particular variable value uses certain auxiliary data structures or partitioned tables. For example, if a search strategy uses a materialized query table or a partitioned table because the originally optimized query involves some subset of the table, a change in the variable value may cause the responsive subset to include values outside the materialized query table or partitioned table, which could result in significantly different execution performance or incorrect results.
It would be possible to simply re-optimize a query every time it is executed, but this involves considerable overhead. If intelligent determinations could be made to automatically re-evaluate the execution strategies of queries under certain conditions, particularly where the query involves imported variables which affects the range of records in a materialized query table, partitioned table, or similar subset of the database, the execution performance of such queries could be improved. A need therefore exists, not necessarily recognized, for an improved database query engine or optimizer which can automatically make intelligent choices in determining when to re-evaluate the execution strategies of certain queries.