In the latter half of the twentieth century, there began a phenomenon known as the information revolution. While the information revolution is a historical development broader in scope than any one event or machine, no single device has come to represent the information revolution more than the digital electronic computer. The development of computer systems has surely been a revolution. Each year, computer systems grow faster, store more data, and provide more applications to their users.
Modern computer systems may be used to support a variety of applications, but one common use is the maintenance of large relational databases, from which information may be obtained. Large relational databases usually support some form of database query for obtaining information which is extracted from selected database fields and records. Such queries can consume significant system resources, particularly processor resources, and the speed at which queries are performed can have a substantial influence on the overall system throughput.
Conceptually, a relational database may be viewed as one or more tables of information, each table having a large number of entries or records, also called “tuples” (analogous to rows of a table), each entry having multiple respective data fields (analogous to columns of the table) with a defined meaning. The function of a database query is to find all rows for which the data in the columns of the row matches some set of parameters defined by the query. A query may be as simple as matching a single column field to a specified value, but is often far more complex, involving multiple field values and logical conditions. A query may also involve multiple tables (referred to as a “join” query), in which case the query finds all sets of N rows, one row from each of the N tables joined by the query, where the data from the columns of the N rows matches some set of query parameters.
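The two kinds of query described above can be illustrated with a minimal Python sketch. The table names, column names, and the helpers `select` and `join` are hypothetical; each row is modeled as a dict, and the join is expressed directly as "every combination of N rows, one from each table, that satisfies the query parameters."

```python
from itertools import product

# Two hypothetical tables; each row (tuple) is a dict keyed by column name.
employees = [
    {"emp_id": 1, "name": "Ada", "dept_id": 10},
    {"emp_id": 2, "name": "Brian", "dept_id": 20},
    {"emp_id": 3, "name": "Chen", "dept_id": 10},
]
departments = [
    {"dept_id": 10, "dept_name": "Research"},
    {"dept_id": 20, "dept_name": "Sales"},
]

def select(table, predicate):
    """Single-table query: every row whose columns satisfy the predicate."""
    return [row for row in table if predicate(row)]

def join(tables, predicate):
    """N-table join: every combination of N rows, one drawn from each
    table, whose combined column values satisfy the predicate."""
    return [combo for combo in product(*tables) if predicate(*combo)]

# Simple query: match a single column against a specified value.
research_staff = select(employees, lambda e: e["dept_id"] == 10)

# Join query: pair employees with their departments, restricted to Research.
pairs = join(
    [employees, departments],
    lambda e, d: e["dept_id"] == d["dept_id"] and d["dept_name"] == "Research",
)
names = [e["name"] for e, d in pairs]
```

This sketch enumerates the full cross product of the tables, which is the most naive possible search strategy; it defines the logical result of the query, not an efficient way to compute it.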
Execution of a query involves retrieving and examining records in the database according to some search strategy. For any given logical query, many different search strategies may be possible, all yielding the same logical result. But although all strategies yield the same logical result, not all search strategies are equal in terms of performance. Various factors may affect the choice of optimum search strategy and the time or resources required to execute the strategy. For example, query execution may be affected by the sequential order in which multiple conditions joined by a logical operator, such as AND or OR, are evaluated. The order of evaluation is significant because the first condition evaluated must be evaluated against every entry in a database table, whereas a later condition need only be evaluated against the subset of records whose outcome was not already decided by earlier conditions. Therefore, as a general rule, for conditions joined by AND it is desirable to evaluate the most selective conditions first. Another factor may be the order in which records within a particular table are examined. Records in a table may be examined sequentially, sometimes known as a table scan, or may be examined according to an index value. Typically, a table scan examines more records, but an index scan requires, on average, greater resources to examine each record. Query execution may be affected by any number of factors in addition to those described above.
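The effect of evaluation order on an AND of conditions can be measured by counting condition evaluations. The following is a minimal sketch; the `Condition` type, the selectivity figures (assumed to be estimates supplied by some optimizer), and the sample data are all hypothetical. Both orders return the same rows, but evaluating the more selective condition first performs far fewer evaluations.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict

@dataclass
class Condition:
    name: str
    test: Callable[[Row], bool]
    selectivity: float  # hypothetical estimate: fraction of rows that pass

def evaluate_and(rows, conditions):
    """Evaluate an AND of conditions in the given order. Each later
    condition only sees rows that survived every earlier condition.
    Returns (matching rows, total number of condition evaluations)."""
    evaluations = 0
    surviving = list(rows)
    for cond in conditions:
        evaluations += len(surviving)
        surviving = [r for r in surviving if cond.test(r)]
    return surviving, evaluations

def order_most_selective_first(conditions):
    # Lower selectivity means fewer rows pass, so evaluate it earlier.
    return sorted(conditions, key=lambda c: c.selectivity)

rows = [{"id": i, "region": "EU" if i % 2 else "US", "vip": i % 100 == 0}
        for i in range(1000)]
conds = [
    Condition("region = 'US'", lambda r: r["region"] == "US", 0.5),
    Condition("vip = TRUE", lambda r: r["vip"], 0.01),
]

naive_result, naive_cost = evaluate_and(rows, conds)
result, tuned_cost = evaluate_and(rows, order_most_selective_first(conds))
```

Here the written order costs 1000 + 500 = 1500 evaluations, while the selective-first order costs 1000 + 10 = 1010, for the same 10 matching rows.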
To support database queries, large databases typically include a query engine which executes the queries according to some automatically selected search (execution) strategy, also known as a “plan”, using the known characteristics of the database and other factors. Some large database applications further have query optimizers which construct search strategies, and save the query and its corresponding search strategy for reuse.
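Saving a query together with its search strategy for reuse amounts to a cache keyed by the query. The following minimal sketch assumes, hypothetically, that the query text itself is the key and that the optimizer is an arbitrary function from query text to a plan; `PlanCache` and `toy_optimizer` are illustrative names, not part of any particular product.

```python
from typing import Callable, Dict

class PlanCache:
    """Hypothetical plan cache: maps query text to a previously optimized
    execution strategy, so the optimizer runs once per distinct query."""

    def __init__(self, optimizer: Callable[[str], str]):
        self._optimizer = optimizer
        self._plans: Dict[str, str] = {}
        self.optimizations = 0

    def plan_for(self, query: str) -> str:
        plan = self._plans.get(query)
        if plan is None:
            plan = self._optimizer(query)  # expensive step, done once
            self._plans[query] = plan
            self.optimizations += 1
        return plan

def toy_optimizer(query: str) -> str:
    # Stand-in for real cost-based optimization.
    return "INDEX SCAN" if "WHERE" in query.upper() else "TABLE SCAN"

cache = PlanCache(toy_optimizer)
q = "SELECT * FROM orders WHERE id = ?"
for _ in range(5):
    cache.plan_for(q)
```

After five executions of the same query, the optimizer has run only once; a real system would also normalize query text and invalidate entries when the underlying schema or statistics change.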
An optimal strategy for executing a query will depend not only on the conditions of the query itself, but also on various characteristics of the database. For example, where multiple tables are joined in a single query, the relative sizes of those tables may affect the optimal query execution strategy, since it is often desirable to evaluate conditions related to smaller tables first. Query optimizers and query engines may use any of various metadata structures, such as histograms constructed by sampling data in one or more database tables, to estimate the characteristics of the database records and project the effects of alternative query execution strategies on query execution performance.
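As an illustration of how a histogram built from sampled data can estimate a record characteristic, the following minimal sketch builds an equi-width histogram over a random sample of one column and estimates the fraction of rows below a value. The equi-width bucketing, the within-bucket uniformity assumption, and all names and sizes are assumptions chosen for illustration; production optimizers often use other histogram types.

```python
import random

def build_histogram(sample, num_buckets):
    """Equi-width histogram over a sample: (low bound, bucket width, counts)."""
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / num_buckets or 1.0  # guard against a constant column
    counts = [0] * num_buckets
    for v in sample:
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return lo, width, counts

def estimate_fraction_below(hist, x):
    """Estimated fraction of rows with value < x, assuming values are
    spread uniformly within each bucket."""
    lo, width, counts = hist
    total = sum(counts)
    below = 0.0
    for i, c in enumerate(counts):
        start = lo + i * width
        end = start + width
        if x >= end:
            below += c                            # whole bucket lies below x
        elif x > start:
            below += c * (x - start) / width      # partial bucket
    return below / total

# Hypothetical column of 20,000 values; the optimizer samples only 1,000.
rng = random.Random(7)
table = [rng.randrange(1000) for _ in range(20_000)]
sample = rng.sample(table, 1_000)
hist = build_histogram(sample, num_buckets=10)

estimate = estimate_fraction_below(hist, 250)
actual = sum(1 for v in table if v < 250) / len(table)
```

The estimate lands close to the true fraction (about 0.25 here) without scanning the whole table, which is what lets an optimizer compare alternative strategies cheaply.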
When a query optimizer constructs a query execution strategy, it may perform sophisticated analysis of multiple alternative query execution strategies, attempting to find an optimal strategy for a particular query. The resources expended in performing this analysis may exceed, and in some cases may far exceed, the resources required to execute the query. Optimization is often justified because a query is expected to be reused multiple times, so that the overhead of constructing and optimizing a query execution strategy is distributed among multiple execution instances.
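The amortization argument can be stated as a short worked calculation. The symbols below are introduced only for illustration: $C_{\mathrm{opt}}$ is the one-time cost of optimizing, $C_{\mathrm{exec}}$ the cost of one execution of the optimized plan, $C_{\mathrm{naive}}$ the cost of one execution of an unoptimized plan, and $n$ the number of executions.

```latex
\begin{align*}
\bar{C}(n) &= C_{\mathrm{exec}} + \frac{C_{\mathrm{opt}}}{n}
  && \text{amortized cost per execution} \\
n\,C_{\mathrm{exec}} + C_{\mathrm{opt}} < n\,C_{\mathrm{naive}}
  &\iff n > \frac{C_{\mathrm{opt}}}{C_{\mathrm{naive}} - C_{\mathrm{exec}}}
  && \text{(assuming } C_{\mathrm{naive}} > C_{\mathrm{exec}}\text{)}
\end{align*}
```

With hypothetical figures $C_{\mathrm{opt}} = 2.0$ s, $C_{\mathrm{naive}} = 0.5$ s and $C_{\mathrm{exec}} = 0.1$ s, the threshold is $2.0 / 0.4 = 5$: at $n = 5$ both totals equal 2.5 s, and from the sixth execution onward the optimized plan is cheaper overall.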
Sometimes, a database table undergoes rapid and frequent changes in its character. For example, the number of records in the table may fluctuate dramatically, or the values of particular fields may undergo frequent, widespread changes. When this happens, it is difficult or impossible to predict the character of the database table at a particular time, and specifically, at a time when a query might be executed. If a query execution strategy is constructed and optimized based on certain assumptions about the character of the table using data gathered at one time, these assumptions may no longer hold at the time the strategy is executed, resulting in poor execution performance.
Because it is known that a query execution strategy is optimized according to certain assumed characteristics of the database, some database managers are configured to automatically re-optimize a query if a database undergoes significant changes. For example, a query can be re-optimized if it references a database table which changes in size by more than a pre-determined threshold. However, if a table is of a type which undergoes rapid and frequent changes, this capability to re-optimize queries can exacerbate the performance problems, since the optimizer may be frequently re-optimizing the query strategy to keep up with the changes to the table.
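The re-optimization trigger, and how it misbehaves on a volatile table, can be sketched as follows. The `ReoptimizingPlan` class, the 50% threshold, and the row-count sequences are hypothetical; the policy simply re-optimizes whenever the table's row count has moved by more than the threshold relative to the count the current plan assumed.

```python
class ReoptimizingPlan:
    """Hypothetical policy: re-optimize whenever the referenced table's row
    count differs by more than `threshold` (a fraction) from the row count
    the current plan was optimized for."""

    def __init__(self, row_count: int, threshold: float = 0.5):
        self.threshold = threshold
        self.basis_rows = row_count  # row count assumed by the current plan
        self.reoptimizations = 0

    def before_execute(self, current_rows: int) -> bool:
        base = max(self.basis_rows, 1)  # avoid dividing by zero for empty tables
        change = abs(current_rows - self.basis_rows) / base
        if change > self.threshold:
            self.basis_rows = current_rows
            self.reoptimizations += 1
            return True
        return False

# A stable table drifts slowly between executions.
stable = ReoptimizingPlan(row_count=10_000)
for rows in [10_050, 10_100, 9_980, 10_200, 10_150, 10_000]:
    stable.before_execute(rows)

# A volatile table (for example, a work queue) is filled and drained
# between executions.
volatile = ReoptimizingPlan(row_count=10_000)
for rows in [50, 12_000, 0, 9_000, 20, 15_000]:
    volatile.before_execute(rows)
```

The stable table never triggers re-optimization, while the volatile table triggers it before every one of its six executions, so the optimization overhead is never amortized.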
SQL (Structured Query Language) is a standard, widely used special-purpose language for managing data in a relational database system. Some SQL implementations permit a database designer or other user to specify, through use of a “VOLATILE” attribute, that a particular table in the database is expected to undergo rapid and frequent changes. Database management software can use the VOLATILE attribute, if specified, to alter the way it optimizes queries relating to the subject table. For example, it might use a generic optimization that makes few or no assumptions about the character of the subject table, it might disable re-optimization based on changes made to the subject table, and/or it might prefer an index access over other types of access such as a table scan or hash scan.
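The optimizer behaviors listed above can be sketched as a toy decision function. `TableInfo`, `choose_access_path`, `allow_reoptimization`, and the 1,000-row cutoff are hypothetical; the sketch only shows the shape of the logic: when a table is marked volatile, the row-count statistic is ignored, an index is preferred whenever one exists, and size-triggered re-optimization is suppressed.

```python
from dataclasses import dataclass

@dataclass
class TableInfo:
    name: str
    volatile: bool          # analogous to the VOLATILE attribute
    estimated_rows: int     # from statistics; unreliable if volatile
    has_usable_index: bool

def choose_access_path(t: TableInfo, small_table_rows: int = 1_000) -> str:
    """Toy access-path choice. For a volatile table, make no assumption about
    its size and prefer an index whenever one exists; otherwise trust the
    estimate and use a table scan for small tables."""
    if t.volatile:
        return "INDEX SCAN" if t.has_usable_index else "TABLE SCAN"
    if t.has_usable_index and t.estimated_rows > small_table_rows:
        return "INDEX SCAN"
    return "TABLE SCAN"

def allow_reoptimization(t: TableInfo) -> bool:
    # Size-change-triggered re-optimization is suppressed for volatile tables.
    return not t.volatile

# Statistics happened to be gathered while the queue was nearly empty.
queue = TableInfo("work_queue", volatile=True, estimated_rows=3, has_usable_index=True)
same_stats = TableInfo("lookup", volatile=False, estimated_rows=3, has_usable_index=True)
```

With identical statistics, the non-volatile table gets a table scan (reasonable for three rows), whereas the volatile table gets an index scan that remains acceptable if the queue later grows to many thousands of rows.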
The SQL VOLATILE attribute provides a limited capability to improve database efficiency by optimizing a query differently if the query involves a volatile table. However, a more general and widespread capability to improve database management in various ways by taking into account table volatility has not been appreciated or exploited. Furthermore, many users are unaware of the VOLATILE attribute or do not understand its use. Additionally, because the attribute has only a binary state (on or off), various database management efficiencies which might hypothetically be possible with more complete volatility state information are not available.
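The limitation of a binary attribute can be illustrated by comparing it with a graded measure. The following sketch uses, purely as a hypothetical example, the coefficient of variation of periodically sampled row counts as a volatility score, and a hypothetical cutoff of 0.5 to derive an on/off flag from it. The table names and histories are invented for illustration.

```python
from statistics import mean, pstdev

def volatility_score(row_count_samples):
    """Hypothetical graded volatility measure: coefficient of variation of
    periodically sampled row counts (0.0 means perfectly stable)."""
    m = mean(row_count_samples)
    if m == 0:
        return 0.0  # row counts are non-negative, so every sample was 0
    return pstdev(row_count_samples) / m

history = {
    "customers": [10_000, 10_020, 10_050, 10_080],
    "orders_staging": [1_000, 4_000, 2_000, 3_000],
    "work_queue": [0, 12_000, 50, 9_000],
}

scores = {name: volatility_score(h) for name, h in history.items()}
binary_flag = {name: s > 0.5 for name, s in scores.items()}  # on/off only
ranking = sorted(scores, key=scores.get)
```

The binary flag places `customers` and the fairly volatile `orders_staging` in the same "not volatile" state, while the graded scores (roughly 0.003, 0.45, and 1.0) preserve the distinction that a more refined optimization policy could use.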
Therefore, a need exists, not necessarily generally recognized, for improved techniques for managing relational databases which contain one or more volatile tables.