Computer systems incorporating Data-Base Management System (DBMS) software using a Structured Query Language (SQL) interface are well known in the art. In a DBMS system, queries typically specify what data is to be accessed, rather than how that data is to be accessed. An SQL Query compiler, and specifically an optimizer function of the SQL Query compiler, automatically determines the appropriate way to access and process the data referenced in a single query or SQL statement respectively. In any case, the access path or Query Execution Plan (QEP) respectively has to be determined before an SQL statement can be executed. The determination of access paths is also known as statement preparation or query optimization.
The optimizer function of the SQL query compiler aims at optimal access paths with minimal execution time. There are approaches, well known in the art, that exploit different statistics to optimize an access path such as object statistics (e.g., table size, cardinality of table columns, value distribution of table columns, etc.) and system statistics (e.g., processor speed, disk I/O latency, buffer pool characteristics for buffer pools that support relevant tables, etc.).
SQL statements are prepared either statically or dynamically. To enhance performance, database systems usually cache dynamically-prepared SQL statements in the so called dynamic statement cache (DSC), which resides in main memory. Thus, these cache dynamically-prepared SQL statements can be executed repeatedly using the same access path.
However, access paths determined by the optimizer function of an SQL Query compiler are not necessarily optimal, because the optimizer function might base its decisions on information that is outdated. Also, a QEP that was optimal when generated may deteriorate due to the growth or shrinkage of accessed tables. Once an access path has been established, it is difficult to automatically assess an optimal point at which an old QEP is replaced by a new QEP. If new QEPs are generated too often, computing resources are wasted during the superfluous QEP generations. Likewise, if suboptimal QEPs are not replaced with better QEPs, computing resources are wasted during the executions of these suboptimal QEPs.
Weikum, et al., describe in “Self-tuning Database Technology and Information Service: from Wishful Thinking to Viable Engineering”, Proceedings of the 28th VLDB Conference, Hong Kong, 2002, an approach that exploits database performance statistics to adjust global DBMS parameters. These adjustments affect the behavior of the entire DBMS, potentially resulting in deterioration of well-performing SQL statements. Further, these adjustments do not enhance the statistical information of the optimizer function.
Another approach as described by Ng, et al., in “Dynamic Query Re-Optimization”, the 11th International Conference on Scientific and Statistical Database Management, Cleveland, Ohio, 1999, provides a general framework for triggering re-optimization. However, this approach neither enhances the statistical information nor considers the repeatable execution of the same SQL statement. Rather, this approach aims at the dynamic re-optimization of parts of a query but does not save and exploit this knowledge for the following query optimization runs.
U.S. 2002/0198867 A1 by Lohman, et. al., titled “Learning from Empirical Results in Query Optimization,” describes an optimizer function of a DBMS that generates alternative QEPs for executing a query. For each of the alternative QEPs an execution model is provided. One of the alternative QEPs is chosen for execution based on the model associated therewith. The optimizer function uses, for example, the base table cardinality, as stored in the statistics, in its cardinality estimation model to compute an estimate for the result cardinality of each table access operator after application of the predicate. Besides, the comparable actual cardinality is measured for each operator during execution.
The actual and estimated cardinalities are compared to obtain a feedback to the statistics that were used for obtaining the base table cardinalities as well as a feedback to the cardinality model that was used for computing the estimates. Thus, the optimizer function exploits an empirical measurement from the execution of the chosen QEP to validate the model associated therewith. Thereby, the optimizer function determines whether the model is in error and in that case one or more adjustments to the model are computed to correct the determined error.
The estimates of the optimizer function concerning table characteristics are validated. Therefore, actually measured table characteristics such as value distributions in table columns or the number of rows are compared with estimates for these characteristics which were used when choosing a QEP for execution. Although this technology has proven to be useful, it would be desirable to present additional improvements.
What is therefore needed is a system, a computer program product, and an associated method for an optimizer function that learns from previous executions of an SQL query to exploit and enhance system statistics. The need for such a solution has heretofore remained unsatisfied.