1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to a technique for factoring uncertainty into cost-based query optimization.
2. Description of Related Art
Computer systems incorporating Relational DataBase Management System (RDBMS) software using a Structured Query Language (SQL) interface are well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).
An important aspect of the RDBMS software is the optimization of SQL queries. Typically, the RDBMS software will include a cost-based optimizer function that chooses among a plurality of possible access paths in order to select an optimal query execution plan. Choosing a sub-optimal query execution plan can be detrimental to query performance.
Cost-based query optimizers must sometimes make assumptions about the data being queried. The assumptions typically are: (1) uniform data distribution, and (2) independence amongst predicate conditions.
However, these assumptions will not always be true. Often, the data will not be uniformly distributed and there will be correlation amongst the specified predicate conditions. The assumptions, therefore, lead to uncertainty that the cost estimate computed by the optimizer is accurate. Naturally, if the cost estimate is not accurate, the selected access path may perform poorly.
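By way of illustration only, the following minimal sketch shows how the two assumptions can distort a row-count estimate. The table contents, column meanings, and function names are hypothetical and are not taken from any particular RDBMS; the arithmetic is a simplified stand-in for what a cost-based optimizer might compute.

```python
# Hypothetical 1000-row table of (make, model) pairs. The two columns are
# strongly correlated and the model values are far from uniformly distributed.
rows = (
    [("Honda", "Civic")] * 400
    + [("Honda", "Accord")] * 100
    + [("Ford", "Focus")] * 500
)
total = len(rows)


def count_where(predicate):
    """Count the rows that actually satisfy a predicate."""
    return sum(1 for row in rows if predicate(row))


# Uniform-distribution assumption: every distinct model is assumed to
# appear equally often, so model = 'Accord' is estimated as total / NDV.
distinct_models = len({model for _, model in rows})
uniform_estimate = total // distinct_models
actual_accord = count_where(lambda r: r[1] == "Accord")

# Independence assumption: the selectivities of the two predicates are
# multiplied, as if make and model were unrelated.
count_make = count_where(lambda r: r[0] == "Honda")
count_model = count_where(lambda r: r[1] == "Civic")
independent_estimate = count_make * count_model // total
actual_both = count_where(lambda r: r[0] == "Honda" and r[1] == "Civic")

print("model = 'Accord': estimated", uniform_estimate, "actual", actual_accord)
print("make AND model:   estimated", independent_estimate, "actual", actual_both)
```

In this contrived data set, the uniformity assumption overestimates the first predicate and the independence assumption underestimates the conjunction, so a cost computed from either estimate may not reflect the work actually performed.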
Typically, cost-based query optimizers attempt to deal with this uncertainty by eliminating it whenever possible. This is usually accomplished through the collection of detailed statistics, and perhaps through the use of statistical views, statistics advisors, feedback mechanisms, or other means.
When the query involves predicates with host variables, i.e., data items that are declared in an SQL statement and whose values are determined at runtime, collecting additional statistics may do little to reduce the uncertainty. In these cases, traditional cost-based query optimizers may defer the optimization until execution time, when the host-variable values are known. Alternatively, they may use techniques that feed back information from prior executions of the query (i.e., the “learning optimizer” technique) in order to improve the accuracy of the cost estimates made by the query optimizer over many executions of the query.
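The effect of a host variable on plan selection can be sketched as follows. This is an illustrative simplification, not the behavior of any specific optimizer: the column contents, the function names, and the 10% selectivity threshold for preferring an index scan are all assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical skewed column referenced by a predicate such as
# "WHERE status = :hv", where :hv is a host variable.
status_column = ["CLOSED"] * 980 + ["OPEN"] * 15 + ["HOLD"] * 5
frequencies = Counter(status_column)
total = len(status_column)


def estimate_rows(value=None):
    """Estimate the rows matched by status = :hv.

    With no value (optimization before execution), fall back to the
    uniform guess total / number of distinct values. With a known value
    (optimization deferred to execution time), use its actual frequency.
    """
    if value is None:
        return total // len(frequencies)
    return frequencies.get(value, 0)


def choose_access_path(estimated_rows, threshold=0.10):
    """Pick an index scan only for selective predicates (assumed cutoff)."""
    return "index scan" if estimated_rows / total <= threshold else "table scan"


# Plan chosen once, before the host-variable value is known.
static_plan = choose_access_path(estimate_rows())

# Plans chosen per execution, once the value is known.
deferred_plans = {v: choose_access_path(estimate_rows(v)) for v in frequencies}

print("static:", static_plan)
print("deferred:", deferred_plans)
```

Under these assumptions the single plan chosen in advance suits only the most frequent value, whereas deferring the decision lets the rare values use the index, at the price of optimizing at execution time.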
While eliminating, or at least minimizing, the uncertainty involved in estimating the cost of performing a particular query is desirable, it cannot always be achieved. Moreover, the cost of identifying and collecting the statistics needed to significantly reduce uncertainty, or of learning the execution properties of the query well enough to optimize it more accurately, can be extremely high.
Thus, there is a need in the art for improved optimization techniques that ensure the selection of optimal (or near optimal) access paths for queries using cost-based optimization. Specifically, there is a need in the art for a technique that factors uncertainty into cost-based query optimization when selecting query execution plans.