Many large database systems use query-planning tools, known as optimizers, to plan effective use of database resources (e.g., CPU capacity, I/O systems, network connections, and storage facilities) and improve efficiency in query execution. The data from which query-answer sets are formed may be spread among many computer systems or processes, and the database system must organize the operation of these resources in retrieving the data. In general, the larger the database or the more complex the query, the greater is the need for a sophisticated query optimizer. Poor query optimization leads to poor system performance and, in many cases, complete failure to return an answer set altogether.
The query-optimizing tools used today rely on statistics collected against the underlying database in order to estimate the lowest cost plan for query execution. There are two major categories of database optimizers: (1) rule-based and (2) cost-based optimizers. A rule-based optimizer uses a set of well-defined rules in determining the execution path of a query, based on the statistics available for query-planning. A cost-based optimizer assesses the expected cost of query execution—in terms of the amount of each resource that must be devoted to the query-execution task—across a variety of potential execution plans. The cost-based optimizer then selects the plan that appears to be the lowest cost plan available.
In preparing a query-execution plan, a cost-based optimizer typically constructs some form of “decision tree” to enumerate the various possibilities for query execution. The optimizer then “optimizes” the execution plan by assessing how the database system would be expected to perform along each decision path in the tree and choosing the paths with the lowest expected costs. Because statistics do not always reflect reality, however, relying on statistics often leads to decision-making errors that produce ill-optimized and inefficient query-execution plans. Although rule-based optimizers are more simplistic than cost-based optimizers, inaccurate statistics can also lead to poorly optimized, and thus inefficient, query-execution plans because of incorrect inputs to the rule-based system for query optimization.