In large commercial database systems, statistics are often gathered for the tables and other constructs of the database. These statistics are used to estimate costs, which in turn drive the generation of an optimal plan for a given query. That is, large-scale databases include query optimizers that determine the most efficient way to execute a given query by considering multiple alternative query plans and the cost of each. The statistics are therefore vital to computing costs for a given request, which may consist of one or more queries.
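To make the role of statistics concrete, the following is a minimal sketch of how an optimizer might derive a cardinality estimate from gathered statistics. The statistics layout and the uniform-distribution assumption (an equality predicate matches 1/NDV of the rows) are classic textbook simplifications, not any particular product's implementation; all names here are illustrative.

```python
# Hypothetical per-table statistics of the kind an optimizer gathers.
# The dictionary layout and field names are illustrative only.
orders_stats = {
    "row_count": 1_000_000,
    "distinct_values": {"status": 4, "customer_id": 50_000},
}

def estimate_selectivity(stats, column):
    """Uniform-distribution estimate: an equality predicate on a column
    is assumed to match 1 / (number of distinct values) of the rows."""
    return 1.0 / stats["distinct_values"][column]

def estimate_cardinality(stats, column):
    """Estimated rows surviving an equality predicate on `column`."""
    return stats["row_count"] * estimate_selectivity(stats, column)

print(estimate_cardinality(orders_stats, "status"))       # 250000.0
print(estimate_cardinality(orders_stats, "customer_id"))  # 20.0
```

These estimates feed directly into the cost computed for each candidate plan; note how heavily they depend on the statistics being present and accurate.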
Thus, query optimizers depend on compile-time information, such as statistics, cost parameters, predicate values, and resource availability, for query optimization. The final plan for a request is referred to as a static plan; it is chosen by computing a cost for each candidate plan variation and then selecting the least expensive one. During this process, the optimizer assumes that all the compile-time information is accurate and generates the plan for the entire request (a request can include multiple statements or queries). However, this assumption does not always hold, particularly for complex queries.
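The selection of a static plan can be sketched as follows: each candidate plan carries resource estimates, a cost function collapses them into a single number using compile-time cost parameters, and the cheapest plan wins. The plan names, cost units, and weights below are all illustrative assumptions, not a real optimizer's cost model.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    est_cpu: float  # estimated CPU cost units (illustrative)
    est_io: float   # estimated I/O cost units (illustrative)

def plan_cost(plan, cpu_weight=1.0, io_weight=4.0):
    # Weighted sum of resource estimates; the weights stand in for the
    # cost parameters fixed at compile time.
    return cpu_weight * plan.est_cpu + io_weight * plan.est_io

def choose_static_plan(candidates):
    # The "static" choice: committed once, before execution begins,
    # and never revisited at run time.
    return min(candidates, key=plan_cost)

candidates = [
    Plan("hash_join", est_cpu=120.0, est_io=30.0),
    Plan("merge_join", est_cpu=90.0, est_io=45.0),
    Plan("nested_loop", est_cpu=40.0, est_io=80.0),
]
print(choose_static_plan(candidates).name)  # hash_join
```

The key point is that `plan_cost` is only as good as its inputs: if the estimates or cost parameters are wrong, `min` happily commits to the wrong plan for the whole request.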
Moreover, errors in cardinality and cost estimates for a static plan grow exponentially as the number of joins in the request increases, even when the single-table cardinality and selectivity estimates are good. Skewed data and missing statistics amplify these errors further, leading to sub-optimal plans that either run out of spool space or cause skewed processing. Further, even when the actual table row counts, Central Processing Unit (CPU) usage, and Input/Output (I/O) counts for intermediate steps differ significantly from the optimizer's estimates, the optimizer blindly continues to execute the static plan.
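The exponential growth of estimation error can be illustrated with a simplifying assumption: if each join's selectivity estimate is off by roughly a constant factor, the cardinality estimate for the join result is off by that factor raised to the number of joins. The constant-factor model below is a deliberate simplification for illustration only.

```python
def compounded_error(per_join_error, num_joins):
    """Multiplicative error in the estimated cardinality of an
    n-way join, assuming each join contributes a constant error
    factor (an illustrative simplification)."""
    return per_join_error ** num_joins

# A modest 2x error per join quickly becomes a large error overall.
for n in (1, 3, 5, 8):
    print(f"{n} joins -> estimate off by {compounded_error(2.0, n):g}x")
```

Even with only a 2x per-join error, an 8-join query's cardinality estimate is off by a factor of 256, which is why intermediate-step feedback (actual row counts, CPU, and I/O) matters so much once execution begins.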