Modern database management systems are primarily used as components of complex software systems involving multiple application programs. Constructing and maintaining such systems is a daunting endeavor; system architects and administrators seek a detailed understanding not only of the individual components in the system, but also of the relationships and interactions between them.
From a system management point of view, the consistency and predictability of a system component (such as a database management system) can be important. When system components behave predictably, tuning and testing of the entire system are greatly simplified. The efficiency of an individual component, on the other hand, is of somewhat lesser importance when the component is considered as part of a larger system. The scalability of the system as a whole is often more significant than the scalability of any single component. When a particular component is not a performance bottleneck, local performance “optimizations” to that component can actually be detrimental to the performance of the system as a whole if they detract from the predictability of the system, making it difficult to reason about performance at the system level and to take appropriate tuning measures.
The task of the query optimizer is to select a low-cost query plan. The execution cost of a query plan depends on a large number of parameters, including the sizes of the relations being queried, the selectivity of query operators, the amount of memory available at query execution time, the number of concurrently executing queries, the contents of the buffer cache, and the physical layout of selected records on disk. Because many of these factors are unknown at query compilation time, the standard approach to query optimization is as follows. First, rough guesses of the relevant parameter values are generated, using heuristic rules or extrapolating from any available statistics. Next, with the rough guesses as input, a search algorithm is invoked to find the least costly plan. The search phase typically treats the estimated parameter values as though they were completely precise and accurate, rather than the coarse estimates that they actually are. This can lead to less predictable behavior: the optimizer may select a query plan that promises a quick execution time but is in reality based on estimated selectivity values that were generated with relatively little information and therefore carry low confidence. The execution time penalty when the selectivity estimate is incorrect can be significant.
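The two-step approach described above, and the penalty it can incur, can be illustrated with a minimal sketch. The cost formulas, the constants, and the names `seq_scan_cost`, `index_scan_cost`, and `choose_plan` are hypothetical and chosen only for illustration; they are not taken from any particular optimizer.

```python
# Toy illustration: a search phase that treats a selectivity estimate as exact.
# All constants and cost formulas below are illustrative assumptions.

N_ROWS = 1_000_000        # assumed size of the relation
ROWS_PER_PAGE = 100       # assumed rows stored per disk page
SEQ_PAGE_COST = 1.0       # assumed cost of one sequential page read
RANDOM_PAGE_COST = 4.0    # assumed cost of one random page read


def seq_scan_cost(selectivity: float) -> float:
    """A sequential scan reads every page, whatever the selectivity."""
    return (N_ROWS / ROWS_PER_PAGE) * SEQ_PAGE_COST


def index_scan_cost(selectivity: float) -> float:
    """Assume one random page fetch per matching row (unclustered index)."""
    return selectivity * N_ROWS * RANDOM_PAGE_COST


PLANS = {"seq_scan": seq_scan_cost, "index_scan": index_scan_cost}


def choose_plan(estimated_selectivity: float) -> str:
    """Pick the plan with the lowest cost, trusting the estimate completely."""
    return min(PLANS, key=lambda name: PLANS[name](estimated_selectivity))


estimated = 0.001   # rough guess available at compilation time
actual = 0.05       # true selectivity, observed only at execution time

chosen = choose_plan(estimated)   # plan selected from the rough guess
best = choose_plan(actual)        # plan that would have been best in hindsight
penalty = PLANS[chosen](actual) / PLANS[best](actual)

print(f"chosen={chosen}, best={best}, penalty={penalty:.1f}x")
```

Under these assumed numbers, the index scan looks cheaper at the estimated selectivity, but at the actual selectivity it costs far more than the sequential scan, whose cost does not depend on the estimate at all. The sequential scan is the more predictable plan here even though it is not the cheapest one under the estimate.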