Database management systems (DBMS) utilize query optimizers to select an appropriate query execution plan for a given query. The query optimizer uses a cost model, and a key parameter of the cost model is the cardinality of sub-expressions of the query considered during optimization. Given an expression, the cardinality is the number of rows output of the expression when the expression is evaluated on the database. Obtaining accurate cardinality estimates can be crucial for finding a good query execution plan.
Current query optimizers trade accuracy of cardinality estimation (and hence quality of execution plan) for speed of optimization. The time used for query optimization is kept to a minimum by utilizing various approximation techniques (such as histograms) to quickly obtain cardinality estimates. While these approximation techniques ensure that the query optimization time is kept low, this speed comes at a cost of estimation errors. For example, it is well known that cardinality estimation errors can grow exponentially in the number of joins in a query. This can cause the query optimizer to pick an execution plan that is significantly less accurate than the query plan that uses accurate (or exact) cardinalities for the sub-expressions.
While the above trade-off between optimization time and cardinality accuracy (or execution plan quality) is necessary for specific or ad hoc queries, it is a fundamental challenge to understand to what extent the query execution plan quality is affected by errors in cardinality estimation. In order to understand this relationship, it is highly desirable to be able to obtain the “best” query execution plan without regard for query optimization time. The best execution plan is defined as the plan obtained by using exact cardinalities. Using exact cardinalities for sub-expressions of the query yields the maximum accuracy and thus gives the cardinality-optimal query execution plan, which is the optimal query execution plan based on cardinality.
Often it is desirable to estimate the cardinalities to obtain an execution plan as quickly as possible, but there are important scenarios where using exact cardinalities to obtain the best query execution plan can be extremely valuable. In these cases accuracy is preferred over speed. One scenario is query optimizer testing, where, for a given query, it can be important to know the execution plan given the assumption that the optimizer did not make any cardinality estimation errors. Such a plan serves as a benchmark against which other execution plans can be compared and can also help narrow down the cause of an execution plan quality problem. Thus, having access to the plan obtained using exact cardinalities can be very useful for query optimizer testing. Another scenario is repeated queries, where a particular important query can be repeatedly executed by the application. In such cases it is useful for a database application to obtain a plan that is as good as possible. The ability to produce an execution plan having no cardinality estimation errors is quite valuable in this context.
In the above scenarios accuracy is preferred over speed. However, it is still desirable to obtain the exact cardinalities as quickly as possible, so any technique that decreases the time necessary to obtain exact cardinalities is highly desirable. Despite its usefulness the problem of optimizing the time required to find the execution plan having exact cardinalities remains by and large unexplored. Of course, the natural approach of executing one query per relevant expression to obtain its cardinality can be prohibitively expensive since for many reasons. A main reason is that often there are a large number of relevant expressions for a query, and the time is takes to execute each of these queries can be significant.