A relational database is a collection of data items organized as a set of formally described tables from which data can be easily accessed. A relational database system facilitates access to a relational database by receiving queries from users, applications or other entities, executing such queries against the relational database to produce a results dataset, and returning the results dataset to the entity that submitted the query. Some relational database systems include a query optimizer that operates to generate an execution plan for a query. The execution plan represents an efficient execution strategy for the query. For example, the execution plan may represent a strategy for executing the query in a manner that conserves time and/or system resources.
An important part of an execution plan is its logical structure. The logical structure of an execution plan specifies, for example, the order in which tables are accessed and the order in which relational join and group by operations are performed. It can be difficult for a query optimizer to select an optimal execution strategy, at least in part because generating the complete set of available execution plans may be computationally infeasible. As the number of tables referenced by a query grows, the number of possible logical orderings grows extremely fast. In particular, if n represents the number of tables referenced, the number of logical orderings grows with the Catalan number f(n) = (2n)! / ((n+1)! n!). For n = 10, this is approximately 17 thousand (16,796) different logical orderings. For n = 20, this is over 6.5 billion (6,564,120,420) different logical orderings. Given that each logical ordering may have hundreds or thousands of possible physical implementation alternatives, it may not be possible to explore and evaluate the entire plan search space, even for modest values of n, and still return an execution plan in a satisfactory time.
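As an arithmetic check on the figures quoted for n = 10 and n = 20, the following Python sketch evaluates the Catalan formula f(n) = (2n)! / ((n+1)! n!) exactly with integer arithmetic and cross-checks it against the standard Catalan recurrence. The function names `catalan` and `catalan_recurrence` are illustrative only; the sketch counts orderings as given by the formula and does not enumerate or cost actual execution plans.

```python
from functools import lru_cache
from math import factorial


def catalan(n: int) -> int:
    """Closed form f(n) = (2n)! / ((n+1)! n!), using exact integer division."""
    return factorial(2 * n) // (factorial(n + 1) * factorial(n))


@lru_cache(maxsize=None)
def catalan_recurrence(n: int) -> int:
    """Same numbers via the recurrence f(0) = 1, f(n) = sum over i < n of f(i) * f(n-1-i)."""
    if n == 0:
        return 1
    return sum(catalan_recurrence(i) * catalan_recurrence(n - 1 - i) for i in range(n))


if __name__ == "__main__":
    for n in (10, 20):
        # The closed form and the recurrence must agree for every n.
        assert catalan(n) == catalan_recurrence(n)
        # Print the count with thousands separators for readability.
        print(f"n={n}: {catalan(n):,}")
```

Running it prints `n=10: 16,796` and `n=20: 6,564,120,420`. The closed form is convenient for a single value, while the recurrence makes the combinatorial growth visible: each f(n) is built from products of all smaller values.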
Obtaining efficient execution plans for queries with large search spaces is one of the basic problems faced by commercial query optimizers. It requires striking a delicate balance among several competing concerns: the quality of the execution plan produced versus compilation time, and the robustness, extensibility, and maintainability of the optimizer code. The problem has become more significant over the years because queries keep growing in complexity.