Many commercial relational database management systems (RDBMSs) employ cost-based query optimization exploiting dynamic programming (DP) to efficiently generate the optimal query execution plan. However, optimization time increases rapidly for queries joining more than 10 tables. Randomized or heuristic search algorithms reduce query optimization time for large join queries by considering fewer plans, sacrificing plan optimality. Though commercial systems executing query plans in parallel have existed for over a decade, the optimization of such plans still occurs serially.
The success of RDBMSs can largely be attributed to the standardization of the structured query language (SQL) query language and the development of sophisticated query optimizers that automatically determine the optimal way to execute a declarative SQL query by enumerating many alternative query execution plans (QEPs), estimating the cost of each, and choosing the least expensive plan to execute. Many commercial RDBMSs employ DP. DP builds QEPs “bottom up” and exploits the principle of optimality to prune sub-optimal plans at each iteration (thereby saving space) and to guarantee that the optimal QEP is found without evaluating redundant sub-plans.
As the number of tables referenced in a query increases, however, the number of alternative QEPs considered by a DP-based optimizer can, in the worst case, grow exponentially. This means that many real-world workloads that reference more than 20 tables would have prohibitive optimization times using current DP optimization. In extreme cases (queries referencing a large number of relatively small tables), the time to optimize a query with DP may even exceed the time to execute it. Although randomized or heuristic (e.g., greedy) search algorithms reduce the join enumeration time by not fully exploring the entire search space, this can result in sub-optimal plans that execute orders of magnitude slower than the best plan, more than negating any savings in optimization time by such heuristics. And while the plan picked by the optimizer can sometimes be stored and reused, thereby amortizing the optimization cost over multiple executions, changes to the parameters in the query or the underlying database's characteristics may make this approach sub-optimal, as well.