1. Field of the Invention
The present invention relates to a method, system, and program for selecting a join order for tables subject to a join operation.
2. Description of the Related Art
Data records in a relational database management system (RDBMS) in a computer are maintained in tables, each of which is a collection of rows all having the same columns. Each column maintains information on a particular type of data for the data records which comprise the rows. Tables in the database are searched using the Structured Query Language (SQL), which specifies search operations or predicates to perform on columns of tables in the database to qualify rows in the database tables that satisfy the search conditions. An SQL join operation involves a query that is performed on combinations of rows from a plurality of tables. Conceptually, when executing a join operation on a plurality of tables, the database engine forms all possible combinations of rows from the tables. The database engine then applies the search condition to all the combinations of rows from the joined tables. In a join query, a search condition or join condition specifies some relationship between the rows to be joined. One challenge in performing a join query on multiple tables is to select the order in which the tables will be joined when applying the search criteria, i.e., the order in which the rows of the tables are joined.
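The conceptual model of a join described above, in which all combinations of rows are formed and the join condition is then applied, may be illustrated by the following minimal sketch. The table names, column names, and rows are hypothetical and are chosen only for illustration; an actual database engine would not normally materialize every combination.

```python
from itertools import product

# Hypothetical sample tables; names and rows are illustrative only.
employees = [
    {"emp_id": 1, "name": "Ann", "dept_id": 10},
    {"emp_id": 2, "name": "Bob", "dept_id": 20},
    {"emp_id": 3, "name": "Cy", "dept_id": 10},
]
departments = [
    {"dept_id": 10, "dept_name": "Sales"},
    {"dept_id": 20, "dept_name": "Research"},
]


def conceptual_join(left, right, condition):
    """Form every combination of rows, then keep those satisfying the condition."""
    return [(l, r) for l, r in product(left, right) if condition(l, r)]


# Join condition: the department identifiers of the two rows must match.
joined = conceptual_join(
    employees, departments, lambda e, d: e["dept_id"] == d["dept_id"]
)

for emp, dept in joined:
    print(emp["name"], dept["dept_name"])
```

In this sketch, six combinations of rows are formed from the two tables and three of them satisfy the join condition.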
Many database engines utilize optimization techniques to select the best possible join ordering for queries in relational database systems. The order in which the joins are performed has a substantial impact on query performance. The desired query execution plan consists of an ordered series of primitive database operators and is typically developed by choosing the plan having the least estimated execution cost from among several alternative plans making up a “search space”. A search space embraces a number of query execution plans that are limited according to the type and sequence of the primitive database operators allowed in the plans. Since only two-way join operators are usually provided as primitives, an optimizer is normally obliged to select the “best” sequence of two-way joins to achieve the N-way join of data tables requested by a non-procedural user query.
The computational complexity of the optimization process and the execution efficiency of the plan chosen by that process are dominated by the number of such possible primitive operator sequences that must be evaluated by the optimizer. That is, they are dominated by the size of the “search space” spanned by the query optimizer. An exhaustive enumeration is clearly exponential in the number of tables and hence is impractical for all but trivial queries. Thus, some scheme for limiting the search space must be provided to any query optimizer. The commonly assigned U.S. Pat. No. 5,301,317, entitled “System for Adapting Query Optimization Effort to Expected Execution Time,” which is incorporated herein by reference in its entirety, includes further discussion of how the search space may be reduced to limit the cost of the query optimization process, and of dynamic programming techniques for optimizing query evaluation plans.
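The growth of the search space may be illustrated by the following sketch, which counts only one simple class of plans: sequences of two-way joins in which each table is joined, one at a time, to the result of the preceding joins. Under that assumption, which is made here for illustration and is not taken from the referenced patent, the number of orderings of N tables is N factorial. The function name is hypothetical.

```python
import math


def left_deep_order_count(num_tables: int) -> int:
    """Count the orderings of num_tables tables joined one at a time.

    Assumes every permutation of the tables is a permissible join order.
    """
    if num_tables < 1:
        raise ValueError("num_tables must be at least 1")
    return math.factorial(num_tables)


# Number of candidate orderings for a few query sizes.
counts = {n: left_deep_order_count(n) for n in (2, 5, 10, 15)}

for n, count in counts.items():
    print(f"{n} tables: {count} orderings")
```

Even for this restricted class of plans, a ten-table join admits more than three million orderings, which suggests why exhaustive enumeration is practical only for small queries.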
Certain prior art query optimizers will switch to a heuristic approach if the number of tables in the join query exceeds a predetermined threshold. The reason for such switching is to limit the computational resources, e.g., memory, storage, processor cycles, expended on the query optimization process. To conserve computational resources, the query optimizer may use a greedy algorithm as the heuristic approach when there are too many tables. One such prior art greedy approach is to use the smallest table, which is the table having the fewest rows, as the first table in the ordering and then determine the cost of different join orders with the smallest table first. The next smallest tables may be placed in subsequent positions in the join order when determining the cost of the different join orderings, such that the cost is considered with respect to different possible orders having some number of smallest tables in the initial order positions.
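The greedy approach described above may be sketched as follows. The table names, row counts, uniform join selectivity, and cost function are all assumptions introduced for illustration; the cost function is a toy model that sums estimated intermediate result sizes and does not represent the cost model of any particular optimizer.

```python
from itertools import permutations

# Hypothetical row counts for four tables in a join query.
row_counts = {
    "orders": 1_000_000,
    "customers": 50_000,
    "regions": 20,
    "products": 5_000,
}
SELECTIVITY = 0.001  # assumed uniform selectivity for every join


def estimated_cost(order, rows, selectivity=SELECTIVITY):
    """Toy cost: sum of estimated intermediate result sizes for a join order."""
    size = rows[order[0]]
    cost = 0.0
    for table in order[1:]:
        size = size * rows[table] * selectivity
        cost += size
    return cost


def greedy_smallest_first(rows, fixed=1):
    """Fix the `fixed` smallest tables in the initial positions, then pick
    the cheapest ordering of the remaining tables."""
    by_size = sorted(rows, key=rows.get)
    prefix, rest = tuple(by_size[:fixed]), by_size[fixed:]
    candidates = (prefix + tail for tail in permutations(rest))
    return min(candidates, key=lambda order: estimated_cost(order, rows))


best = greedy_smallest_first(row_counts)
print(best)
```

Increasing the hypothetical `fixed` parameter places more of the smallest tables in the initial positions and reduces the number of orderings whose cost must be determined.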
One common approach to selecting a query evaluation plan is to use dynamic programming algorithms, which often are infeasible or extremely resource-consuming to process if many tables, e.g., ten tables or more, are involved in the join operation. The article entitled “Optimization of Large Join Queries: Combining Heuristics and Combinatorial Techniques,” by Arun Swami, in the ACM SIGMOD Record Vol. 18, No. 2, pgs. 367-367 by the Association for Computing Machinery (ACM Copyright 1989), discusses problems with dynamic programming query evaluation techniques as the number of tables involved in the query exceeds ten. This article is incorporated herein by reference in its entirety.
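A dynamic programming search of the kind referred to above may be sketched as follows. This is a simplified illustration and not the technique of the cited article: it considers only orders in which tables are joined one at a time, and it uses a toy cost model (the sum of estimated intermediate result sizes under an assumed uniform selectivity). All names and row counts are hypothetical.

```python
from itertools import combinations

# Hypothetical row counts for four tables in a join query.
row_counts = {
    "orders": 1_000_000,
    "customers": 50_000,
    "regions": 20,
    "products": 5_000,
}


def dp_join_order(rows, selectivity=0.001):
    """Return (cost, order, entries) for the cheapest one-at-a-time join order."""
    tables = sorted(rows)

    def size(combo):
        # Estimated result size of joining every table in combo.
        result = 1.0
        for table in combo:
            result *= rows[table]
        return result * selectivity ** (len(combo) - 1)

    # best maps each set of tables to the cheapest (cost, order) found for it.
    best = {frozenset([t]): (0.0, (t,)) for t in tables}
    for k in range(2, len(tables) + 1):
        for combo in combinations(tables, k):
            subset = frozenset(combo)
            out_size = size(combo)
            options = []
            for last in combo:
                # Reuse the best plan for the subset without the last table.
                prev_cost, prev_order = best[subset - {last}]
                options.append((prev_cost + out_size, prev_order + (last,)))
            best[subset] = min(options)
    cost, order = best[frozenset(tables)]
    return cost, order, len(best)


cost, order, entries = dp_join_order(row_counts)
print(order, cost, entries)
```

In this sketch one entry is retained for every non-empty subset of the tables, so the number of entries is 2^N - 1 for N tables. That growth is one way to see why such a search becomes costly as the number of tables reaches ten or more.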
Notwithstanding current query optimization techniques, there is a continued need in the art for improved query optimization techniques.