1. Field of the Invention
The present invention relates generally to database query processing and optimization and more particularly to top-down rule-based database query optimizers.
2. Description of Background Art
A central issue in the design of database systems is the query processing strategy that is employed. Considerable focus has been placed on this area since a poor strategy can adversely affect the performance of the database system. In SQL, and similar query processing languages, a query can be expressed in a variety of different representations. Since the transfer of data that usually resides on secondary storage is slower than such a transfer from main memory, it is imperative that the number of accesses to secondary storage be minimized. Typically, a user writes a query without considering the most efficient manner for realizing the query. Finding that manner becomes the responsibility of a query optimizer.
The objective of the query optimizer is to find an execution strategy that causes the result of the query to be produced in the most efficient ("optimal") manner. Optimality is used to denote the best strategy that satisfies a prescribed criterion. Often this criterion is the minimization of a defined metric, such as computational cost. Query optimization is a search process that entails producing a solution space of semantically equivalent expressions that represent the query. The semantically equivalent expressions are generated through the application of rules. The optimizer searches through the solution space to find the optimal solution, that is, the solution that best satisfies the defined metric.
A consideration in the design of a query optimizer is the minimization of its execution time as well as the conservation of memory space. The inefficient use of memory space and the execution of needless computations detrimentally affect the query optimizer's performance. Accordingly, there is a need to minimize the execution time of a query by utilizing efficient search procedures for finding the optimal solution.
Conventional query optimizers utilize a search engine and a database implementor (DBI) to generate an optimal plan for an input query having an optimization goal. The search engine generates a solution space from which an optimal solution or plan is selected. The solution space is defined by a set of rules and search heuristics provided by the DBI. The rules are used to generate solutions and the search heuristics guide the search engine to produce more promising solutions rather than all possible solutions.
The database query is represented as a query tree containing one or more expressions. An expression contains an operator having zero or more inputs (children) that are expressions. The query optimizer utilizes two types of expressions: logical expressions, each of which contains a logical operator; and physical expressions, each of which contains a physical operator specifying a particular implementation for a corresponding logical operator. An implementation rule transforms a logical expression into an equivalent physical expression and a transformation rule produces an equivalent logical expression. The database query is initially composed of logical expressions. Through the application of one or more implementation and transformation rules, the logical expressions in the database query are transformed into physical expressions.
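By way of illustration only, the distinction between logical and physical expressions, and between transformation and implementation rules, can be sketched as follows. The class name `Expr`, the operator names ("Join", "HashJoin", "Get_A", "Get_B") and both rule functions are hypothetical and are not taken from any particular optimizer.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Expr:
    """An operator with zero or more input (child) expressions."""
    op: str                          # hypothetical operator name
    physical: bool = False           # False: logical, True: physical
    inputs: Tuple["Expr", ...] = ()


def commute_join(e: Expr) -> Optional[Expr]:
    """Transformation rule (assumed): logical Join(a, b) -> logical Join(b, a)."""
    if e.op == "Join" and not e.physical and len(e.inputs) == 2:
        return Expr("Join", False, (e.inputs[1], e.inputs[0]))
    return None  # rule does not apply to this expression


def implement_join_as_hash_join(e: Expr) -> Optional[Expr]:
    """Implementation rule (assumed): logical Join -> physical HashJoin."""
    if e.op == "Join" and not e.physical:
        return Expr("HashJoin", True, e.inputs)
    return None  # rule does not apply to this expression


query = Expr("Join", inputs=(Expr("Get_A"), Expr("Get_B")))
swapped = commute_join(query)                  # still a logical expression
plan = implement_join_as_hash_join(swapped)    # now a physical expression
```

In this sketch only the root operator is implemented; a complete plan would also require the leaf operators to be replaced by physical operators.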
The search engine utilizes a search procedure that generates a "solution" for the database query by partitioning the database query into one or more smaller subproblems where each smaller subproblem can contain one or more expressions. Some of the subproblems form a subtree including other subproblems as inputs. A solution to each subproblem is generated in accordance with an order that generates a solution for each input subproblem before a solution for its associated parent subproblem is generated. The solution for the database query is then obtained as the combination of the solutions for each of the subproblems.
The search procedure utilizes a top-down branch and bound technique for generating solutions for each subproblem. An initial solution is obtained for each subproblem that has an associated cost which is used as an upper bound for considering other candidate solutions. Additional solutions whose associated costs exceed the upper bound are eliminated from consideration. The solution having the lowest cost is selected as the optimal solution.
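The pruning behavior described above can be sketched, under simplifying assumptions, as a small recursive routine. Here a subproblem is modeled as a list of alternatives, each alternative being a tuple of a name, a local cost, and a list of input subproblems; this representation, the function name `optimize`, and all cost figures are hypothetical. The cost of the best solution found so far serves as the upper bound, and an alternative is abandoned as soon as its accumulated cost reaches that bound.

```python
import math


def optimize(subproblem, limit=math.inf):
    """Return (cost, plan) for the cheapest alternative costing less than
    `limit`, or None when no alternative stays under the bound."""
    best = None
    for name, local_cost, inputs in subproblem:
        total = local_cost
        children = []
        for child in inputs:
            if total >= limit:
                break                          # bound reached: prune
            result = optimize(child, limit - total)
            if result is None:
                total = math.inf               # an input has no viable plan
                break
            child_cost, child_plan = result
            total += child_cost
            children.append(child_plan)
        if total < limit:
            limit = total                      # tighten the upper bound
            best = (total, (name, children))
    return best


# Hypothetical alternatives and costs, for illustration only.
scan_a = [("FileScan(A)", 10, []), ("IndexScan(A)", 4, [])]
scan_b = [("FileScan(B)", 7, [])]
join = [
    ("NestedLoopJoin", 20, [scan_a, scan_b]),
    ("HashJoin", 5, [scan_a, scan_b]),
    ("MergeJoin", 40, [scan_a, scan_b]),
]
cost, plan = optimize(join)
```

With these figures the first alternative establishes a bound of 31, the second lowers it to 16, and the third is discarded without its inputs being examined because its local cost alone exceeds the bound.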
Solutions are generated through the application of implementation and transformation rules. Transformation rules produce equivalent logical expressions and implementation rules produce physical expressions. Each rule has a pattern and a substitute. A pattern is the "before" expression that is matched with the expression that is being optimized. A substitute represents the semantically equivalent expression that is generated by applying the rule. A rule's pattern matches an expression when the expression contains the same operators in the same positions as the rule's pattern. Prior to applying a rule to an expression, all possible bindings that match a rule's pattern are determined. The purpose of determining the bindings is to find all possible expressions that can match a rule's pattern in order to generate every possible equivalent expression.
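The enumeration of bindings can be illustrated with the following sketch. It assumes that an expression is a pair of an operator name and a list of input group identifiers, that a pattern is a pair of an operator name and a list of sub-patterns, and that a wildcard leaf (`ANY`) binds to an input group as a whole. These conventions, the function name `bindings`, and the example groups are assumptions made for illustration.

```python
from itertools import product

ANY = "?"  # hypothetical wildcard leaf: matches any input group


def bindings(pattern, expr, groups):
    """Yield every binding of `pattern` rooted at `expr`.

    `groups` maps a group id to the list of equivalent expressions in it.
    """
    op, input_groups = expr
    p_op, p_inputs = pattern
    if op != p_op or len(input_groups) != len(p_inputs):
        return                                   # operators do not line up
    per_input = []
    for sub, gid in zip(p_inputs, input_groups):
        if sub == ANY:
            per_input.append([gid])              # leaf binds to the group
        else:
            # every expression in the input group may supply a binding
            per_input.append(
                [b for e in groups[gid] for b in bindings(sub, e, groups)]
            )
    for combo in product(*per_input):
        yield (op, list(combo))


groups = {
    0: [("Get_A", [])],
    1: [("Get_B", [])],
    2: [("Get_C", [])],
    3: [("Join", [0, 1]), ("Join", [1, 0])],     # two equivalent joins
    4: [("Join", [3, 2])],
}
# Pattern Join(Join(?, ?), ?), e.g. the "before" side of an associativity rule.
pattern = ("Join", [("Join", [ANY, ANY]), ANY])
found = list(bindings(pattern, groups[4][0], groups))
```

Because group 3 holds two equivalent join expressions, the pattern binds to the root expression in two distinct ways, and a rule would be applied once per binding.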
A search data structure is used to store the expressions that are generated during the search process including those that are eliminated from consideration. The search data structure is organized into equivalence classes denoted as groups. Each group includes one or more logical and physical expressions that are semantically equivalent to one another. Initially each logical expression of the input query tree is represented as a separate group in the search data structure. As the optimizer applies rules to the expressions in the groups, additional equivalent expressions, and additional groups, are added. Duplicate expressions are detected before they are inserted into the search data structure.
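A minimal sketch of such a search data structure is given below, assuming that each expression is stored as a hashable pair of an operator name and a tuple of input group identifiers, and that duplicates are detected with a hash index. The class name `Memo` and its `insert` method are hypothetical.

```python
class Memo:
    """Groups of semantically equivalent expressions with duplicate detection."""

    def __init__(self):
        self.groups = []   # group id -> list of expressions in that group
        self.index = {}    # expression -> id of the group that holds it

    def insert(self, expr, group_id=None):
        """Add `expr` to `group_id`, or to a new group when none is given.

        Returns (group id, True) if the expression was added, or
        (existing group id, False) if it was already present.
        """
        if expr in self.index:
            return self.index[expr], False       # duplicate: nothing added
        if group_id is None:
            group_id = len(self.groups)
            self.groups.append([])
        self.groups[group_id].append(expr)
        self.index[expr] = group_id
        return group_id, True


memo = Memo()
a, _ = memo.insert(("Get_A", ()))
b, _ = memo.insert(("Get_B", ()))
j, _ = memo.insert(("Join", (a, b)))             # initial logical expression
memo.insert(("Join", (b, a)), group_id=j)        # equivalent, same group
memo.insert(("HashJoin", (a, b)), group_id=j)    # physical alternative
dup_group, added = memo.insert(("Join", (a, b))) # detected as a duplicate
```

Keying expressions on input group identifiers rather than on whole subtrees keeps each entry small and makes the duplicate check a single dictionary lookup.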
The search procedure utilizes guidance methods that guide it toward generating more viable plans. The guidance methods produce guidance structures, which are heuristics that are used to select rules that will generate more promising solutions. The heuristics capture knowledge of the search procedure, which is passed on to later processing stages in order to avoid generating unnecessary and duplicate expressions.
A problem with the conventional query optimizers described above is that, when presented with a complex query, they enumerate the plan search space, i.e., the set of all possible execution plans, by recursively applying transformation and implementation rules to existing plans or expressions. Considering all possible join orders, for example, results in an exponential growth in the number of applied rules as the number of tables to be searched increases. Consequently, such conventional query optimizers are unable to optimize such complex queries regardless of how efficient the implementation is. What is needed is a query optimization system and method that can optimize an arbitrarily complex query within a time that is at most linearly proportional to the complexity of the query.
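The scale of the growth can be illustrated by counting join orders directly. The sketch below counts distinct join trees rather than rule applications, and it assumes that every pair of tables may be joined and that the two inputs of a join are ordered; under those assumptions there are n! left-deep orders and (2n-2)!/(n-1)! bushy orders for n tables. The function names are hypothetical.

```python
from math import factorial


def left_deep_orders(n: int) -> int:
    """Number of left-deep join orders over n tables: n!."""
    return factorial(n)


def bushy_orders(n: int) -> int:
    """Number of ordered binary join trees over n tables: (2n-2)! / (n-1)!."""
    return factorial(2 * n - 2) // factorial(n - 1)


for n in (2, 3, 4, 8, 12):
    print(n, left_deep_orders(n), bushy_orders(n))
```

For three tables the counts are 6 and 12, and for four tables they are 24 and 120; the counts grow faster than exponentially with the number of tables, which is why exhaustive enumeration becomes impractical for complex queries.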