The present invention relates to a method for choosing from among several submitted equivalent query expressions for execution in connection with a database or the like. More particularly, the present invention relates to such a method for choosing the most efficient query expression from among various alternatives derived from the equivalent query expressions, based on query optimization techniques.
In connection with a complex database system such as a relational database system or other database system, information is stored in multiple tables that typically cross-reference amongst themselves. Typically, a query language and query evaluator are employed to access and employ the information stored in such a database system. One example of such a query language and query evaluator are provided by MICROSOFT SQL SERVER, distributed by MICROSOFT Corporation of Redmond, Wash.
A query evaluator such as may be included in a database management system (DBMS) typically includes a query optimizer to optimize a received query as written in the query language. Note that a query as defined may contain multiple query sub-expressions (hereinafter xe2x80x98expressionsxe2x80x99) joined by various defined query commands. In turn, each query expression may contain multiple query expressions joined by query commands. Query optimization is generally known, and is performed according to known rules based on the database being queried and the structure thereof. Generally, a query optimizer generates multiple alternatives based on the received query, and then selects the most efficient of the alternatives for execution.
At times, a query expression within a query as submitted by a user or as written in an application can take on multiple alternate forms, where each form should produce the same result. Depending on the underlying layout of database tables, these equivalences cannot always be inferred by the optimizer. However, the user or the application does not know which form is most efficient, and therefore does not know which form to submit as the preferred query expression. Essentially, the user or the application cannot pre-submit each form of the query expression to the query optimizer and then choose the most efficient of the forms as optimized by such query optimizer.
Accordingly, a need exists for a method and mechanism that allows the user to in fact submit multiple alternate forms of a query expression in a manner so that the query optimizer selects the most efficient one of the forms of the query expression and optimizes same
The aforementioned need is satisfied by the present invention in which a query including therein at least one choice of a pair of equivalent expressions is optimized. The query comprises operators and arguments relating to the operators, and each choice is represented by a CHOOSE operator having a pair of CHOOSE arguments representing the corresponding pair of equivalent forms.
An initial tree representation of the query includes a node for each operator and each argument thereof. Each argument relating to an operator is represented in the tree as a child node of the node representing such operator. An initial version of a lookup table is generated to include a group for each node of the initial tree other than the nodes for each CHOOSE operator and the CHOOSE arguments thereof. The generated lookup table includes with regard to each CHOOSE operator node and the CHOOSE argument nodes thereof a single group having both CHOOSE argument nodes as members thereof. Each operator is represented as a member within a group thereof as the operator and a reference to a group of each argument associated therewith.
Transformation rules or the like may be applied to each member in each group of the initial version of the table to generate at least one of an additional member of a group, a new member of a new group, and combinations thereof, whereby for each group, every member thereof is an equivalent. A cost value is assigned to each member of each group and a member of each group is selected as a lowest cost value member of the group. The assigned cost value of each member reflects a cost of such member and if the member implies one or more other groups a cost of the lowest cost value member of each implied group. The lowest cost value members of the groups define an optimized tree for the query and thereby an optimization of the query.