1. Field of the Invention
The embodiments of the invention generally relate to database management, and more particularly to query techniques used for optimizing database searching.
2. Description of the Related Art
An outerjoin is a modification of an inner join that preserves all information from one or both of its relations. It can be further categorized into left(→), right(←) or full(⇄) outerjoins, depending on which side (of the relation) needs to be preserved. For example, the following SQL query will return all of the department names and employees within each department. For those departments without employees, the department names are listed with the employee name set to null.                SELECT department.dname, employee.ename        FROM department LEFT JOIN employee ON department.no=employee.dno        
Outerjoins are important because they are frequently used in the following traditional applications: (a) certain online analytical processing (OLAP) queries where it is necessary to preserve tuples from a fact table with unknown (or missing) dimensional values; (b) constructing hierarchical views that preserve objects with no children; and (c) queries generated by external tools and a query rewriting unit. The emergence of XML (eXtensible Markup Language) provides more applications for outerjoins. For example, in information integration, schema mapping involves the discovery of a query or a set of queries that transform the source data into a new structure. When mapping relational data to XML, the transformation queries rely heavily on outerjoins to avoid missing data. In another example, the construction of XML results in XQUERY and often needs the outerjoin semantic so that a parent element can survive without a matching child element.
An inner join query can always be canonically represented as a sequence of Cartesian products of all relations followed by a sequence of selection operations, each applying a conjunct in the join predicates. Such a canonical abstraction is very powerful because it enables an optimizer to use any join order for plan generation. Unfortunately, such a canonical abstraction for outerjoin queries has not been developed. As a result, existing techniques tend to prevent certain join orders from planning, which can lead to a severe performance penalty.
Optimizing queries with outerjoins is challenging because outerjoins are not always commutative with inner joins. The following are some rules on evaluation orders for joins and one-sided outerjoins, if all predicates are null-intolerant, which means that they cannot evaluate to true when referencing a null value:
                              R          ⁢                      ⟶            Prs                    ⁢                      (                          S              ⁢                              ⟶                Prt                            ⁢              T                        )                          =                              (                          R              ⁢                              ⟶                Prs                            ⁢              S                        )                    ⁢                      ⟶            Prt                    ⁢          T                                                              (                          R              ⁢                              ⟶                Prs                            ⁢              S                        )                    ⁢                      ⟶            Prt                    ⁢          T                =                              (                          R              ⁢                              ⟶                Prt                            ⁢              T                        )                    ⁢                      ⟶            Prs                    ⁢          S                                                                  ⁢                              R            ⁢                          ⟶              Prs                        ⁢                          (                              S                ⁢                                                      ⊳⊲                                    Prt                                ⁢                T                            )                                ≠                                    (                              R                ⁢                                  ⟶                  Prs                                ⁢                S                            )                        ⁢                                          ⊳⊲                            Pst                        ⁢            T                              
The problem of outerjoin reordering has been considered in the past, with most of the conventional approaches attempting to expand the search space for outerjoin queries in one way or another.
Some conventional approaches have identified a special class of query called a simple query, which has the property that the query graph (without specifying the join order) unambiguously determines the semantic of the query. Given a simple query, a conflicting set is computed for each join predicate by analyzing the query graph. The conflicting set of predicate P contains some other join predicates, which if applied after P, will yield incorrect results. Based on the conflicting set, two approaches in plan generation are applicable; one without compensation and one with compensation. The former only generates plans where all reordering is valid and thus does not need further compensation. The later allows plans containing invalid reordering (always generating more tuples) and later tries to compensate for the incorrect results through a generalized outerjoin. However, these approaches generally do not consider all join orders. Also, no efficient implementation has yet been given for a generalized outerjoin operation.
Other solutions, such as those described in U.S. Pat. No. 6,665,663 issued to Lindsay et al. on Dec. 16, 2003, the complete disclosure of which in its entirety is herein incorporated by reference, use an extended eligibility list (EEL) to represent the conflicts among join predicates. Although EEL is logically equivalent to the conflicting set representation, it is easier to be exploited by an optimizer. Compensation is performed through nullification and best match operations. Again, not all join orders are considered and no efficient implementation is given for the best match operation.
Still other solutions adapt the previous framework to deal with predicates with more than one conjunct. Generally, such a predicate is treated as a single predicate so that the conjuncts within it cannot be broken up.
Other conventional approaches also determine that null-intolerant predicates can simplify queries using outer joins. For example, if predicate Prs is null-intolerant, the following rule holds:
      R    ⁢                  ⊳⊲            Prs        ⁢          (              S        ⁢                  ←          Pst                ⁢        T            )        =      R    ⁢                  ⊳⊲            Prs        ⁢          (              S        ⁢                              ⊳⊲                    Pst                ⁢        T            )      
However, the conventional approaches generally do not achieve the goal of allowing all join orders. For example, conjuncts in the ON clause of an outerjoin are always treated as a whole and cannot be applied separately. As another example, when outerjoins exist, transitivity cannot be applied to generate redundant predicates.
To see the impact of this, two queries (Q1 and Q2) are illustrated in FIGS. 1(a) and 1(b). In Q1, the outerjoin uses two conjuncts as the join predicate. Supposing that both relations S and T are large, one would like to be able to apply one of the conjuncts, say R.b=S.b, to join relation R with S first (as shown in FIG. 1(c)). In Q2, two join predicates (from different ON clauses) share a common attribute. Supposing that S is rather large, one would desire the ability to generate an additional predicate R.a=T.a through transitivity, and to use it for joining relation R with T first (as shown in FIG. 1(d)). Unfortunately, neither FIG. 1(c) nor FIG. 1(d) gives the correct results. If it is assumed that R(k,a,c)={(1,1,1,1)}, S(k,a,b)={(2,1,1)}, and T(k,a,c)=φ, the answers for Q1 and Q2 are A1={(1,-,-)} and A2={(1,2,-)} respectively (where “-” represents a “null”). On the other hand, the query in FIG. 1(c) generates φ (changing the inner join to left outerjoin in FIG. 1(c) still yields the wrong answer A2) and that in FIG. 1(d) generates A1. Had the outerjoins been changed to inner joins, both transformations would have been valid.
While the conventional approaches were beneficial for the purposes they were designed to solve, there remains a need for improving the performance of outerjoin queries by optimizing the outerjoin queries to match the level to that of an inner join query.