Query optimization is important in relational database systems that deal with complex queries against large volumes of data. Unlike earlier navigational databases, a query on a relational database specifies what data is to be retrieved from the database but not how to retrieve it. Optimizing a query against a relational database is not as important in transaction-oriented databases where only a few rows are accessed either because the query is well specified by virtue of the application or because the query causes the data to be accessed using a highly selective index. In decision support and data mining applications, where the space of possible solutions is large and the penalty for selecting a bad query is high, optimizing a query to reduce overall resource utilization can provide orders of magnitude of overall performance improvement.
One existing query optimization technique is to rewrite the user-specified query. The query is transformed into a logically equivalent query that costs less, i.e. requires less time, to execute. The existing techniques for query transformation include syntactic and semantic techniques. Syntactic or algebraic transformations use the properties of the query operators and their mapping to rewrite the query. Some forms of magic set transformation, most forms of predicate push down, and transitive closures are techniques that fall under this category. Semantic query transformations use declarative structural constraints and the semantics of an application""s specific knowledge, declared as part of the database, to rewrite the query. Semantic query transformation based rewrites are called semantic query optimization or SQO.
The basic purpose of a query rewrite is to reduce the number of rows processed during the query. Existing techniques for query rewrite are focused primarily on structural constraints of the database and knowledge of the domain of the database. For example, structural constraint based semantic optimizations use functional dependencies, key dependencies, value constraints, and referential constraints that are defined on relations in the database. Other existing query optimizers use constraints called implication integrity constraints and subset integrity constrains.
In general, in one aspect, the invention features a method for optimizing queries to a database. The database includes a first table (T1) having a primary key (PK) column and a first correlated value column (CV1) and a second table (T2) having a foreign key (FK) column related to the primary key column of the first table and a second correlated value column (CV2). The method includes joining T1 to the T2 using PK=FK as the join condition to produce a join result having rows. Each row includes a value from CV1 and a value from CV2. The method further includes creating an initial running constraint (RC). The initial running constraint includes a null range. The method further includes producing a derived constraint rule (DCR) having the following form:
(PK=FK)xe2x86x92CV2+C1xe2x89xa6CV1xe2x89xa6CV2+C2
where C1 and C2 are constants, and xe2x80x9cxe2x86x92xe2x80x9d means xe2x80x9cimplies,xe2x80x9d by performing the following processing for each row in the join result: computing a new constraint (NEW), having a range, and modifying RC by merging the range of NEW with the range of RC.
Implementations of the invention may include one or more of the following. The join result produced by joining may include a virtual join result. Creating the initial RC may include applying the following equation:
CV2+C1xe2x89xa6CV1xe2x89xa6CV2+C2
C1=C2.
Computing a new constraint may include solving for C1 and C2 in the following equation:
CV2+C1xe2x89xa6CV1xe2x89xa6CV2+C2;
where CV1 and CV1 are from the same row in the join result. The range of RC may be specified by the following equation:
CV2+C1xe2x89xa6CV1xe2x89xa6CV2+C2
and modifying RC may include adjusting C1 and C2 so that the range of the modified RC covers the range of the unmodified RC and the range of NEW.
The method may further include determining the usefulness of the DCR. Determining the usefulness of the DCR may include comparing the range of the DCR to the range of one of CV1 or CV2. Determining the usefulness of the DCR may include computing usefulness using the following equation:   usefulness  =                    C        1            -              C        2              SIZE  
where SIZE is the range of one of CV1 or CV2. The method may further include discarding the DCR if its usefulness is greater than a threshold.
The method may further include maintaining the DCR in view of changes to T1 or T2. Maintaining may include doing nothing if a row is inserted in T1. Maintaining may include doing nothing if a column other than PK, FK, CV1, or CV2 is updated in T1 or T2. Maintaining may include doing nothing if a row is deleted from either T1 or T2. Maintaining may include recomputing the DCR after a predetermined number of rows have been deleted from T1 or T2. Maintaining may include joining a new row added to T2 with T1, finding the constraint associated with the new row, and merging the new constraint with the DCR to form a new DCR.
The method may include storing the DCR and applying the right-hand side of the DCR to a query if the left hand side of the DCR is present in a conjunction in the query and at least one of CV, or CV2 is referenced in the conjunction.
CV1 and CV2 may be date columns.
In general, in another aspect, the invention features a method for optimizing queries to a database. The database includes a first table (T1) having a primary key (PK) column and a first correlated value column (CV1) and a second table (T2) having a foreign key (FK) column related to the primary key column of the first table and a second correlated value column (CV2). The method includes joining T1 to the T2 using PK=FK as the join condition to produce a join result having rows. Each row includes a value from CV1 and a value from CV2. The method further includes creating an initial running constraint (RC). The initial running constraint includes a null range. The method includes producing a derived constraint rule (DCR) having the following form:                               (                      PK            =            FK                    )                →                  xe2x80x83                ⁢                              (                                                            CV                  2                                +                                  C                                      1                    ⁢                    A                                                              ≤                              CV                1                            ≤                                                CV                  2                                +                                  C                                      2                    ⁢                    A                                                                        )                    ⋃                                                  xe2x80x83                ⁢                              (                                                            CV                  2                                +                                  C                                      1                    ⁢                    B                                                              ≤                              CV                1                            ≤                                                CV                  2                                +                                  C                                      2                    ⁢                    B                                                                        )                    ⋃                                                  xe2x80x83                ⁢        …                                          xe2x80x83                ⁢                              (                                                            CV                  2                                +                                  C                                      1                    ⁢                    N                                                              ≤                              CV                1                            ≤                                                CV                  2                                +                                  C                                      2                    ⁢                    N                                                                        )                    ;                    
where C1A, C2A, C1B, C2B, . . . C1N and C2N are constants, xe2x80x9cxe2x86x92xe2x80x9d means xe2x80x9cimplies,xe2x80x9d and each parenthesized term on the right-hand side of the above equation represents an interval, by performing the following processing for each row in the join result: computing a new constraint (NEW), having a range, and modifying RC by forming a union of the range of NEW with the range of RC.
Implementations of the invention may include one or more of the following. The method may include merging two intervals if the number of intervals N exceeds a predetermined maximum number of intervals, K. Merging two intervals may include merging the two intervals that are closest to each other.
The method may include computing the usefulness of the DCR using the following equation:   usefulness  =                              C                      1            ⁢            A                          -                  C                      2            ⁢            A                              SIZE        +                            C                      1            ⁢            B                          -                  C                      2            ⁢            B                              SIZE        +    …    +                            C                      1            ⁢            N                          -                  C                      2            ⁢            N                              SIZE      
where SIZE is the range of one of CV1 or CV2.
In general, in another aspect, the invention features a computer program, stored on a tangible storage medium, for use in optimizing queries to a database. The database includes a first table (T1) having a primary key (PK) column and a first correlated value column (CV1) and a second table (T2) having a foreign key (FK) column related to the primary key column of the first table and a second correlated value column (CV2). The program includes executable instructions that cause a computer to join T1 to the T2 using PK=FK as the join condition to produce a join result having rows. Each row includes a value from CV1 and a value from CV2. The program further includes executable instructions that cause a computer to create an initial running constraint (RC), the initial running constraint comprising a null range. The program further includes executable instructions that cause a computer to produce a derived constraint rule (DCR) having the following form:
(PK=FK)xe2x86x92CV2+C1xe2x89xa6CV1xe2x89xa6CV2+C2
where C1 and C2 are constants, and xe2x80x9cxe2x86x92xe2x80x9d means xe2x80x9cimplies,xe2x80x9d by performing the following processing for each row in the join result: compute a new constraint (NEW), having a range, and modify RC by merging the range of NEW with the range of RC.
Other features and advantages will become apparent from the description and claims that follow.