Relational databases store information in indexed tables. A user retrieves information from the tables by entering input that is converted to queries by a database application. The database application submits the queries to a database server. In response to a query, the database server accesses the table specified in the query to determine which information within the tables satisfies the queries. The information that satisfies the queries is then retrieved by the database server and transmitted to the data application, and ultimately to the user.
For any given database server, the queries must conform to the rules of a particular query language. Most query languages provide users with a variety of ways to specify information to be retrieved. For example, in the Structured Query Language (SQL), the following query requests the retrieval of the information contained in all rows of table T1:
______________________________________ QUERY 1! SELECT*FROM T1 ______________________________________
In executing a query, the database server receives the query as a string and converts the query string into a complex, internal structure, which is manipulated during the execution of the query. The process of converting a query string into an internal representation of a query is called parsing. Parsing involves checking that the query string conforms to the syntax of the query language and that the query string is semantically meaningful with respect to the objects (tables, columns, etc.) involved and their data types. During this process, an internal representation of the query is generated that includes the structure of the query, various internal structures used for bookkeeping during the different stages of parsing, and structures that hold information that is needed in later phases of the processing of the query. For instance, information about indexes on tables referenced in a query may be retrieved from the data dictionary of the database to be used for a later query transformation or optimization phase.
Thus, a database server may parse QUERY 1 into an internal structure represented as a parse tree in FIG. 1(a). Node 100 is the base of the parse tree and the head of the SELECT clause with node 102 as the argument. Node 110 is the base of the FROM clause with node 112 as the argument.
A query transformation is a modification of a query into another query that is structurally different but semantically equivalent in the sense of returning the same result. Relation databases systems often perform various types of query transformations with the goal of generating a transformed query that can be executed more efficiently than the original query. Such transformations may involve converting subqueries into joins, merging views so that a reference to a view is replaced with the definition of the view, etc. Some query transformations may involve adding additional table references or subqueries to a query, giving rise to the problem being addressed, namely, the generation of internal representation for the additional query constructs that were not part of the original query.
For example, U.S. patent application Ser. No. 08/808,621, entitled "Method for Executing Star Queries," filed by Hakan Jakobsson, Cetin Ozbutun, and William H. Waddington on Feb. 28, 1997, now U.S. Pat. No. 5,848,408 discloses "star transformation" techniques for transforming the internal structure of a star query.
For example, QUERY 2! is a star query:
______________________________________ QUERY 2! SELECT*FROM FACT,DIM1,DIM2 WHERE FACT.COL1 = DIM1.COL1 AND FACT.COL2 = DIM2.COL1 AND DIM1.COL2 = 5 AND DIM2.COL2 = 100 ______________________________________
The internal structure for QUERY 2 is represented in FIG. 1(b). Node 100 is the base of the start query, a SELECT clause, node 110 is the base of the FROM clause, and node 120 is the base of the WHERE clause. Each node in the internal structure represents information for the query. Thus, node 100 has one argument, node 102, stating which columns are to be selected. The FROM clause at node 110 has three arguments, nodes 114-118, stating upon which tables the query is executed. Finally, the predicate tree at AND node 122 is the argument to WHERE clause at node 120. There are four simple predicates in the predicate tree at nodes 130, 140, 150, and 160 corresponding to the predicates of the query. Each simple predicate is a tree containing a relation and arguments. For example, predicate at node 130 comprises an equality relation at node 130 and two arguments, nodes 132-134 and 136-138. Similarly, nodes 142-148, 152-156, and 162-166 are arguments for the respective relations.
Applying a star transformation technique to the internal structure of QUERY 2, represented in FIG. 1(b), yields a transformed internal structure represented in FIG. 2. Referring to FIG. 2, the transformed internal structure contains two additional clauses, starting at nodes 210 and 250 respectively in box 200, joined to AND node 122. The transformed internal structure corresponds to the following query:
______________________________________ QUERY 3! SELECT*FROM FACT,DIM1,DIM2 WHERE FACT.COL1 = DIM1.COL1 AND FACT.COL2 = DIM2.COL1 AND DIM1.COL2 = 5 AND DIM2.COL2 = 100 AND FACT.COL1 IN (SELECT COL1 FROM DIM1 WHERE COL2 = 5) AND FACT.COL2 IN (SELECT COL1 FROM DIM2 WHERE COL2 = 100) ______________________________________
The last two clauses serve to reduce the amount of processing the database server would have to perform by limiting the number of rows the database server has to retrieve from the fact table. Therefore, a star transformation requires the internal representation of the additional clauses to be appended to the original query.
In an environment where a relational database server is subject to further development by adding new features and capabilities, the internal representation of a query is also likely to change and grow. New features are likely to require new internal representations as well as associated structures for the additional bookkeeping during parsing and for processing queries where the new features may be involved. If new features are being added by different groups of program developers in a decentralized fashion, having multiple code paths for generating the internal representation of a query can result in code inconsistencies if not all code paths are properly updated to reflect the new features.
The risk of such inconsistencies is a very real one if a group of developers are adding a new feature while a different group is adding a new query transformation that generates a new internal representation. If the two groups are not sufficiently aware of the hidden interaction between their two projects, the new code paths may not correctly generate the new internal representation. Moreover, having multiple code paths results in additional code maintenance costs because every code path for generating the internal representation of a query must be updated for each new feature. Therefore, there is a need for reducing the number of code paths for the generating the internal representation of a query, by the parser, query transformation modules, and any other component of the database system.
Moreover, some transformations are not always beneficial, depending on the query being invoked. A particular transformation technique may improve performance for some queries but impair performance for other queries. In order to ascertain whether executing a query will benefit from a particular transformation, an estimate for the cost of executing the original query is made. Then the original query is transformed, and a cost estimate for the transformed query is computed. In this manner, transformations that do not improve performance can be detected and rejected. A query transformation is a destructive operation in that the structure of the query is changed, and a mechanism for undoing a rejected transformation is cumbersome, if not impossible. Therefore, there is a need to avoid having to provide an undo mechanism.