The present invention relates to optimizing execution of queries on a database system, and in particular, transforming queries so that they can be executed more efficiently.
In relational database systems, a star schema is distinguished by the presence of one or more relatively large tables and several relatively smaller tables. Rather than duplicating the information contained in the smaller tables, the large tables contain references (foreign key values) to rows stored in the smaller tables. The larger tables within a star schema are referred to as xe2x80x9cfact tablesxe2x80x9d, while the smaller tables are referred to as xe2x80x9cdimension tablesxe2x80x9d. FIG. 1 illustrates an exemplary star schema with two dimensions.
Referring to FIG. 1, it illustrates a database system 100 that includes tables 102, 104 and 106. Table 102 is named xe2x80x9cstoresxe2x80x9d and contains information about each of the stores in which sales may occur. Each row in store table 102 contains a unique store-id and information about the particular store that corresponds to the store-id. Table 104 is named xe2x80x9cproductsxe2x80x9d and contains information about each type of product that may be sold in any of the stores. Each row in product table 104 contains a unique product-id and information about the particular product.
Table 106 is named xe2x80x9csalesxe2x80x9d and contains information about each sale in each of the stores represented in the store table 102. Each row in sale table 106 includes a unique sale-id, a store-id to indicate the store at which the sale was made, a product-id to indicate the product sold in the sale, and the date of the sale. Typically, the number of sales will be vastly greater than both the number of stores at which the sales are made and the number of products carried by the stores. Detailed information about the store and product involved in a sale transaction does not have to be stored in the rows of table 106 because such detailed information is available in tables 102 and 104, respectively. Instead, the rows of table 106 simply contain values (store-ids and product-ids) that reference information stored in the other tables 102 and 104. Therefore, tables 102, 104 and 106 constitute a star schema in which table 106 is the fact table and tables 102 and 104 are dimension tables.
Queries that operate on data stored in tables that belong to a star schema are referred to as star queries. The format of a star query is described below.
Star queries typically contain clauses which (1) identify the fact table and dimension tables, (2) specify the correspondence between the fact table columns and the dimension table columns, and (3) specify the constraints on the dimension table columns. For example, star queries in SQL typically have the form:
In Query 1, the fact table is named xe2x80x9cfactxe2x80x9d and the dimension tables are named xe2x80x9cdim1xe2x80x9d through xe2x80x9cdimnxe2x80x9d. The xe2x80x9cwherexe2x80x9d clause includes expressions that equate columns in the fact table with corresponding columns in the dimension tables. Such expressions are referred to herein as xe2x80x9cjoin predicatesxe2x80x9d. For example, the expression xe2x80x9cfact.key1=dim1.keyxe2x80x9d is a join predicate that specifies that the column xe2x80x9ckey1xe2x80x9d in the fact table corresponds to the column xe2x80x9ckeyxe2x80x9d in the dimension table that is named xe2x80x9cdim1xe2x80x9d.
Constraints are predicates referring to tables that restrict the query to a subset of the rows in the table referenced by the constraint. A table referenced by a constraint is referred to herein as a constrained table. The  less than dimension table constraints greater than are constraints referring to some or all of the dimension tables but not the fact table. For example, the query:
includes the dimension table constraints xe2x80x9cstores.city=San Josexe2x80x9d and xe2x80x9cproducts.cost greater than $1,000xe2x80x9d.
To execute a star query more efficiently, a star query may be rewritten. The process of rewriting queries is referred to herein as transformation. A method for transforming a star query is described in U.S. Pat. No. 5,848,408, issued to Hakan Jakobsson, et al., on Dec. 8, 1998.
Specifically, a star query (or the internal representation of a star query) is transformed by adding to the WHERE clause of the star query additional subqueries. These additional subqueries are constructed from the join predicates and dimension table constraints contained in the original star query. Specifically, for each constrained dimension k, an IN subquery may be generated of the form:
fact.keyk IN (select dimk.key from dimk where  less than dimk constraints greater than )
In the above expression, fact.keyk is the column of the fact table that a join predicate correlates with the xe2x80x9ckeyxe2x80x9d column of xe2x80x9cdimkxe2x80x9d. For example, consider the query:
Query 3 contains a dimension table constraint xe2x80x9cdim1.c3  greater than 4 and (dim1.c6 =10 or dim1.c6=11)xe2x80x9d for dim1. The query contains a join predicate xe2x80x9cfact.key1=dim1.keyxe2x80x9d which specifies a correlation between the xe2x80x9ckeyxe2x80x9d column of dim1 and the xe2x80x9ckey1xe2x80x9d column of the fact table. Based on these expressions, the following subquery would be generated:
fact.key1 in (select dim1.key from dim1
where dim1.c3 greater than 4 and (dim1.c6=10 or dim1.c6 =11))
Query 3 also specifies a dimension table constraint xe2x80x9cdim3.c5 between 100 and 1000xe2x80x9d for dim3. The query contains a join predicate xe2x80x9cfact.key3=dim3.keyxe2x80x9d which specifies a correlation between the xe2x80x9ckeyxe2x80x9d column of dim3 and the xe2x80x9ckey3xe2x80x9d column of the fact table. Based on these expressions, the following subquery would be generated:
fact.key3 in (select dim3.key from dim3
where dim3.c5 between 100 and 1000)
The subqueries generated in this manner are ANDed to the other predicates that constrain the dimension table dimk. In the present example, the resulting xe2x80x9ctransformedxe2x80x9d star query is:
Database systems may run on multi-processing systems. Multi-processing systems are typically partitioned into nodes, where each node may contain multiple processors executing multiple concurrent processes. To fully utilize the computing power of a multi-processing system, a database system may divide a large processing task required by a query into smaller work granules which may then be distributed to processes running on one or more processing nodes. Because the various work granules are being performed in parallel, the processing required by the query can be completed much faster than if the processing were performed on a single node by a single process.
To parallelize the execution of a star query, the work to be done on the fact table is dynamically partitioned by, for example, rowid range to create xe2x80x9cwork granulesxe2x80x9d. The work granules are than assigned to a set of processes. These processes, referred to herein as parallel query slaves, execute their assigned work granules in parallel with the other parallel query slaves.
Each parallel query slave accesses the part of the fact table specified by the rowid range for the query slave and executes the IN subqueries. When executing an IN subquery, a parallel query slave process may scan the entire dimension table. Thus, while the query is executed once for each portion of the fact table, each IN subquery is executed repeatedly for the entire portion of a dimension table.
Based on the foregoing, it is clearly desirable to provide a method that reduces the duplicated and wasted effort expended by each slave process recomputing an IN subquery for the entire portion of a dimension table.
A technique and mechanism for transforming a query is described. According to an aspect of the present invention, a given query is transformed into a transformed query that references a global temporary table. Specifically, the given query contains a join between a given table and one or more other tables referenced by constraints (e.g. predicates). References to one or more of the constrained tables are replaced by a reference to a global temporary table. Before executing the transformed query, data that satisfies the constraints placed on the constrained table is inserted into the global temporary table.