1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to the use of cube indices for optimizing queries in a relational database management system.
2. Description of Related Art
Computer systems incorporating Relational DataBase Management System (RDBMS) software using a Structured Query Language (SQL) interface are well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Organization (ANSI) and the International Standards Organization (ISO).
RDBMS software typically has the capability of analyzing data based on .particular columns of a table. For example, rows can be grouped according to columns defined in a GROUP BY clause of a query. The column names in a SELECT clause are either a grouping column or a column function. Column functions return a result for each group defined by the GROUP BY clause.
A grouping query can include a standard WHERE clause that eliminates non-qualifying rows before the groups are formed and the column functions are computed. A HAVING clause eliminates non-qualifying rows after the groups are formed; it can contain one or more predicates connected by ANDs and ORs, wherein each predicate compares a property of the group (such as AVG(SALARY)) with either another property of the group or a constant.
The GROUPING SET operator extends the GROUP BY operation to simultaneously specify the computation of multiple GROUP BYs in a single GROUP BY operation. When the GROUPING SET operator is used, a NULL value in a non-null grouping column denotes that the particular column is collapsed in the aggregation. If a grouping column (c) is nullable, a GROUPING operator (GROUPING(c)) is required to distinguish between the NULL group and a column collapsed in the aggregation. Used in conjunction with grouping sets, the GROUPING operator returns a value which indicates whether or not a row returned in a GROUP BY answer set is a row generated by a grouping set that excludes the column represented by the expression. The argument can be of any type, but must be an item of a GROUP BY clause. The result of the function is set to one of the following values:
1xe2x80x94The value of expression in the returned row is a null value, and the row was generated by a super-group. That is, the argument is collapsed in the aggregation.
0xe2x80x94The value of the expression in the returned row represents a non-system generated value of the group (which may be null) and indicates that the argument is not collapsed in the aggregation.
ROLLUP and CUBE operations can also be specified in the GROUP BY clause of a query. ROLLUP and CUBE operations are shorthand for GROUPING SETs that represent common sets of GROUP BY operations that are required for common queries for online analytical processing (OLAP). ROLLUP grouping produces a result set containing the regular grouped rows and sub-total rows. CUBE grouping produces a result set containing the rows from ROLLUP and cross-tabulation rows. For example, ROLLUP can provide the sales by person by month with monthly sales totals and an overall total. In another example, CUBE can include additional rows for total sales by person.
For most RDBMS software, combinations of tables and views are used to access data stored in tables in the database. Indices are often used to improve the performance of retrieving data from tables. However, indices are generally limited to columns from base tables. Thus, indices are not seen as suitable for:
results of aggregations, and
results of joins for commonly used subsets of the data.
A view definition includes a query that, if processed, provides a temporary results table based on the results of the query at that point in time. Using an INSERT statement and an appropriately defined table in the database, the temporary results table can be stored in the database. To refresh this table, the user would need to perform a DELETE from the table and then perform the INSERT again.
Users can directly query against the created table, provided that the users are aware how the results were derived. Generally, the RDBMS software is not aware that such a table is any different from any other table in the database. Moreover, this table cannot be used by an optimizer within the RDBMS software to improve performance, even though the table may contain data that would drastically improve the performance of other queries.
This leads to the notion of summary tables or materialized views as envisioned by the present invention. These tables are similar to the created table described above, except that the definition of the table is based on a xe2x80x9cfull selectxe2x80x9d (much like a view) that is materialized in the table. The columns of the table are based on the elements of the select list of the full select.
In the present invention, with properly defined summary tables, the RDBMS software is now aware how the result in the summary table was derived. When an arbitrarily complex query is submitted, an optimizer in the RDBMS software can now consider using the summary tables to answer the query, which is a technique that requires performing subsumption tests between the query and summary table definition, and then performing compensation work once the optimizer decides that the summary table can be used for the answer.
There are extensive research activities and literature on this topic, as disclosed in the following publications, all of which are incorporated by reference herein:
1. L. S. Colby, R. L. Cole, E. Haslam, N. Jazaeri, G. Johnson, W. J. McKenna, L. Schumacher, D. Wilhite. Red Brick Vista: Aggregate Computation and Management. Proceedings of the 14th Int""l. Conference on Data Engineering, Orlando, Fla., 1998.
2. R. Bello, K. Dias, A. Downing, J. Feenan, J. Finnerty, W. Norcott, H. Sun, A. Witkowski, M. Ziauddin. Materialized Views In Oracle. Proceedings of the 24th VLDB Conference, N.Y., 1998.
3. D. Srivastava, S. Dar, H. Jagadish, A. Levy. Answering Queries with Aggregation Using Views. Proceedings of the 22nd VLDB Conference, Mumbai, India, 1996.
4. R. Cochrane and N. Mattos. Super Setsxe2x80x94Concatenation and Grouping Sets. ISO Document DBL LGW-34 X3H2-97-?, Jul. 23, 1997. This should also be in the currently published version of the SQL3 standard.
However, the current state of the art does not allow queries to be optimized using summary tables that are defined using grouping expressions containing GROUPING SETs, ROLLUPs, and CUBEs.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus, and article of manufacture for optimizing database queries using subsumption tests between the query and at least one summary table that comprises a cube index to determine whether an expression in the query can be subsumed in the summary table. The summary table stores at least one materialized view involving at least one GROUP BY operation that computes at least one of the following: (1) a cube, (2) a rollup, (3) a grouping set; and (4) a concatenation of cubes, rollups, grouping sets, and one or more grouping items. When the expression in the query can be subsumed in the summary table, the query is rewritten to use the summary table.
It is an object of the present invention to make the RDBMS aware of how a result in a summary table was derived, so that an optimizer function of the RDBMS can use the summary tables to respond to queries. The techniques presented in the present invention involve complex and yet efficient subsumption tests among queries and are directly applicable to other areas such as multiple query optimization.