1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to the automatic recognition and rerouting of queries for optimal performance.
2. Description of Related Art
Computer systems incorporating Relational DataBase Management System (RDBMS) software using a Structured Query Language (SQL) interface are well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American Nationals Standard Organization (ANSI) and the International Standards Organization (ISO).
For most RDBMS software, combinations of tables and views are used to access data stored in tables in the database. Indices are often used to improve the performance of retrieving data from tables. However, indices are generally limited to columns from base tables. Thus, indices are not seen as suitable for:
results of aggregations, and
results of joins for commonly used subsets of the data.
A view definition includes a query that, if processed, provides a temporary results table based on the results of the query at that point in time. Using an INSERT statement and an appropriately defined table in the database, the temporary results table can be stored in the database. To refresh this table, the user would need to perform a DELETE from the table and then perform the INSERT again.
Users can directly query against the created table, provided that the users are aware how the results were derived. Generally, the RDBMS software is not aware that such a table is any different from any other table in the database. Moreover, this table cannot be used by an optimizer within the RDBMS software to improve performance, even though the table may contain data that would drastically improve the performance of other queries.
This leads to the notion of summary tables or materialized views as envisioned by the present invention. These tables are similar to the created table described above, except that the definition of the table is based on a xe2x80x9cfull selectxe2x80x9d (much like a view) that is materialized in the table. The columns of the table are based on the elements of the select list of the full select.
In the present invention, with properly defined summary tables, the RDBMS software is now aware how the result in the summary table was derived. When an arbitrarily complex query is submitted, an optimizer in the RDBMS software can now consider using the summary tables to answer the query, which is a technique that requires performing subsumption tests between the query and summary table definition, and then performing compensation work once the optimizer decides that the summary table can be used for the answer.
There are extensive research activities and literature on this topic, as disclosed in the following publications, all of which are incorporated by reference herein:
1. L. S. Colby, R. L. Cole, E. Haslam, N. Jazaeri, G. Johnson, W J. McKenna, L. Schumacher, D. Wilhite. Red Brick Vista: Aggregate Computation and Management. Proceedings of the 14th Int""l. Conference on Data Engineering, Orlando, Fla., 1998.
2. R. Bello, K. Dias, A. Downing, J. Feenan, J. Finnerty, W. Norcott, H. Sun, A. Witkowski, M. Ziauddin. Materialized Views In Oracle. Proceedings of the 24th VLDB Conference, New York, 1998.
3. D. Srivastava, S. Dar, H. Jagadish, A. Levy. Answering Queries with Aggregation Using Views. Proceedings of the 22nd VLDB Conference, Mumbai, India, 1996.
However, the current state of the art is that only simple SQL statements with a single query block can be optimized using summary tables, and only simple expressions can be handled.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus, and article of manufacture for optimizing database queries using a derived summary table, wherein a definition of the summary table is based on a full select statement, including, but not limited to, a derived table involving nested GROUP BY operations and complex HAVING clauses with subqueries or joins, that is materialized in the table and describes how the summary table was derived. A query is analyzed using subsumption tests between the query and the definition of the summary table (that is, a query by itself) to determine whether expressions occurring anywhere in the query, but not in the summary table, can be derived using the content in the summary table, and hence the query is subsumed by the summary table definition An expression in the query is subsumed by and compensated by the summary table when the expression can be re-derived from one or more of the columns of the summary table.
It is an object of the present invention to optimize queries using summary tables storing materialized views. More specifically, it is an object of the present invention to make RDBMS software aware how a result in a summary table was derived, so that an optimizer function of the RDBMS software can use the summary tables to respond to queries. The techniques presented in the present invention involve complex and yet efficient subsumption tests among queries and are directly applicable to other areas such as multiple query optimization.