The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
In Database Management Systems (DBMSs) used for “data warehousing” or “decision support”, it is common for identical or closely related queries to be issued frequently. For example, a business may periodically generate reports that summarize the business facts stored in a DBMS, such as: “What have been the best selling brands of soft drinks in each of our sales regions, during the past six months?” To respond to such queries, the DBMS may have to perform numerous join, aggregation, and ranking operations. When a data warehouse contains very large amounts of data, certain queries processed by the DBMS can take an unacceptably long time to execute. Thus, among commercial users of data warehousing DBMSs, it has become a common practice to store the results of often-repeated queries in database containers or some other persistent database objects.
As used herein, “materialized view” refers to a database container that stores pre-computed data. In relational database management systems, data may be stored in a materialized view as records (also referred to as “rows”), where each record may include one or more data fields (also referred to as “columns”). In object-oriented databases, data may be stored in a materialized view as objects, where each object may include one or more data attributes. Other types of database architectures may use other terminology to indicate the structures in which data may be stored in a materialized view. For the purpose of explanation, the examples and the terminology used herein shall be that typically associated with relational databases; however, it is noted that the novel techniques for query rewriting described herein are not limited to any particular type of database architecture.
The data stored in a materialized view is typically defined by metadata referred to as a view definition. A view definition typically specifies one or more columns in one or more base tables that contain base data, which may be used to determine the data stored in the materialized view. For example, the following view definition defines the materialized view “MV1”,
CREATE MATERIALIZED VIEW  MV1  AS SELECT product, region, SUM(sales)  FROM t1  GROUP BY product, regionwhere the expression “SUM(sales)” specifies that the aggregated sum of sales stored in materialized view “MV1” is to be computed on a per-product, per-region basis from base data for all products and regions stored in base table “t1”.
Materialized views are often used to stored pre-computed aggregated data. As used herein, “aggregated data” refers to data that is derived by applying one or more aggregate functions to the values in a column of a group of rows in a base table. Examples of aggregate functions include, without limitation, SUM( ), COUNT( ), MIN( ), MAX( ), and AVERAGE( ). Aggregate functions may be specified in aggregate expressions that may be included queries and/or view definitions. As used herein, “aggregate expression” refers to an expression that specifies how the base data from the base tables may be used to compute aggregated data. (For example, the expression “SUM(sales)” in the view definition of materialized view “MV1” provided above is an aggregate expression.)
One approach to avoid frequent access to base tables in a data warehouse involves query rewrite. As used herein, “query rewrite” refers to the computer-implemented process of evaluating a query and generating a rewritten query that requests the same information but can be executed more efficiently. One of the ways to rewrite and improve a query is to rewrite the query to reference a materialized view instead of a base table. The query may be rewritten in this way when the materialized view contains pre-computed aggregated data requested or needed by the query, thus saving much of the cost of aggregating the data already stored in the materialized view. Typically, a query rewrite process would be transparent to the application submitting the original query. That is, a database server may perform the query rewrite process automatically without requiring the application to know about the existence of materialized views, or that a query that accesses a particular materialized view has been substituted for the original query that is written against one or more base tables.
A query rewrite process according to the above approach may use a matching mechanism to match queries with aggregate expressions to materialized views that store aggregated data. This mechanism, however, is capable of matching query aggregate expressions to aggregate expressions specified in view definitions of materialized views using only the associative and commutative properties of algebraic operators. Consequently, this mechanism lacks the ability to match complex aggregate expressions that may specify an arbitrary number of aggregate functions and operators in an arbitrary manner.
For example, by using the associative and commutative properties of algebraic operators, the above matching mechanism may be capable of performing the following aggregate expression matches when these aggregate expressions are present in queries and/or materialized view definitions used by the query rewrite process:
SUM(c1 + c2)toSUM(c2 + c1)SUM(2 * c1)toSUM(c1 * 2)SUM((c1 + c2) + c3)toSUM(c1 + (c2 + c3))However, since the above matching mechanism lacks the ability to use the distributive property of algebraic operators, the above matching mechanism is not capable of performing the following aggregate expression matches when these aggregate expressions are present in queries and/or materialized view definitions used by the query rewrite process:
SUM(c1 * (c2 + c3))toSUM(c1 * c2 + c1 * c3)SUM(2 * c1)toSUM(10 * c1)SUM(c1)toSUM(c1 + 100)SUM(c1 − c2)toSUM(−c2 + c1)SUM(c1), SUM(c2)toSUM(c1 + c2)(In the above aggregate expressions, “c1”, “c2”, “c3”, etc. refer to the names of columns in one or more database containers such as, for example, base tables and materialized views.)
Based on the foregoing, it is desirable to provide techniques for a query rewrite process, which techniques support matching of arbitrarily complex aggregate expressions and overcome the deficiencies of the above-described matching mechanism.