The present invention relates to rewriting queries, and in particular, rewriting queries to access a materialized view.
In a database management system (DBMS), data is stored in one or more data containers, each container contains records, and the data within each record is organized into one or more fields. In relational database systems, the data containers are referred to as tables, the records are referred to as rows, and the fields are referred to as columns. In object oriented databases, the data containers are referred to as object classes, the records are referred to as objects, and the fields are referred to as attributes. Other database architectures may use other terminology.
The present invention is not limited to any particular type of data container or database architecture. However, for the purpose of explanation, the examples and the terminology used herein shall be that typically associated with relational databases. Thus, the terms xe2x80x9ctablexe2x80x9d, xe2x80x9crowxe2x80x9d and xe2x80x9ccolumnxe2x80x9d shall be used herein to refer respectively to the data container, record, and field.
For various reasons, it may not be desirable for certain users to have access to all of the columns of a table. For example, one column of an employee table may hold the salaries for the employees. Under these circumstances, it may be desirable to limit access to the salary column to management, and allow all employees to have access to the other columns. To address this situation, the employees may be restricted from directly accessing the table. Instead, they may be allowed to indirectly access the appropriate columns in the table through a xe2x80x9cviewxe2x80x9d.
A view is a logical table. As logical tables, views may be queried by users as if they were a table. However, views actually present data that is extracted or derived from existing tables. Thus, the problem described above may be solved by (1) creating a view that extracts data from all columns of the employee table except the salary column, and (2) allowing all employees to access the view.
A view is defined by metadata referred to as a view definition. The view definition contains mappings to one or more columns in the one or more tables containing the data. Typically, the view definition is in the form of a database query. Columns and tables that are mapped to a view are referred to herein as base columns and base tables of the view, respectively. The data maintained in the base columns is referred to herein as base data.
The data presented by conventional views is gathered and derived on-the-fly from the base tables in response to queries that access the views. That data gathered for the view is not persistently stored after the query accessing the view has been processed. Because the data provided by conventional views is gathered from the base tables at the time the views are accessed, the data from the views will reflect the current state of the base tables. However, the overhead associated with gathering the data from the base tables for a view every time the view is accessed may be prohibitive.
A materialized view, on the other hand, is a view for which a copy of the view data is stored separate from the base tables from which the data was originally gathered and derived. The data contained in a materialized view is referred to herein as (xe2x80x9cmaterialized dataxe2x80x9d). Materialized views eliminate the overhead associated with gathering and deriving the view data every time a query accesses the view. As new data is periodically added to the base tables, the materialized view needs to be updated (i.e., refreshed) to reflect the new base data.
Computer database systems that are used for data warehousing frequently maintain materialized views that contain pre-computed summary information in order to speed up query processing. Such summary information is created by applying an aggregate function, such as SUM, COUNT, or AVERAGE, to values contained in the base tables. Materialized views that contain pre-computed summary information are referred to herein as xe2x80x9csummary tablesxe2x80x9d or more simply, xe2x80x9csummariesxe2x80x9d. Base tables from summary information is generated are referred to as fact tables.
Summary tables typically store aggregated information, such as xe2x80x9csum of PRODUCT_SALES, by region, by month.xe2x80x9d Other examples of aggregated information include counts of tally totals, minimum values, maximum values, and average calculations. Summary tables are used to reduce the overhead associated with processing queries that request summary information. Because the summary information is already pre-computed, it need not be re-computed during the execution of each query that requests summary information. Instead, the pre-computed summary values can be directly retrieved from the summary columns of the summary table.
When a database user knows that a particular materialized view contains the data desired by the user, the user can formulate a query that extracts the desired data directly from that materialized view. However, there are numerous circumstances in which a user will design a query that does not reference a materialized view to retrieve a set of data that can be most efficiently retrieved by accessing the materialized view. For example, the user may not be aware of all of the materialized views available in the database, or may not be sure how a particular materialized view could be used to retrieve the desired information.
Even when the user is aware that materialized views are available that have data desired by the user, the user may prefer to have the server transparently rewrite the query in terms of the materialized views, rather than writing the query to directly reference the materialized views. By allowing the server to rewrite the query, the database administrator may easily add and drop materialized views without affecting the application code that imbeds the user query. For example, if a user query is rewritten by the server to use a particular materialized view MV1, then the application will not break if the database administrator decides to drop MV1. In contrast, if the user query directly references MV1 in the application code, then dropping MV1 will break the application.
In light of the foregoing, some database servers include mechanisms for rewriting queries that do not reference materialized views in a way that causes them to reference materialized views. The execution of the rewritten query is often considerably improved relative to the original query because the relational objects accessed by the rewritten query are (e.g. the materialized views) much smaller than the objects referenced in the original query (e.g. the base tables), and/or the number of objects used by the rewritten query is less than the number of objects referenced in the original query.
For example, a summary can correspond to a query which joins two tables and performs data aggregation to compute sum-of-sales by city, and stores the materialized result in a table. If a user issues a query Q which requests sum-of-sales by city, then Q can be transparently rewritten to access pre-computed data stored in the summary table. Because of the rewrite, the result for Q can be quickly produced by simply scanning the summary table instead of joining two tables and then aggregating the data.
One type of query that is rewritten is an aggregate queries that places a restriction on a hierarchical dimension. This type of query may be rewritten to use a materialized view that groups data along the same hierarchical dimension. A dimension is an attribute of a set of data, such as a column of a table. A hierarchical dimension is a dimension where there is a hierarchical relationship between the values of the dimension. A common example of a hierarchical dimension is time. Possible values for time include a particular day, month, quarter, or year. The values are hierarchically related because a particular day is associated with a particular month, which is associated with a particular quarter, which is associated with a particular year.
FIG. 1 depicts a dimension hierarchy 100, an illustrative dimension hierarchy for a time dimension. A dimension hierarchy is a hierarchy of values in an hierarchical dimension. Each node in a dimension hierarchy represents a value in a dimension. Node 102 represents 1998-1, a value representing the first quarter in 1998. Node 104, 106, 108 represents values xe2x80x981998-01xe2x80x99, xe2x80x981998-02xe2x80x99, and xe2x80x981998-03xe2x80x99, which each represent months. The following definitions are helpful to describing dimension hierarchies.
Each node in dimension hierarchy 100 represents a granule. A granule is a particular value in the hierarchical dimension.
Each granule resides at one of the levels in the hierarchy. Each level of the hierarchy has a xe2x80x9clevel of granularityxe2x80x9d. Levels that are higher in the hierarchy have coarser levels of granularity, while levels that are lower in the hierarchy have finer levels of granularity. For example, Quarter level 152, which is higher than month level 154 in the hierarchy, has a coarser level of granularity than month level 154.
Each level of granularity is associated with a set of granules. For example, quarter level of granularity 152 is associated with granules 1998-1, 1998-2, 1998-3, and 1998-4, which correspond to quarters. Similarly, month level 154 is associated with granules 1998-01, 1998-02, etc. that correspond to months.
The xe2x80x9cparent granulexe2x80x9d of a particular granule is the granule associated with the particular granule that is at the hierarchical level immediately above the hierarchical level of the particular granule. Thus, 1998-1 is the parent granule for granules 1998-01, 1998-02, and 1998-03.
A coarser granule, with respect to a particular granule, is a granule associated with a level of granularity that is coarser than the level of granularity of the particular granule. Thus, with respect to granule 1998-01, granules 1998 (represented by node 180) and 1998-1 are coarser granules.
A finer granule, with respect to a particular granule, is a granule associated with a level of granularity that is finer than the level of granularity of the particular granule.
Aggregate queries that restrict a particular dimension at a particular level of granularity may be rewritten by conventional rewrite mechanisms to access summary tables that group data from the fact table along the same dimension when the level of granularity at which the summary table groups the data is the finest level of granularity used by the fact table for that dimension. FIG. 2 illustrates a conventional rewrite of this type of query.
Referring to FIG. 2, aggregate query 210 requests summary information from sales table 250. Sales table 250 contains columns date 260, region 262, product 264, and $Amt 266. Date 260 contains values that represent days. Consequently, days are the finest level of granularity for the time dimension by which information in sales table 250 may be grouped. Query 210 requests summarized data from sales table 250 and restricts column date 260 to between Jan. 1, 1998 and Dec. 31, 1998. Thus, query 210 specifies a restriction on a time dimension.
FIG. 2 also shows the definition of a materialized view (MVSales 270). The illustrated definition defines a summary table that groups data from sales table 250 by date. Thus, MVSales 270 groups data along the same dimension (time) as the dimension specified in the restriction contained in query 210, and at the finest level of granularity of that dimension (date).
Because MVSales 270 groups data along the same dimension (time) as the dimension specified in the restriction contained in query 210, and at the finest level of granularity of that dimension (date), conventional rewrite mechanisms are able to rewrite the query 210 to access MVSales 270. Thus, query 210 is rewritten as query 280.
Unfortunately, conventional rewrite mechanisms do not rewrite aggregate queries to access summary tables when the summary tables group along the same dimension as is used in a restriction of the queries, but where the grouping in the summary table is not done at the finest level of granularity. Summary tables that aggregate data at the finest level of granularity are much bigger than summary tables that aggregate data at coarser levels, and therefore require more work to process when an aggregate query is rewritten to access them.
Based on the foregoing, it is clearly desirable to provide a query rewrite mechanism that is capable of rewriting queries to access materialized views that would otherwise not be rewritten by conventional rewrite mechanisms. In particular, it is desirable to provide a query rewrite mechanism that does not depend on summary tables that group data along the finest granularity of a hierarchical dimension.
Mechanisms described herein rewrite aggregate queries to access a materialized view when (1) the queries place a restriction on an ordered dimension, (2) the materialized view aggregates the information referenced in the query and groups by the same dimension, and (3) the materialized view groups the information at a coarser level of granularity than the granularity associated with the restriction contained in the aggregate queries.