Relational databases store vast amounts of information in tables, whereby the stored information is retrieved by a user via a database application. In response to an input from the user, the database application submits queries to, for example, a database server. The database server accordingly accesses the tables specified in the query to determine which information within the tables satisfies the query. The information that satisfies the queries is then retrieved by the database server and transmitted to the database application and ultimately to the user. The response time experienced by the user depends, in large part, on the speed by which the queries are processed. Therefore, database designers are continually searching for techniques to reduce processing time of the queries.
One such technique is to utilize a logical table, as in a “materialized view” to answer queries. In general, views may be queried in the manner that tables are queried. Views present data that is extracted or derived from existing tables. A view is defined by metadata referred to as a view definition, which contains mappings to one or more columns in the one or more tables containing the data. Typically, the view definition is in the form of a database query. Columns and tables that are mapped to a view are denoted as base columns and base tables of the view, respectively. The data maintained in the base columns is considered the base data.
The data presented by views is gathered and derived on-the-fly from the base tables in response to queries that access the views. The data gathered for the view is not persistently stored after the query accessing the view has been processed. Because the data provided by conventional views is gathered from the base tables at the time the views are accessed, the data from the views will reflect the current state of the base tables. However, the overhead associated with gathering the data from the base tables for a view every time the view is accessed may be prohibitive in terms of time and computing resources.
A “materialized view” is a view for which a copy of the view data is stored separate from the base tables from which the data was originally gathered and derived. The data contained in a materialized view is referred to herein as materialized data. Materialized views eliminate the overhead associated with gathering and deriving the view data every time a query accesses the view. As new data is periodically added to the base tables, the materialized views are updated (i.e., refreshed) to reflect the new base data.
Computer database systems that are used for data warehousing frequently maintain materialized views that contain pre-computed summary information in order to speed up query processing. Such summary information is created by applying an aggregate function, such as SUM, COUNT, or AVERAGE, to values contained in the base tables. Materialized views that contain pre-computed summary information can thus be termed “summary tables.” Base tables from summary information is generated are referred to as fact tables.
As mentioned, relational databases store information in collections of tables, in which each table is organized into rows and columns. FIGS. 4A and 4B show an exemplary database relating to a retail operation. The database includes a store table 400, which has columns corresponding to the following attributes for stores: a Store Key field 401 for uniquely identifying the stores with a numeric value, a Store Name field 403 for specifying an alphanumeric label, a City field 405 for the name of the city in which the store resides, and a Strength field 407 to indicated the number of employees associated with the particular store. Each row 409-421 provides an entry for a respective store. For example, row 409 is an entry for a store with a Store Key of 1, a Store Name of “A1 Store”, City of Boston, and Strength of 50. A fact table 423, as shown in FIG. 4B, holds information in columns for the sales transactions of the stores; these columns, for example, can include a Store Key field 425 (which is identical to that of the Store Key field 401 of the store table 400), a Date field 427 to indicate the date of the sales, and a Sales field 429 to specify the total amount of sales for the store on the particular date. Rows 431-449 capture all the transactions for the corresponding stores.
FIG. 4C shows a materialized view 451 representing total sales by store name for Boston:                MV1: CREATE MATERIALIZED VIEW sales_boston                    ENABLE QUERY REWRITE            AS            SELECT s.store_name, SUM(f.dollar_sales) as sum_sales            FROM store s, fact f            WHERE s.store_key=f.store_key AND s.city = ‘Boston’            GROUP BY s.store_name;The MV1 451 includes the following columns: Store Name 453 and Sum Sales 455. Row 457 holds information for the A1 Store, which has total sales of $30,000. As seen in row 459, the A2 Store has sales of $15,000.                        
Although materialized views may exist in a database system, their use may not be fully utilized, resulting in sub-optimal performance. When a database user knows that a particular materialized view, such as MV1, contains the data desired by the user, the user can formulate a query that extracts the desired data directly from that materialized view. For instance, if the user can generate a query requesting information about the sales of each of the stores in Boston, the query can exploit MV1 to improve query processing. However, there are numerous circumstances in which a user will design a query that does not reference a materialized view to retrieve a set of data that can be most efficiently retrieved by accessing the materialized view. For example, the user may not be aware of all of the materialized views available in the database, or may not be sure how a particular materialized view could be used to retrieve the desired information.
Even when the user is aware that materialized views are available that have data desired by the user, the user may prefer to have the server transparently rewrite the query in terms of the materialized views, rather than writing the query to directly reference the materialized views. By allowing the server to rewrite the query, the database administrator may easily add and drop materialized views without affecting the application code that imbeds the user query. For example, if a user query is rewritten by the server to use a particular materialized view MV1, then the application will not break if the database administrator decides to drop MVI. In contrast, if the user query directly references MV1 in the application code, then dropping MV1 will break the application.
Consequently, some database servers include mechanisms for rewriting queries to take advantage of materialized views. The execution of the rewritten query is often considerably improved relative to the original query because the relational objects accessed by the rewritten query are (e.g., the materialized views) much smaller than the objects referenced in the original query (e.g., the base tables), and/or the number of objects used by the rewritten query is less than the number of objects referenced in the original query.
For example, a summary can correspond to a query which joins two tables and performs data aggregation to compute sum-of-sales by city, and stores the materialized result in a table. If a user issues a query Q which requests sum-of-sales by city, then Q can be transparently rewritten to access pre-computed data stored in the summary table. Because of the rewrite, the result for Q can be quickly produced by simply scanning the summary table instead of joining two tables and then aggregating the data.
Conventionally, a query cannot be rewritten if there are no materialized views that contain enough data to answer the original query. As a result, the performance of the query is significantly slower compared to use of a materialized view. The inability to take advantage of materialized views for satisfying a query is made clear by the following example. It is assumed that the database user seeks to obtain sales information of the stores in Boston and Seattle, in which the database application generates the following query:                Q1: SELECT s.store name, SUM(f.dollar_sales) as                    sum_sales            FROM store s, fact f            WHERE s.store key=f.store_key            AND s.city in (‘Boston’, ‘Seattle’)            GROUP BY s.store name;                        
The above query cannot rewrite against the MV1, because the MV1 does not contain sales information for ‘Seattle’. That is, the query cannot be rewritten because the materialized view restriction has to exactly match the query restriction; otherwise no rewrite is possible, as the data is not entirely contained in MV1, even though another materialized view exists for Seattle (i.e., MV2 of FIG. 4D). The MV2 461 has the columns of Store Name 463 and Sum Sales 465, and rows 467 and 469, corresponding to the stores, “A3 Store” and the “A4 Store.” Thus, conventionally, the query Q1 cannot be rewritten because the restrictions are not an exact match.
Given the overhead of maintaining materialized views, generating numerous materialized views to anticipate all possible combinations of data that can satisfy all possible queries is impractical. Therefore, there is a need for improving database system performance. There is also a need for enhancing query processing, whereby materialized views can be exploited. There is a further need to support query rewrite.