The present invention relates to DBMSs, and in particular, to maintaining materialized views.
In a database management system (DBMS), data is stored in one or more data containers, each container contains records, and the data within each record is organized into one or more fields. In relational DBMSs, the data containers are referred to as tables, the records are referred to as rows, and the fields are referred to as columns. In object oriented databases, the data containers are referred to as object classes, the records are referred to as objects, and the fields are referred to as attributes. Other database architectures may use other terminology.
The present invention is not limited to any particular type of data container or database architecture. However, for the purpose of explanation, the examples and the terminology used herein shall be that typically associated with relational databases. Thus, the terms xe2x80x9ctablexe2x80x9d, xe2x80x9crowxe2x80x9d and xe2x80x9ccolumnxe2x80x9d shall be used herein to refer respectively to the data container, record, and field.
In a DBMS used for xe2x80x9cdata warehousingxe2x80x9d or xe2x80x9cdecision supportxe2x80x9d, it is common for identical or closely related queries to be issued frequently. For example, a business may periodically generate reports that summarize the business facts stored in the database, such as: xe2x80x9cWhat have been the best selling brands of soft drinks in each of our sales regions, during the past six months?xe2x80x9d.
To respond to such queries, the database server typically has to perform numerous join operations because the database records contain the information that is required to respond to the queries. When a DBMS contains very large amounts of data, certain queries against the database can take an unacceptably long time to execute. The cost of executing a query may be particularly significant when the query (which takes the form of a xe2x80x9cSELECTxe2x80x9d statement in the SQL database language) requires join operations among a large number of database tables.
Among commercial users of DBMSs, it has become a common practice to store the results of often-repeated queries in database tables or some other persistent database object. By storing the results of queries, the costly join operations required to generate the results do not have to be performed every time the queries are issued. Rather, the database server responds to the queries by simply retrieving the pre-computed data.
These stored results are commonly referred to as materialized views. The contents of a materialized view is defined by metadata referred to as a view definition. The view definition contains mappings to one or more columns in the one or more tables containing the data. Columns and tables that are mapped to a materialized view are referred to herein as base columns and base tables of the materialized view, respectively. The column and the base column mapped to the column are referred to as being the same field. The data maintained in the base columns is referred to herein as base data. The data contained in a materialized view is referred to herein as materialized data.
Typically, the view definition is in the form of a database query, herein referred to as a materialized view query. The materialized view query is computed and the results are stored as the materialized view. The results can be in the form of rows, which may be rows from a single base table or a rows created by joining rows in the base table. When a xe2x80x9cbasexe2x80x9d row (or a subset of its columns) from a base table is included in the computed results of a materialized view query that are stored, the row is referred to as being included in the materialized view. The base table whose rows are included in the materialized view are referred to as the xe2x80x9cprojectedxe2x80x9d table. Likewise, when a xe2x80x9cbasexe2x80x9d row (or a subset of its columns) from a base table is included in the computed results of a query, the row is said to be returned by the query. Furthermore, the columns that are included in the materialized view or the computed results in the query are referred to as being projected by the materialized view or query.
Materialized views eliminate the overhead associated with gathering and deriving the data every time a query is executed. Through a process known as query rewrite, a query can be optimized to recognize and use existing materialized views that could answer the query. Typically, the query rewrite optimization is transparent to the application submitting the query. That is, the rewrite operation happens automatically and does not require the application to know about the existence of materialized views, nor that a particular materialized view has been substituted for the original query.
As new data is periodically added to the base tables of a materialized view, the materialized view needs to be updated to reflect the new base data. When a materialized view accurately reflects all of the data currently in its base tables, the materialized view is considered to be xe2x80x9cfreshxe2x80x9d. Otherwise, the materialized view is considered to be xe2x80x9cstalexe2x80x9d. A stale materialized view may be recomputed by various techniques that are collectively referred to as a xe2x80x9crefreshxe2x80x9d. A refresh may performed for a materialized view while modifications are concurrently being made to its base tables.
One approach to refreshing materialized views is referred to as the xe2x80x9ctotal refreshxe2x80x9d or xe2x80x9cfull refreshxe2x80x9d approach. According to the total refresh approach, the values in materialized views are recalculated based on all of the base data every time new base data is supplied. Systems that employ the full refresh approach have the disadvantage that the re-creation process is a relatively lengthy operation due to the size and number of tables from which the materialized data is derived. For example, when ten new rows are added to a particular base table that contains a million rows, a total refresh operation would have to process all one million and ten rows of the base table to regenerate the materialized views derived using the base table.
The process of updating materialized data may be improved by performing incremental refresh, where rather than generating a new set of materialized data based on calculations that use all of the base data, the materialized data is updated based on modifications to the base data.
One type of conventional approach for incremental refresh, the concurrent refresh approach, refreshes the materialized view whenever the base tables of the materialized views are modified. A disadvantage for this approach is that it adds overhead to operations that modify the base tables. Furthermore, the overhead is incurred during periods of greatest activity for the base tables.
There are several reasons the concurrent refresh approach requires that the materialized view be refreshed whenever the base tables are changed. First, the concurrent refresh approach uses an algorithm that requires that the old values of the base table be available. Generally, these values are available if the materialized view is refreshed whenever the base table is modified. Second, the algorithm used may also require that the materialized view be refreshed based on the chronological order of changes. Refreshing the materialized view as changes are made to the base tables ensures that the materialized view is refreshed according to the chronological order of changes.
Other conventional incremental refresh approaches that can defer refresh of a materialized view may also depend on the availability of the old values of the base table and knowledge of the chronological order in which changes are made to a base table. Retaining this information can be very costly for a DBMS, or such information may simply not be available when the refresh is performed.
Furthermore, many types of materialized views cannot be incrementally refreshed by the conventional incremental refresh approaches. For example, there is no mechanism that incrementally refreshes a materialized view that defines a join operation that is based on a many-to-many relationship between the tables being joined, or that is based on a union of queries. To illustrate a materialized view that cannot be incrementally refreshed using conventional incremental refresh approaches, the following table definitions and materialized view superstore are provided.
create table bookstore (
storenum number primary key,
name varchar2(1000),
location varchar2(1000));
create table school (
schoolnum number primary key,
population number);
create table supplies (
storenum number,
schoolnum number,
primary key (storenum, schoolnum));
create table sale (
saleid number primary key,
storenum number,
amount number,
itemid number,
saledate date);
The base tables are used by an application to model the sales between bookstores and schools. The application maintains the table supplies so that two of its columns hold primary key values for tables school and bookstore. Supplies.storenum contains primary key values from bookstore.storenum; supplies.schoolnum contains primary key values from school.schoolnum.
The columns of tables and the values in the column may be used to establish a relationship between tables and between the rows in the tables. A row in one table and a row in another table are related if one or more of their columns satisfy a xe2x80x9crelationalxe2x80x9d condition, typically based on equality. For example, a row in bookstore is related to a row in supplies if, for these rows, the columns bookstore.storenum and supplies.storenum are both equal to one.
A table has a one-to-one relationship with another table when one row in the table can only be related to one row in the other table. A table has a one-to-many relationship with another table when a row in the table can be related to multiple rows in the other table. A table has a many-to-many relationship with another table when multiple rows in the table may be related to multiple rows in the other table.
A third table may be used to establish a many-to-many relationship between tables. The relational conditions defining the relationship are based on columns in the three tables. Column values in a row of the third table establish relationships between rows of one table and rows of another table. To illustrate a many-to-many relationship, FIG. 1 is provided. FIG. 1 shows table-of-values 101, which depict illustrative values for tables bookstore, supplies, and school. Values for all columns are not shown.
Row 121 is related to multiple rows in school, i.e. row 141 and 142, for the following reason. Row 121 of bookstore is related to row 131 and 132 in supplies because for rows 131 and 132, the value in supplies.storenum is equal to xe2x80x981xe2x80x99, which is the value of bookstore.storenum of row 121. Row 131 is related to row 141 of school.schoolnum because for row 131 the value in supplies.schoolnum is equal to xe2x80x981xe2x80x99, which is equal to the value in school.schoolnum of row 141. Row 132 is related to row 142 of school.schoolnum because for row 132 the value in supplies.schoolnum is equal to xe2x80x982xe2x80x99, which is equal to the value in school.schoolnum of row 142. Because rows 131 and 132 are related to row 121 in bookstore and rows 141 and 142, rows 131 and 132 establish a relationship between a row in bookstore and multiple rows in school.
Likewise, row 141 in school is related to rows 131 and 133 in supplies. Row 131 is related to row 121 in bookstore and row 133 is related to row 122 in bookstore. Because rows 131 and 133 are related to row 141 in school and to rows 121 and 122 in bookstore, rows 131 and 133 establish a relationship between a row in school and multiple rows in bookstore. Thus a many-to-many relationship exists between bookstore and school because multiple rows in bookstore may be related to multiple rows in school and vice versa. Thus supplies establishes a many-to-many relationship between bookstore and school.
Materialized view superstore is defined by database command QCM as follows.
QCM specifies a union between two queries, referenced here as QCM1 and QCM2, both of which return rows from bookstore to include in superstore. Query QCM1 specifies a join operation between table bookstore, supplies, and school based on the following join conditions: school.schoolnum=supplies.schoolnum and supplies.storenum=bookstore.storenum. These join conditions are the same as relational conditions upon which the many-to-many relationship discussed above is based.
Conventional incremental refresh mechanisms are capable of incrementally refreshing materialized views whose materialized view query is based on a many-to-one relationship, where a projected table is joined to a base table along a many-to-one relationship. Conventional incremental refresh mechanisms cannot, however, incrementally refresh materialized views whose materialized view query specifies a join between a projected table and another table along a one-to-many or a many-to-many relationship. One reason for this lack of support by conventional incremental refresh mechanisms is that whether a row from a projected base table is included in a materialized view may depend on multiple rows from another base table. Thus, when a change to row in the other base table affects a row from the projected table in the materialized view, it cannot be assumed that no other rows in the base table affect the row""s inclusion in the materialized view.
Conventional incremental refresh approaches may also not support materialized queries that include rows from the unions of subqueries. A reason for this lack of support is that whether a row from a projected table is included in a materialized view may depend on multiple subqueries in the union of subqueries.
Based on the foregoing, it is clearly desirable to provide a system that incrementally refreshes a materialized view based on materialized view queries that include rows from the unions of subqueries or that specify a join between a projected table and another table along a one-to-many or a many-to-many relationship.
A framework for the incrementally refreshing a materialized view is provided. According to an aspect of the present invention, the materialized view is based on a query that references a projected table and another set of base tables. The query projects the columns of the projected table. To refresh the materialized view, a set of tuples is computed that identify rows to delete, insert, or otherwise modify in the materialized view in order to refresh it. The set of tuples is computed by computing a set of intersections, (1) one for the intersection between the query and the change log of the projected table, and (2) at least one other between the equijoin of the change log for one of the other base tables and the projected table. The query may define an equijoin between the projected table and at least one base table based on equijoin conditions that define a many-to-many relationship or a one-to-many relationship.