The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, the problems identified with an approach should not be assumed to have been recognized in the prior art, unless otherwise indicated.
In a Database Management System (DBMS), data is stored in one or more data containers, each container contains records, and the data within each record is organized into one or more fields. In relational database systems, the data containers are referred to as tables, the records are referred to as rows, and the fields are referred to as columns. In object oriented databases, the data containers are referred to as object classes, the records are referred to as objects, and the fields are referred to as attributes. Other database architectures may use other terminology.
The present invention is not limited to any particular type of data container or database architecture. However, for the purpose of explanation, the examples and the terminology used herein shall be that typically associated with relational databases. Thus, the terms “table”, “row” and “column” shall be used herein to refer respectively to the data container, record, and field.
A group of changes to a database managed by a DBMS that must be made together are often referred to as a “transaction”. A group of changes that must be made together are referred to as being performed “atomically”. In performing a transaction, in many database systems, the changes made to the database prior to all the changes of the transaction have been performed are stored in a log, so that if it is determined that the transaction cannot be completed, the changes of the transaction that have been performed may be undone. A transaction in which all of its changes have been successfully completed may be said to have “committed”. The log used for storing the changes associated with the transaction before the transaction has committed, is often referred to as the “undo log”.
Databases use various types of logical storage units to store rows of a table, each type corresponding to a level of granularity. Typically, the logical storage unit at the lowest level of granularity is a data block. In database systems that support table partitioning, a logical storage unit at a higher level of granularity than a datablock is a “table partition”.
In table partitioning, a database table is divided into sub-tables, which are table partitions. One form of partitioning is referred to as range partitioning. With range partitioning, each individual partition corresponds to a particular range of values for one or more columns (partition keys) of the table. All rows that fall into the range associated with a partition are entirely contained within that partition.
Views and Materialized Views
When a database management system contains very large amounts of data, certain queries against the database can take an unacceptably long time to execute. A view is a response to a predefined query that is treated as a table. A view is a virtual table, and a view may be referenced by a database statement as if the view were an actual table. The tables in which the data is actually stored are base tables. The base tables are referenced directly or indirectly by a predefined query. To generate the rows of a view data is extracted and derived from the base tables.
A Materialized View (MV) is a view for which the results of the predefined query are stored as precomputed values. A MV is the stored version of the virtual table created by a view. By materializing MVs, in this way, queries that take a long time to compute do not have to be executed multiple times, and consequently the data can be accessed faster and more efficiently.
Operations on the database containing the base tables from which the MV was created do not directly affect the MV. Thus, after performing one or more database operations a MV may need to be refreshed to reflect the changes made to the database.
Each MV may have its own refresh method. The query that defines the MV may determine which of the refresh methods to apply. To refresh an MV a refresh expression may be executed. A refresh expression is one or more database statements, executed to refresh an MV, which may either remove rows of an MV that need to be updated, and/or recompute and insert rows. The updated data may be taken from other tables. A rewrite is refreshing a portion of a first MV (e.g., using a refresh expression) by using a portion of a second MV. The second MV is one of the tables used in the refresh expression of the rewrite. During the rewrite portions of base tables may or may not be used in addition to the portions of the second MV for refreshing the first MV.
Various Ways of Refreshing Mvs
There are a variety of different ways for refreshing MVs that are often provided, such as a complete refresh, a fast refresh, or a forced refresh. In a complete refresh, the entire MV is rebuilt from scratch. In a fast refresh, only the data changes are applied to refresh the MV. A fast refresh may be referred to as an incremental refresh. Similarly, the changes to the MV during the fast refresh may be said to be applied “incrementally”, because, in a fast refresh, a subset of the rows of the MV are replaced rather than re-computing the entire MV.
There are several varieties of fast refresh, such as conventional fast refresh, Partitioned Change Tracking (PCT) fast refresh using TRUNCATE, and PCT fast refresh using DELETE. In a conventional fast refresh, MV logs are kept for each of the base tables. The MV logs are used to construct the changes on the MV that correspond to the changes made on the base tables. Undo logs may be used as the MV logs. However, MV logs are typically stored in a manner that facilitates being queried. The MV logs track changes made to the base table that are relevant to the MV. To find which rows to apply the computed changes, the changes that need to be applied to the MV are joined to the original MV.
In any of the PCT fast refresh methods, at least one of the base tables has been partitioned, and the MV logs track the changes to the partition base table at the same hierarchical level as the partition. A data object, such as a table, may be divided up into multiple sub-tables referred to as “partitions” that are stored and indexed separately. In a PCT fast refresh, the rows being modified may be found using simple predicates on only the relevant partitions, which does not cost as much as the joins necessary for computing a conventional fast refresh. In order to perform either type of PCT fast refresh, at least one of the base tables for the MV must be partitioned.
The PCT fast refresh using TRUNCATE erases an entire partition within the MV, and recalculates or rewrites each row of the partition that was erased. No log is kept of the records in the partition that are erased during the truncation. The truncation operation is typically irreversible. In order to perform a PCT fast refresh using TRUNCATE, the MV must be partitioned in a manner related to the partitions in the base tables (in addition to having one of the base tables partitioned). The truncation operation is not dependent on the number of rows erased, and is fast relative to a row-by-row delete operation. However, the recalculation of the all the rows deleted may have a high cost.
The PCT fast refresh using DELETE deletes selected rows of an entire partition within the MV, and recalculates or rewrites the rows deleted. The PCT fast refresh using DELETE uses a delete operation, which enters the deleted row in a log of the operations performed. The PCT fast refresh using DELETE does not require the MV to be partitioned.
Each of the fast refresh methods are possible only in certain scenarios. Specifically, a conventional fast refresh is possible if the only operations performed on the base tables were DML operations specified in database statements. For example, if Partition Maintenance Operations (PMOPs) were performed on the base tables, a conventional fast refresh is not possible, because if any PMOPs were performed, the base table changes are not available in the MV logs. In PMOPs are operation that update an index to reflect changes to a table upon which the index is based. If a PMOP occurred only a PCT fast refresh or complete refresh may be performed. PCT fast refresh can be used both in the presence of DMLs on base tables and PMOPs on partitioned tables.
However, MV logs may be large and take a significant amount of time to search. Additionally, PCT fast refresh involves joining the change in the MV to the MV, which may be computationally expensive. Thus, a PCT fast refresh can take a significant amount of time, because of the computational expense associated with searching the MV log tables, computing the changes to the MV from the MV logs, and computing the join operations for joining the change in the MV to the MV.
In view of the above, it is desirable to lower the computational cost of performing PCT fast refresh operations.