The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, the problems identified with an approach should not be assumed to have been recognized in the prior art, unless otherwise indicated.
In a Database Management System (DBMS), data is stored in one or more data containers, each container contains records, and the data within each record is organized into one or more fields. In relational database systems, the data containers are referred to as tables, the records are referred to as rows, and the fields are referred to as columns. In object oriented databases, the data containers are referred to as object classes, the records are referred to as objects, and the fields are referred to as attributes. Other database architectures may use other terminology.
The present invention is not limited to any particular type of data container or database architecture. However, for the purpose of explanation, the examples and the terminology used herein shall be that typically associated with relational databases. Thus, the terms “table”, “row” and “column” shall be used herein to refer respectively to the data container, record, and field.
A group of changes to a database managed by a DBMS that must be made together are often referred to as a “transaction”. A group of changes that must be made together are referred to as being performed “atomically”. In performing a transaction, in many database systems, the changes made to the database prior to all the changes of the transaction have been performed are stored in a log, so that if it is determined that the transaction cannot be completed, the changes of the transaction that have been performed may be undone. A transaction in which all of its changes have been successfully completed may be said to have “committed”. The log used for storing the changes associated with the transaction before the transaction has committed, is often referred to as the “undo log”.
Views and Materialized Views
When a database management system contains very large amounts of data, certain queries against the database can take an unacceptably long time to execute. A view is a predefined query that is treated as a table. A view is a virtual table, and a view may be referenced by a database statement as if the view were an actual table. The tables in which the data is actually stored are base tables. The base tables are referenced directly or indirectly by a predefined query. To generate the rows of a view data is extracted and derived from the base tables.
A Materialized View (MV) is a view for which the results of the predefined query are stored as precomputed values. By materializing MVs, in this way, queries that take a long time to compute do not have to be executed multiple times, and consequently the data can be accessed faster and more efficiently.
In a database, operations on the base tables from which the MV was created do not directly affect the MV. Thus, after performing one or more database operations an MV may need to be refreshed to reflect the changes made to the database. To refresh an MV a refresh expression may be executed. A refresh expression is one or more database statements, executed to refresh an MV, that may either remove rows of an MV that need to be updated, and/or recompute and insert rows. The updated data may be taken from other tables. A “rewrite” is refreshing a portion of a first MV (e.g., using a refresh expression) by using a portion of a second MV. The second MV is one of the tables used in the refresh expression of the first MV. During the rewrite, portions of base tables may or may not be used in addition to the portions of the second MV for refreshing the first MV. Typically, whenever the base tables of an MV are updated, the MV is marked as stale until the MV is refreshed. The staleness of an MV affects how the MV is used by the database system. For example, through a process known as query rewrite, a query can be optimized to recognize and use existing MVs that could answer the query. The query rewrite engine will not use a stale MV unless there exists a setting for tolerating stale MVs, and the setting is also set so that stale MVs are tolerated.
Each MV may have its own refresh method. The Data Definition Language (DDL) statement that defines the MV may determine which of the refresh methods to apply. An MV can be refreshed to reflect the latest data changes ON DEMAND or ON COMMIT. When refreshing ON DEMAND, the user controls the refresh of MVs by choosing when to perform each refresh. Often, a choice of procedures for refreshing ON DEMAND are provided. For example, the user may be able to choose a procedure that selects one or more MVs to refresh, a procedure that refreshes those MVs that depend on a particular specified table, or a procedure that refreshes all MVs. If the user chooses to refresh ON COMMIT, the MV will automatically be refreshed on the commit of any transaction that updates the base tables referenced by the MV.
Scheduling Refresh Operations
Typically, when refreshing MVs, all of the MVs that need to be refreshed are submitted for refresh at essentially the same time, without performing a dependency analysis or attempting to balance loads. Submitting an MV for refresh may be referred to as “firing” the MV. Consequently, there may be a first MV, M1, which can be refreshed more efficiently after the refresh of a second MV, M2, because the refresh expression of M1 may be rewritten against M2. However, since the MVs are scheduled for refresh in an arbitrary order, M2 may get scheduled before M1. Additionally, a number of large MVs may be scheduled at the same time, thereby, causing the memory required by the queries to spill to disk and slow the execution of the queries. In some cases, the slowing of the execution and/or the spilling over to disk may lead to failures.
Various Ways of Refreshing MVS
There are a variety of different ways for refreshing MVs, such as a complete refresh, a fast refresh, or a forced refresh. In a complete refresh, the entire MV is rebuilt from scratch. In a fast refresh, only the data changes are applied to refresh the MV. A fast refresh may be referred to as an incremental refresh. Similarly, the changes to the MV during the fast refresh may be said to be applied “incrementally”, because, in a fast refresh, a subset of the rows of the MV are replaced rather than re-computing the entire MV. In a forced refresh, first a check is performed to see if a fast refresh is possible. If a fast refresh is possible, the fast refresh is performed. If a fast refresh is not possible, a complete refresh is performed.
There are several varieties of fast refresh, such as conventional fast refresh, Partitioned Change Tracking (PCT) fast refresh using TRUNCATE, and PCT fast refresh using DELETE. In a conventional fast refresh, MV logs are kept for each of the base tables. The MV logs are used to construct the changes on the MV that correspond to the changes made on the base tables. Undo logs may be used as the MV logs. However, the MV logs are stored in a manner that facilitates queries. The MV logs track changes made to the base table that are relevant to the MV. To find which rows to apply the computed changes, the changes that need to be applied to the MV are joined to the original MV. If the number of rows being changed is relatively few, the cost of performing the conventional refresh may be relatively low, because only the rows changed are operated upon. However, if there are many rows that are being changed, the operation may have a high cost.
In any of the PCT fast refresh methods, at least one of the base tables has been partitioned, and the database tracks the changes to the partition base table at the partition level. In a PCT fast refresh, the rows to process are restricted using simple predicates on only the relevant partitions, reducing the computational cost of refreshing the MV. In order to perform either type of PCT fast refresh, at least one of the base tables for the MV must be partitioned.
The PCT fast refresh using TRUNCATE erases an entire partition within the MV, and recalculates or rewrites each row of the partition that was erased. No log is kept of the records in the partition that are erased during the truncation. The truncation operation is typically irreversible. In order to perform a PCT fast refresh using TRUNCATE, the MV must be partitioned in a manner related to the partitions in the base tables. The truncation operation is not dependent on the number of rows erased, and is fast relative to a row-by-row delete operation. However, the recalculation of the all the rows deleted may have a high cost.
The PCT fast refresh using DELETE deletes selected rows of an entire partition within the MV, and recalculates or rewrites the rows deleted. The PCT fast refresh using DELETE uses a delete operation, which enters each deleted row in a log of the operations performed. The PCT fast refresh using DELETE does not require the MV to be partitioned.
Each of the fast refresh methods are possible only in certain scenarios. Specifically, a conventional fast refresh is possible if the only operations performed on the base tables were DML operations specified in database statements. For example, if Partition Maintenance Operations (PMOPs) (e.g., adding, dropping, splitting, and merging partitions), were performed on the base tables, a conventional fast refresh is not possible, because if any PMOPs were performed, the base table changes are not available in the MV logs. Thus, if a PMOP occurred, only a PCT fast refresh or complete refresh may be performed. PCT fast refresh can be used both in the presence of DMLs on base tables and PMOPs on partitioned base tables.
During a PCT fast refresh with TRUNCATE, the TRUNCATE causes the transaction to commit. Consequently, following a TRUNCATE, any refresh operation that needs to be done will be in a separate transaction. Hence, PCT refresh with TRUNCATE cannot be done in one transaction and is therefore not atomic. Consequently, a PCT refresh is not possible for an “atomic” refresh.
In one manner of implementing a fast refresh, if an MV needs to be refreshed, the decision of whether to choose a conventional fast refresh, PCT fast refresh using TRUNCATE, PCT fast refresh using DELETE, or a complete rebuilding of the MVs is based on heuristic rules. The refresh options are prioritized in the following order: (1) conventional fast refresh (2) PCT fast refresh with TRUNCATE (3) PCT fast refresh with DELETE, and (4) a complete rebuild. The heuristic rule checks to see if the refresh method with the highest priority could be performed. If the refresh method with the highest priority could not be performed, the refresh method with the next highest priority is checked to see if it could be performed. In other words, each refresh method is checked to see if it could be performed in the order of priority until the refresh method with the highest priority that could be performed is found.
For example, using this prioritization of refresh techniques for the heuristic rule, first a check is performed to see if a conventional fast refresh is possible. If a conventional fast refresh is possible, it is performed. If, for example, a PMOP was performed, the conventional fast refresh cannot be used, a check is made to see if PCT fast refresh with TRUNCATE can be performed. If the PCT fast refresh with TRUNCATE can be performed, it is. However, if, for example, the MV is not appropriately partitioned, PCT fast refresh with TRUNCATE cannot be used, and therefore, a check is performed to see whether a PCT fast refresh with DELETE can be used. If the PCT fast refresh with DELETE can be performed, it is. If, for example, the base tables are not partitioned or for some other reason the PCT fast refresh with DELETE also could not be performed, then the MVs would be recreated from scratch.
Unfortunately, however, the heuristic rule does not necessarily result in the most optimal refresh technique.
In view of the above it is desirable to lower the cost of refreshing MVs.