1. Field of the Invention
The present invention generally relates to materialized views and, more particularly, to the incremental refresh of the materialized view.
2. Brief Description of Related Developments
The materialized view is an Oracle object that records valuable summary data in the data warehouse environment. Extracting summary data from a data warehouse often requires a time-consuming and expensive process through complex queries with join and/or aggregate constructs. As more periodical changes made to the tables in the data warehouse, maintaining summary data in the materialized views becomes an increasingly important topic. It is not practical to always rebuild the materialized views from scratch (i.e., complete refresh) due to the expensive nature.
The materialized view maintains pre-calculated data resulting from highly aggregated/joined queries. It has been increasingly and widely used in the data warehouse application to achieve high-performance query processing in accessing huge amount of historical data. Its values can be seen in the two aspects as query rewrite and incremental refresh. The query rewrite mechanism achieves better query processing performance by rewriting the given query in terms of the materialized view. The response time is thus shortened by using pre-calculated results in the materialized view. On the other hand, the incremental refresh provides a faster mechanism to synchronize the data between the base tables and the materialized view.
In the past, the materialized view was “atomic and treated as an unseparatable object so that the status of its data can only be either globally fresh or global stale. It is a serious drawback that affects the availability of the materialized view. For example, when any of the base tables of the materialized view gets updated, the materialized view immediately becomes globally “stale”. All the data in the materialized view becomes distrusted and unusable in the query rewrite (unless lowering the confidence level) since there is no easy way to identity which part of the data gets affected by the changes. Later, with the partition change tracking technique, the staleness of the materialized view is separated at the partition level.
In the materialized view maintenance, the row-based incremental refresh (using log with changed data rows) was the only data synchronization approach without needing to recalculate the data of the materialized view (i.e., complete refresh). However, the row-change-based incremental refresh has its limitations. For example, when the base table is partitioned (e.g., by range) and a partition maintenance operation (e.g., exchange partitions) occurs, the existing row-based incremental refresh is unable to handle such partition change. As a result, the only way to bring the materialized view back to the fresh state is through the complete refresh that is very expensive especially with a large amount of data. It is noted that the use of data partitioning becomes even more popular and is increasingly adopted in the OLAP system and very suitable for managing historical data. Lacking the ability of incrementally refreshing the materialized view after partition maintenance operations is a serious disadvantage.
The incremental refresh approach to materialized view maintenance incrementally applies the changes in the tables of the data warehouse to refresh the materialized views (i.e., incremental refresh). To enable the incremental refresh, catching change data made through various operations including conventional operations (e.g., INSERT, UPDATE and DELETE), direct path load and PMOP (partition maintenance operation) is the fundamental step. To catch conventional DML operations to the base table of the materialized view, a materialized view log is created and associated with the base table. When the base table is changed by conventional DML operations, the information of changed rows is recorded in the materialized view log. During the incremental refresh of the materialized view, the change data recorded in the materialized view log is used to determine what changes need to be made to the materialized view. Therefore, the materialized view log has been a mandatory requirement for the materialized view supporting incremental refresh. Without the materialized view log, the log-based incremental refresh is not possible.
As mentioned previously, a partition-based (“PCT”) incremental refresh approach was proposed to refresh the materialized view using partition change information and partition change tracking. The refresh algorithm for the partition-based incremental refresh does not require the materialized view log if the materialized view meets PCT (Partitioning Change Tracking) requirements and only PCT base tables are updated. However, the PCT incremental refresh cannot handle the case when the non-partitioned base table changes. In the general case when both non-partitioned and partitioned base tables are changed and partitioned base tables have partition maintenance operations, no existing incremental refresh method is available to use. Also, the current log-based (FAST) refresh method heavily relies on the materialized view logs of all base tables, but recording and maintaining those logs cause huge processing overheads during DML operations and materialized view refresh, respectively. Removing the dependency on the materialized view log is a very difficult refresh issue.