A database is a collection of stored data that is logically related and that is accessible by one or more users. A popular type of database is the relational database management system (RDBMS), which includes relational tables made up of rows and columns. Each row represents an occurrence of an entity defined by a table, with an entity being a person, place, or thing about which the table contains information. To extract data from, or to update, a relational table, queries according to a standard database query language (e.g., Structured Query Language or SQL) are used. A table (also referred to as a relation) is made up of multiple rows (also referred to as tuples). Each row (or tuple) includes multiple columns (or attributes).
Various other data structures are also typically associated with relations in a relational database system. For example, a view is a derived relation formed by performing a function on one or more base relations. Rather than storing the view, the function is typically recomputed each time the view is referenced. This type of view is referred to as an “ordinary view.”
Unlike an ordinary view, a materialized view is a pre-computed, stored query result that can be used for some queries instead of reconstructing the results directly from the base relations. As with the ordinary view, a function is performed on the base relations to derive the materialized view. However, because the materialized view is stored, fast access to the data is possible without recomputing the view.
After the materialized view is created, subsequent queries are able to use the materialized view, where appropriate, to increase query processing speed. Materialized views can be used to assemble data that come from many different relations. One type of view is the join view, which stores join results of multiple base relations.
A materialized view is updated when the underlying base relations are modified. As the base relations are changed through insertion of new tuples, deletion of tuples, or updates to existing tuples, the corresponding rows in the materialized view are changed to avoid becoming stale. This is known as materialized view maintenance.
Relational database systems can be used for data warehouse applications. A data warehouse collects information from several source databases. The collected information is integrated into a single database to be queried by the data warehouse clients.
Traditionally, data warehouses have been archival stores used for analysis of historical data. More recently, however, there has been a growing trend to use a data warehouse operationally (referred to as a “operational data warehouse” or “real-time data warehouse”), which involves making relatively real-time decisions about data stored in the data warehouse.
Traditional techniques of maintaining views are usually inadequate (in terms of processing speed) for operational data warehouses due to the real-time update requirements. Furthermore, materialized view maintenance in an operational data warehouse requires transactional consistency. If transactional consistency is enforced by traditional concurrency control mechanisms (including locking mechanisms), the ability of the database system to perform concurrent transactions may be reduced. This hurts performance in a database system, especially in a parallel database system having multiple processing modules.
When a base relation is updated (e.g., new row inserted, existing row deleted, or row modified), the update needs to be propagated to a materialized view as part of the materialized view maintenance. In some systems, to increase operational speeds, reduced levels of consistency are used that allow “dirty reads,” which are reads of stale data in relations. However, when such reduced levels of consistency are used in an environment in which materialized views are present, inconsistent query results are often obtained as a result of inaccurate data being captured in materialized views.