A data warehouse collects information from several possibly widely distributed and loosely coupled source databases. The collected information is integrated into a single database to be queried by the data warehouse clients.
Because a data warehouse is primarily used for decision support, queries at the data warehouse tend to be complex and take a long time to execute. Due to the huge volume of data and the complexity of the queries, a data warehouse is typically managed by a parallel database management system (DBMS). In addition to the data itself, the parallel DBMS typically includes data structures of various types that facilitate efficient query processing.
The data itself is organized in tables, also known as relations. Various data structures may be associated with these relations. For example, a view is a derived relation formed by performing a function on one or more base relations. Rather than storing the view, the function is typically recomputed each time the view is referenced.
A materialized view is another type of data structure used to speed up query processing. A materialized view is a pre-computed, stored query result that can be used for some queries instead of reconstructing the results directly from the base relations. As with the view, a function is performed on the base relations to derive the materialized view. Because the materialized view is stored, fast access to the data is possible without recomputing the view.
The materialized view is thus an assembly of data for fast access. After the materialized view is created, subsequent queries may use the materialized view, where appropriate, saving the overhead of performing the computation again. Materialized views may be used to assemble data that come from many different relations, which typically requires many different join operations to be performed between the relations.
Much like a cache, a materialized view is updated when the underlying base relations are modified. A typical materialized view may consist of a join between two or more base relations. The primary copy of the data is kept in the base relations. As these relations are changed through insertion of new tuples, deletion of tuples, or updates to existing tuples, the corresponding rows in the materialized view are changed to avoid becoming stale. This is known as materialized view maintenance.
The maintenance process may be expensive (in terms of resource utilization) on a parallel database management system. It is often the case that the information (tuples) for maintaining the materialized view is not stored together on a single node of the system. Thus, to perform a maintenance operation, the system collects the affected tuples together from several nodes, performs the join, and updates the materialized view. Maintenance of the materialized view may thus adversely affect the speed of query processing.
In a data warehouse environment, updates typically arrive in bulk, rather than a few at a time. Ostensibly, this bulk updating of data improves efficiency of the DBMS. In most commonly used commercial data warehousing systems, the data warehouse is not available for other queries during materialized view maintenance.
As international corporations have branches that span multiple time zones, there is sometimes no convenient “night” down time for the data warehouse to be maintained while blocking query requests. Moreover, in the world of e-commerce, there may be no down time.
Recently, real-time data warehousing has emerged. A real-time data warehouse is event driven, reacts in a timeframe appropriate to the business need, and makes rapid operational decisions or causes prompt operational actions. The real-time data warehouse must be available at all times to provide quick responses to interactive requests. Both updates to base relations and maintenance of materialized views ideally occur at least daily (and often hourly, every ten minutes, or even continuously). For a real-time data warehouse, therefore, materialized views need to be updated without inhibiting queries in progress.