The invention relates to database systems and methods, and more particularly to a system and method for maintaining materialized views in a data warehouse and providing information in response to queries against the views.
As is well known, a data warehouse integrates information from one or more data sources into a database, which can be queried by clients or readers of the warehouse to extract useful information. The data warehouse typically includes materialized views, which are subsets of the database and are provided to speed up query processing as the readers can selectively work with, rather than the whole database itself, only those views suitably formatted for the readers"" particular queries.
As changes are made to certain data at the sources, the materialized views incorporating the previous version of such data become outdated. In response to the data changes, the data warehouse performs view maintenance to update the views. The view maintenance typically involves queuing the changes in the source data, and periodically running a batch update transaction, known as a xe2x80x9cmaintenance transaction,xe2x80x9d to refresh the materialized views. It should be noted that the periodic maintenance transactions typically are the only write-transactions at the warehouse, and all other transactions performed at the warehouse are read-only queries by the readers.
However, an xe2x80x9cinformation inconsistencyxe2x80x9d problem arises when a maintenance transaction is allowed to be performed during a reader session where a reader issues a series of queries, responses to which are based on certain common underlying data subject to the update by the maintenance transaction. That is, the information inconsistency problem arises when the response to a first query by the reader is based on a first version of the underlying data before the update, and the response to a second query by that reader is based on a second version of the same underlying data after the update. Such an xe2x80x9cinformation inconsistencyxe2x80x9d problem is particularly troublesome when the reader relies on the series of queries to perform an analysis requiring a consistent database state, which is common. Such an analysis is undesirably frustrated when the same underlying data changes from query to query.
Data in materialized views are stored in the form of tuples. The prior art approaches to affording information consistency typically avoid use of conventional two-phase locking algorithms, which require a reader to block any maintenance transaction when the reader attempts to read a tuple, and a maintenance transaction to block any reader when the transaction attempts to modify a tuple. This stems from the fact that such algorithms employ locks, e.g., read-locks and write-locks, to effect the blocking which are computationally expensive, and the reader session and maintenance transaction typically are long-running and access significant portions of the views, thereby undesirably triggering blocking frequently. As a result, a typical analysis involving numerous queries can hardly be effectively realized using the conventional locking algorithms.
A prior art approach avoiding use of locks requires that reader sessions take place only during the daytime while maintenance transactions are executed during the nighttime. However, as corporations become globalized, there is no longer any nighttime common to all corporate entities around the world during which it is convenient to make the data warehouse unavailable to readers. In addition, since the maintenance transactions must be completed by the following morning, the time available for view maintenance can be a limiting factor in both the number and size of the views that can be materialized at the warehouse.
However, in another prior art approach, a maintenance transaction is allowed to be executed on a materialized view concurrently with reader sessions utilizing the same view. For details on this approach, one may refer to: D. Quass et al., xe2x80x9cOn-Line Warehouse View Maintenance,xe2x80x9d Proc. SIGMOD, May 1997, pp. 393-404. Each tuple in the view contains multiple fields. The Quass et al. approach involves identifying those xe2x80x9cmutablexe2x80x9d fields in the tuple which are subject to an update as their data varies from time to time. During an update, to avoid the information inconsistency problem, a maintenance transaction overwrites the mutable fields if no reader is reading them. Otherwise, an updated version of the mutable fields is stored in spare memory space allocated to the view within the tuple, thereby creating multiple versions of the mutable fields in the view. In effect, each version of the mutable fields corresponds to a version of the view, and the multiple versions of the view corresponding to the multiple versions of the mutable fields are inextricably integrated with one another, and stored in the same memory space pre-assigned to the view. Queries of new reader sessions after the update are directed to the updated version of the mutable fields while queries of the on-going sessions continue to be directed to the earlier versions corresponding thereto.
While the Quass et al. approach described above is desirable in that it allows a maintenance transaction to be performed on a materialized view concurrently with reader sessions utilizing the same view without incurring an information inconsistency problem, its implementation has been recognized by me to be complicated in both reader and maintenance transactions. In addition, if a significant portion of the materialized view needs to be modified in each update, the handling of queries from the reader sessions concurrent with the materialized view in accordance with the Quass et al. approach becomes computationally expensive, and thus inefficient. I have recognized that such inefficiency is attributed to its requirement of integrating multiple versions of the mutable fields in the same relation holding the materialized view.
The present invention solves the above-identified problem. In accordance with the invention, when data in a materialized view is updated, a maintenance transaction for the update generates a new version of the entire view, which includes the updated data. Multiple versions of the view corresponding to different updates coexist. However, in direct contrast to the Quass et al. approach where using the above-described mutable fields, multiple versions of a view are inextricably integrated with one another and stored in the same memory space pre-assigned to the view, the invention requires that each version of the view in its entirety be physically separate and stored in a discrete memory space. Each version of the view in accordance with the invention is identified by an index. A reader session is associated with the most recent version of the view identified by the index at the beginning of the session. Queries in the reader session which are directed to the view are rewritten so that the responses thereto are based on the version of the view associated with the reader session. To avoid the information inconsistency problem, the maintenance transaction updates the view using a copy of the most recent version thereof, without disturbing any on-going reader sessions utilizing the original version. The resulting updated version becomes the most recent version of the view to which queries of new sessions, after the maintenance transaction completes, are directed.
In accordance with an aspect of the invention, for a materialized view which contains a large number of tuples which are immutable except for those tuples which are recently added thereto, the view is divided into an append-only part which includes those immutable tuples, and a mutable part which includes the remaining, mutable tuples. Advantageously, the inventive methodology described above may be applied to the mutable part to update such a view to avoid the information inconsistency problem.