1. Field of the Invention
This invention relates generally to computer systems for managing changes to collections of data and, more particularly, to systems for propagating change histories to data base copies.
2. Description of the Related Art
A relational data base comprises a collection of tables containing data. A relational data base manager permits users and applications software to create, change, or destroy the data in the tables, the relationships between the tables, and even the structure of the tables. A query language, such as SQL, is used to manipulate the data base and retrieve query results. Many relational data bases are transaction-oriented, meaning that a data table change comprises a transaction that involves two or more related activities. For example, in a transaction-oriented banking application, the first activity might comprise a debit to a table and a second activity might comprise a credit to another table, followed by an SQL commit command that completes the transaction by implementing the first two activities to thereby change the affected tables. The first and second activities are not actually implemented until there is a commit command. Thus, the transaction is treated in an "all-or-nothing" manner and is referred to as an atomic unit of work. Preserving such atomicity, or completeness of operation, ensures integrity of the tables.
In many data base systems, it is desirable to maintain an accurate history of the changes to the data tables. The history can be used, for example, to recover from system failures or equipment malfunctions by providing a record of the transactions that can be read for recreating the state of the tables at any point in time. Such systems include processes, or programming routines, that create change histories by writing changes to an activity log in the order in which they are received. Thus, several transactional changes might be received before a commit command is received for the first transaction. Some transactions might not ever be committed and instead might be aborted.
An activity log contains the minimum amount of information necessary for data recovery. Entries in an activity log, for example, might comprise simply "before" and "after" copies of a data table row, or might comprise very specific change operations that define the row and column of a data table for which a change is made and define the changed value. Location markers in the activity log likely refer to internal computer locations, rather than table or variable names. Because activity logs are designed for recovery operations and not analysis operations, it can be rather difficult to recreate the data base changes to permit useful analysis of the change history.
Many relational data base management systems make copies of selected portions of a data base, such as when returning the response to a user SQL query. These copies are often referred to as "snapshots" of the data base, because they reflect the condition of the data base at a particular point in time. Snapshots are made periodically for a variety of purposes. The system may make copies for use by system processes in implementing data base changes. Snapshots also are used to create additional copies of data tables for concurrent use by multiple users. Snapshots are read-only tables that are created out of the original source data base tables and cannot be changed. After a snapshot has been produced, it can be refreshed to reflect all changes to the original user data table since the creation of the initial snapshot or the time of the last refresh operation.
It is known to reduce the amount of information necessary to refresh a snapshot. For example, rather than returning to the source user tables to replace an initial snapshot with a complete, updated copy of a user table, a data base management system can refresh an initial snapshot by propagating update information received since the initialization or the last prior update through the snapshot copies. Such information can be obtained from the activity log, but is lost after being applied, or propagated, to the first snapshot. Therefore, updated snapshot copies cannot be used to update other snapshots. It would be necessary to repeatedly consult the activity log and apply the recorded updates to each initial snapshot copy to be updated. As noted above, this can be a difficult task because activity logs are designed primarily for recovery operations.
Activity logs often are not useful for user analysis of change history because they are not accessible to a user through an interface such as SQL or at least contain information that is not designed for easy manipulation. This is because activity logs are designed for recovery operations, not access and manipulation operations. Thus, even if the activity log data were readily available to users for interactive query, most users would not find the data to be meaningful.
For example, time series data does not accompany subsequent snapshot copies. As a result, it can be virtually impossible to locate the proper starting point in the activity log to recreate the changes to the data base in the order in which the changes were committed. Snapshot copy deletion operations are permanent, so that knowledge of deletions is lost from subsequent copies. Knowledge of deletions would be necessary to maintain subsequent deletion operations with the minimal information technique. Thus, snapshot copies that are refreshed are not useful as sources for refreshing subsequent snapshots, referred to as multi-generational copies, with the same minimal information technique. As a result, for complete information, refreshed snapshots must be obtained directly from the original data table and activity log. This places a heavy burden on the data base manager to participate in the refresh of every snapshot.
From the discussion above, it should be apparent that there is a need for a system that permits multiple copies of data tables to be made specifying only the changes to a base copy of the table without losing change information from copy to copy. The present invention satisfies this need.