In a relational database management system, data is stored in a multiplicity of tables having a multiplicity of rows (records), the rows having a multiplicity of columns (fields). A subset of the columns is designated as key columns, and the combination of key column values must be distinct for each row of a single table. It is frequently desired to maintain copies (replicas) of a first table residing in a first relational database in one or more other relational databases. Furthermore, it is desired that changes (inserts, deletes, and updates) to rows of the table in the first database be copied (replicated) to the table copies residing in the other databases. Additionally, it is sometimes desired that changes made to any of the table copies residing in any of the several relational databases be copied (replicated) to all the other table copies.
The propagation of changes made to one copy of the table may be synchronous or asynchronous to the original change. Synchronous propagation makes changes at all copies as part of the same transaction (unit of work) that initiates the original changes. Asynchronous propagation copies the original changes to the other table copies in separate transactions, subsequent to the completion of the transaction initiating the original changes. Synchronous change propagation requires that the database management systems maintaining all (or most) copies be active and available at the time of the change. Also, synchronous change propagation introduces substantial messaging and synchronization costs at the time of the original changes.
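The distinction between the two propagation modes can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the names `Replica`, `propagate_sync`, and `AsyncPropagator` are hypothetical, each table copy is reduced to a dictionary keyed by the key column value, and a simple availability flag stands in for a database management system being active.

```python
from collections import deque


class Replica:
    """Hypothetical stand-in for one copy of the table: key -> value."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.available = True  # models whether this DBMS is reachable

    def apply(self, change):
        op, key, value = change
        if op == "delete":
            self.rows.pop(key, None)
        else:  # "insert" or "update"
            self.rows[key] = value


def propagate_sync(change, copies):
    """Synchronous model: every copy changes in one unit of work, or none does."""
    if not all(copy.available for copy in copies):
        raise RuntimeError("a copy is unavailable; the change is rejected")
    for copy in copies:
        copy.apply(change)


class AsyncPropagator:
    """Asynchronous model: change the source now, copy to targets later."""

    def __init__(self, source, targets):
        self.source = source
        self.targets = targets
        self.pending = deque()  # changes awaiting propagation, in order

    def change(self, change):
        self.source.apply(change)    # original transaction completes here
        self.pending.append(change)  # propagation is deferred

    def drain(self):
        """Apply queued changes in order while all targets are available."""
        applied = 0
        while self.pending and all(t.available for t in self.targets):
            change = self.pending.popleft()
            for target in self.targets:
                target.apply(change)
            applied += 1
        return applied


source, target = Replica("source"), Replica("target")
target.available = False  # the target is down at the time of the change

try:
    propagate_sync(("insert", 1, "a"), [source, target])
    sync_rejected = False
except RuntimeError:
    sync_rejected = True  # synchronous propagation cannot proceed

propagator = AsyncPropagator(source, [target])
propagator.change(("insert", 1, "a"))  # succeeds although the target is down
rows_before_drain = dict(target.rows)

target.available = True
applied = propagator.drain()  # the change reaches the target afterwards
```

In this model the synchronous path fails whenever any copy is unavailable, whereas the asynchronous path lets the original change complete and leaves the target temporarily behind the source.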
The means of detecting changes to be propagated asynchronously can be active or passive. Active change detection isolates the changes, at the time of the change, for later processing using database triggers or a similar mechanism. Passive change detection exploits information from the database recovery log, where changes are recorded for other purposes, to deduce which rows of which tables were changed, as well as both the old and new values of the changed columns.
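As an illustration of active change detection only, the following sketch uses SQLite triggers (through Python's standard `sqlite3` module) to record each change at the time it is made. The table names `accounts` and `change_log`, the column names, and the single-letter operation codes are assumptions chosen for this example and are not taken from any particular replication product. Passive, log-based detection is not shown, since recovery log formats are specific to each database system.

```python
import sqlite3


def make_source():
    """Create a source table plus triggers that capture every change."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);

        -- Hypothetical staging table holding captured changes in order.
        CREATE TABLE change_log (
            seq         INTEGER PRIMARY KEY AUTOINCREMENT,
            op          TEXT,
            pk          INTEGER,
            old_balance INTEGER,
            new_balance INTEGER
        );

        CREATE TRIGGER accounts_ins AFTER INSERT ON accounts BEGIN
            INSERT INTO change_log (op, pk, old_balance, new_balance)
            VALUES ('I', NEW.id, NULL, NEW.balance);
        END;

        CREATE TRIGGER accounts_upd AFTER UPDATE ON accounts BEGIN
            INSERT INTO change_log (op, pk, old_balance, new_balance)
            VALUES ('U', OLD.id, OLD.balance, NEW.balance);
        END;

        CREATE TRIGGER accounts_del AFTER DELETE ON accounts BEGIN
            INSERT INTO change_log (op, pk, old_balance, new_balance)
            VALUES ('D', OLD.id, OLD.balance, NULL);
        END;
    """)
    return db


def captured_changes(db):
    """Return the captured changes in the order they were made."""
    return db.execute(
        "SELECT op, pk, old_balance, new_balance FROM change_log ORDER BY seq"
    ).fetchall()


source = make_source()
with source:  # one transaction; the triggers fire inside it
    source.execute("INSERT INTO accounts VALUES (1, 100)")
    source.execute("UPDATE accounts SET balance = 150 WHERE id = 1")
    source.execute("DELETE FROM accounts WHERE id = 1")

changes = captured_changes(source)
```

Each captured row carries the operation, the key value, and the old and new column values, which is the information a later, separate apply transaction would need.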
In a typical database environment, there are varying levels of parallel transactional processing, involving concurrent transactions that execute read and write actions against database information. Fundamental to the nature of a data replication process is the choice of how to move, order, and apply that stream of parallel database changes to a target database.
One conventional approach provides a certain degree of apply parallelism by grouping related tables into distinct sets and having each set of tables applied by a completely separate program. However, this approach places a heavy burden on the user, who may have difficulty knowing which tables are logically related and must be grouped together.
In another conventional approach, parallelism is provided but without preserving the source data event order. Thus, to provide data integrity, a “shadow” table is used to track and maintain each individual data row change. This approach, however, has a significant overhead cost in both making updates and in performing lookups against the shadow table.
Other conventional approaches provide parallelism, but through proprietary techniques that have little or no applicability outside of a specific system.
Accordingly, there exists a need for an improved method and system for providing parallel apply in asynchronous data replication in a database system. The improved method and system should provide a high-speed parallel apply of transactional changes to a target node such that the parallel nature of the application of changes does not compromise the integrity of the data. The improved method and system should also require significantly less overhead than conventional approaches and be easily adaptable to various types of database systems. The present invention addresses such a need.