In a relational database management system, data is stored in a multiplicity of tables having a multiplicity of rows (records), the rows having a multiplicity of columns (fields). A subset of the columns are designated as key columns and the combination of values of the key columns of the rows of a single table must be distinct. It is frequently desired to maintain copies (replicas) of a first table residing in a first database of the relational variety in one or more other databases of the relational variety. Furthermore, it is desired that changes (inserts, deletes, and updates) to rows of the table in the first database be copied (replicated) to the table copies residing in the other databases. Additionally, it is sometimes desired that the changes made to any of the table copies residing in any of the several relational databases be copied (replicated) to all the other table copies.
The propagation of changes made to one copy of the table may be synchronous or asynchronous to the original change. Synchronous propagation makes changes at all copies as part of the same transaction (unit of work) that initiates the original changes. Asynchronous propagation copies the original changes to the other table copies in separate transactions, subsequent to the completion of the transaction initiating the original changes. Synchronous change propagation requires that the database management systems maintaining all (or most) copies be active and available at the time of the change. Also, synchronous change propagation introduces substantial messaging and synchronization costs at the time of the original changes.
The means of detecting changes to be propagated asynchronously can be active or passive. Active change detection isolates the changes, at the time of the change, for later processing using database triggers or a similar mechanism. Passive change detection exploits information from the database recovery log, where changes are recorded for other purposes, to deduce what rows, of which tables, were changed as well as both the old and new values of changed columns.
In a typical database environment, there are varying levels of parallel transactional processing, involving concurrent transactions that execute read and write actions against database information. Fundamental to the nature of a data replication process is the choice of how to move, order and apply that stream of parallel database event changes to a target database.
In ensuring convergence of data in asynchronous replication, the secondary unique data constraints and referential integrity of the data need to be preserved. A secondary unique constraint is a constraint on a column of a table, other than the set of primary key columns that constrains the rows of the table such that this set of columns are unique within each row. A secondary unique constraint can span multiple columns in the same table. Referential integrity (RI) is the state in which all values of all foreign keys at a given database are valid. A referential constraint is the rule that the non-null values of a foreign key are valid only if they also appear as values of a parent key. The table that contains the parent key is called the parent table of the referential integrity constraint, and the table that contains the foreign key is a dependent (child) of that table.
RI constraints can be defined between tables at the source and/or at the target. They provide a mechanism to give more semantic to the data stored. For example, they ensure that the value of a group of columns match the value of a similar group of columns in another table. It is also possible to attach behavior to such constraints. For example when a row is deleted from a table and if it is referenced in other tables, the referencing rows are deleted too. Also unique constraints can be defined on a group of columns such that only one row in the table can have a given set of column values.
One conventional approach of data replication preserves the data constraints, but the transactions are not applied in parallel. Thus, the performance advantage realized from parallel apply of transactions is sacrificed. Another conventional approach uses a proprietary mechanism in the database engine to manage RI constraints during parallel apply of transactions, and is restrictive in what transactions may be applied in parallel. Thus, this approach is inflexible and not extensible.
Accordingly, there exists a need for an improved method and system for preserving data constraints during parallel apply in asynchronous transaction replication in a database system. The improved method and system should preserve secondary unique constraints and referential integrity constraints while also allowing a high degree of parallelism in the application of asynchronous transactions. The improved method and system should also handle ordering problems introduced by referential integrity cascade deletes and allow the parallel initial loading of parent and child tables of a referential integrity constraint. The present invention addresses such a need.