1. Field of the Invention
The present invention relates to a method, system, and computer program product for batching transactions to apply to a database.
2. Description of the Related Art
In a relational database management system (RDBMS), data is stored in one or more tables having multiple rows (records), where each row has one or more columns (fields). A subset of the columns are designated as key columns and the combination of values of the key columns of the rows of a single table must be distinct.
Database replication is a distributed technology used to asynchronously maintain one or more copies of the same data. The origin of the data is known as the “source”, and the receiver of the copy is the “target”. In an RDBMS, data is stored in multiple tables that are logically related based on business rules. Replication of a database is performed to maintain copies (replicas) of source tables residing in a source database in corresponding target tables in one or more target databases. As part of replication, changes (inserts, deletes, and updates) to rows of the source tables in the source database that is subject to replication are copied to the target table in a target database at one or more replication sites.
The propagation of changes made to a target database may be synchronous or asynchronous to the original change. Synchronous propagation copies changes to the source tables to target tables as part of a single transaction (unit of work). If the application of any transaction fails, then the entire transaction rolls back. Asynchronous propagation stores changes to the target tables in a queue and propagates and applies those changes to the target database. Synchronous change propagation requires that the database management systems maintaining all (or most) copies be active and available at the time of the change. Also, synchronous change propagation introduces substantial messaging and synchronization costs at the time of the original changes.
The means of detecting changes to be propagated asynchronously can be active or passive. Active change detection isolates the changes, at the time of the change, for later processing using database triggers or a similar mechanism. Passive change detection exploits information from the database recovery log, where changes are recorded for other purposes, to deduce what rows, of which tables, were changed as well as both the old and new values of changed columns.
Replication also maintains the integrity of database transactions. A database transaction is a unit of work, with one or more insert, update or delete row operations to one or more tables, and is independent of other transactions. Transactional integrity is enforced when the transaction's unit of work is either applied in its entirety or has no effect whatsoever when not applied.
Database transactions are considered dependent on each other, if they modify the same row(s) or if they modify rows that are related via RDBMS-enforced rules, such as referential integrity or uniqueness constraints. For example, a transaction might be updating the record for an employee that was inserted by another transaction. The replication detects such dependencies by inspecting the values of the data before applying the data.
In a replication process, database integrity is preserved when dependent transactions are committed in the order in which they were committed at the source, or source-commit-order. Non-dependent transactions can be committed in any order at the target, without impacting database integrity. For example, two transactions modifying unrelated customer accounts in a purchasing system can be replicated at different speed without impacting overall database integrity. The impact is that a query for fetching updated customer information cannot be routed to the target database until replication has replicated all dependent transactions for this customer's information to the target. The ability to re-execute transactions out-of-order is needed for scalability of the replication apply process, very large numbers of independent agents can be used and often allow to execute workloads at the target faster than they were executed at the source.
In a typical database environment, there are varying levels of parallel transactional processing, involving concurrent transactions that execute read and write actions against database information. Fundamental to the nature of a data replication process is the choice of how to move, order and apply that stream of parallel database event changes to a target database. One conventional approach provides a certain degree of apply parallelism by grouping related tables into distinct sets and having each set of tables applied by a completely separate program. In another conventional approach, parallelism is provided but without preserving the source data event order. Thus, to provide data integrity, a “shadow” table is used to track and maintain each individual data row change. This approach, however, has a significant overhead cost in both making updates and in performing lookups against the shadow table.