In a distributed database system, the operations of a single transaction can be submitted at multiple nodes or sites. A transaction is an atomic unit of work that is either fully committed or entirely rolled back in the database, for recovery and consistency reasons. Most transaction managers within a distributed database system implement some variation of a two-phase commit protocol. A two-phase commit protocol is a distributed algorithm by which all nodes in a distributed system agree on whether to commit a transaction. The protocol results in either all nodes committing the transaction or all nodes aborting it, even in the case of network failures or node failures. The two phases of the algorithm are the commit-request phase, in which the coordinator attempts to prepare all the cohorts, and the commit phase, in which the coordinator completes the transaction. Each participating node in this scheme writes its local changes to its own transaction log and then records in that same log the commit/abort record subsequently sent by the transaction manager.
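The two phases described above can be illustrated with a minimal, single-process sketch. It is not the implementation of any particular transaction manager: the names `Cohort`, `prepare`, `finish`, and `two_phase_commit` are hypothetical, the transaction log is modeled as an in-memory list, and failures, timeouts, and recovery are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cohort:
    """Hypothetical participating node with its own local transaction log."""
    name: str
    can_commit: bool = True
    log: List[str] = field(default_factory=list)

    def prepare(self, txn_id: str) -> bool:
        # Commit-request phase: log the local changes, then vote.
        self.log.append(f"{txn_id}:changes")
        return self.can_commit

    def finish(self, txn_id: str, decision: str) -> None:
        # Commit phase: log the coordinator's commit/abort record.
        self.log.append(f"{txn_id}:{decision}")


def two_phase_commit(txn_id: str, cohorts: List[Cohort]) -> str:
    """Coordinator: commit only if every cohort votes yes, else abort."""
    votes = [cohort.prepare(txn_id) for cohort in cohorts]
    decision = "commit" if all(votes) else "abort"
    for cohort in cohorts:
        cohort.finish(txn_id, decision)
    return decision
```

In this sketch, a single cohort constructed with `can_commit=False` causes every cohort to log an abort record, which corresponds to the all-or-nothing outcome described above.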
In a log-based replication scheme in which distributed transactions are asynchronously replicated from a source system to a target system, so that the target system can take over after a disaster, a site, network, or process failure may prevent the changes made at one or more of the source sites involved in a distributed transaction from reaching the target system. For efficiency reasons, the replication process at the target system may decide to commit changes arriving from each source site independently (i.e., as a non-distributed transaction), instead of reassembling the local work from each source site and submitting that as a distributed transaction using a two-phase commit protocol. In case of a failover to the target system, transactional consistency semantics require that the target database not reflect any partial distributed transactions. Therefore, replication must ensure that it has received the local units of work from every participating site, so that together they fully reflect the source-side distributed transaction, before submitting each local unit of work as a non-distributed transaction. Alternatively, replication must back out partially applied portions of a distributed transaction. In the absence of such methods, partial changes from one or more (but not all) sites may be reflected in the target database system, thereby breaking transactional consistency.
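The first requirement above, holding back local units of work until every participating site has been heard from, can be sketched as follows. This is only an illustration under stated assumptions, not a technique taken from this document: the class `TargetApplier` and its methods are hypothetical, and the sketch assumes that each local unit of work arrives tagged with the distributed transaction identifier and the full set of participating sites.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple


class TargetApplier:
    """Hypothetical target-side buffer for local units of work."""

    def __init__(self) -> None:
        # txn_id -> {site -> changes} for transactions still incomplete.
        self.pending: Dict[str, Dict[str, List[str]]] = defaultdict(dict)
        # Local units applied to the target, each as its own transaction.
        self.applied: List[Tuple[str, str, List[str]]] = []

    def receive(self, txn_id: str, site: str,
                participants: FrozenSet[str], changes: List[str]) -> bool:
        """Buffer one local unit; apply all units once every site has reported."""
        units = self.pending[txn_id]
        units[site] = changes
        if set(units) != set(participants):
            return False  # still partial: nothing reaches the target yet
        for s in sorted(units):
            # Each local unit is submitted as a non-distributed transaction.
            self.applied.append((txn_id, s, units[s]))
        del self.pending[txn_id]
        return True

    def discard_incomplete(self) -> List[str]:
        """On failover, drop transactions whose units never all arrived."""
        dropped = sorted(self.pending)
        self.pending.clear()
        return dropped
```

Buffering is one of the two options named above. A back-out approach would instead apply local units as they arrive and undo them if the remaining units never show up.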
It would be desirable to provide improved techniques for log-based replication of distributed transactions.