The present invention relates to the field of data replication.
“Bidirectional Database Replication” is defined as the application of database deltas (i.e., the results of transactions being performed against a database) from either of two databases in a pair to the other one. Transaction I/O (e.g., inserts, updates, and deletes) applied to one database is applied to the other database and vice versa. Both databases are “live” and are receiving transactions from applications and/or end users. U.S. Pat. No. 6,122,630 (Strickler et al.), which is incorporated by reference herein, discloses a bidirectional database replication scheme for controlling transaction ping ponging.
In the database world, a collision is classically defined as a conflict that occurs during an update. A collision occurs when a client reads data from the server and then attempts to modify that data in an update, but before the update attempt is actually executed another client changes the original server data. In this situation, the first client is attempting to modify server data without knowing what data actually exists on the server. Conventional techniques for minimizing or preventing collisions include database locking and version control checking. These techniques are commonly used in systems that have one database, wherein many users can access the data at the same time.
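By way of illustration only, the following minimal Python sketch shows the collision described above and how version control checking can detect it. The class name VersionedRecord and its methods are hypothetical and are assumed here purely for illustration; they are not taken from any particular database product.

```python
class VersionedRecord:
    """A single row with a version counter (hypothetical structure)."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        # Return the value together with the version the client observed.
        return self.value, self.version

    def update(self, new_value, expected_version):
        # Reject the update if another client changed the row after the read.
        if expected_version != self.version:
            return False  # collision detected
        self.value = new_value
        self.version += 1
        return True


record = VersionedRecord(100)
_, seen_by_a = record.read()            # client A reads the row
_, seen_by_b = record.read()            # client B reads the same row
b_ok = record.update(150, seen_by_b)    # client B updates first and succeeds
a_ok = record.update(90, seen_by_a)     # client A's update is based on stale data
```

In this sketch, client A's update is refused because the version it read no longer matches the stored version, so client A does not overwrite data it never saw.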
When a database system includes replicated databases, the problem of collisions becomes greater, since clients may be requesting database changes to the same data at the same physical or virtual location or at more than one physical or virtual location. Collision or conflict detection schemes have been developed for replicated database systems. After a collision is detected, a variety of options are available to fix or correct the out-of-sync databases. However, it would be more desirable to prevent collisions from happening in the first place.
One conventional distributed transaction scheme used in Oracle distributed database systems is known as the “two-phase commit mechanism.” This approach is classically used to treat a “distributed” transaction, i.e., a transaction that spans multiple nodes in a system and updates databases on the nodes, as atomic. Either all of the databases on the nodes are updated, or none of them are updated. In a two-phase commit system, each of the nodes has a local transaction participant that manages the transaction steps or operations for its node.
The two phases are prepare and commit. In the prepare phase, a global coordinator (i.e., the transaction initiating node) asks participants to prepare the transaction (i.e., to promise to commit or roll back the transaction, even if there is a failure). The participants are all of the other nodes in the system. The transaction is not committed in the prepare phase. Instead, all of the other nodes are merely told to prepare to commit. During the prepare phase, a node records enough information about the transaction so that it can subsequently either commit or abort and roll back the transaction. If all participants respond to the global coordinator that they are prepared, then the coordinator asks all nodes to commit the transaction. If any participants cannot prepare, then the coordinator asks all nodes to roll back the transaction.
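The prepare and commit phases described above may be sketched as follows. This is a simplified, in-memory Python illustration under stated assumptions: the names Participant and two_phase_commit are hypothetical, the can_prepare flag merely simulates a node that cannot prepare, and failure recovery, logging to durable storage, and network messaging are all omitted.

```python
class Participant:
    """One node's local transaction participant (hypothetical model)."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # assumption: simulates a prepare failure
        self.data = {}
        self.pending = None
        self.state = "idle"

    def prepare(self, changes):
        # Record enough information to later commit or roll back.
        if not self.can_prepare:
            return False
        self.pending = dict(changes)
        self.state = "prepared"
        return True

    def commit(self):
        # Make the prepared changes permanent.
        self.data.update(self.pending)
        self.pending = None
        self.state = "committed"

    def rollback(self):
        # Discard any prepared changes.
        self.pending = None
        self.state = "rolled_back"


def two_phase_commit(participants, changes):
    """Run both phases from the coordinator's point of view."""
    # Phase 1 (prepare): every participant is asked; nothing is committed yet.
    votes = [p.prepare(changes) for p in participants]
    # Phase 2 (commit or roll back): all nodes receive the same decision.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

For example, calling two_phase_commit with two participants that can both prepare updates both of them, whereas a single participant constructed with can_prepare=False causes every participant to be rolled back, which reflects the all-or-none behavior described above.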
A side effect of this scheme is often a degree of collision prevention. Prior to the prepare phase, locks are placed on the appropriate data and the data is updated, thereby preventing many types of collisions. For example, the well-known technique of “dual writes” can be used to lock and update the appropriate data. In this technique, the application originating the transaction (or a surrogate library, device, or process on behalf of the application) performs the local I/O changes and replicates the I/O changes as they occur and applies them directly into the target database. Typically, the application's individual I/O changes to the source database are “lock-stepped” with the I/O changes to the target database. That is, the local I/O change does not complete until the remote I/O change is also complete.
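The lock-stepped behavior of dual writes may be illustrated by the following minimal Python sketch. The names Database, LockConflict, and dual_write are hypothetical, the two databases are modeled as in-memory dictionaries, and the remote I/O is modeled as an ordinary function call that returns only when the target write is complete. Rollback of a partially applied change is not modeled.

```python
class LockConflict(Exception):
    """Raised when a key is already locked by another transaction."""


class Database:
    """In-memory stand-in for a source or target database (hypothetical)."""

    def __init__(self):
        self.rows = {}
        self.locked_by = {}  # key -> id of the transaction holding the lock

    def lock_and_write(self, tx_id, key, value):
        # Lock the data item for this transaction, then apply the change.
        owner = self.locked_by.get(key)
        if owner is not None and owner != tx_id:
            raise LockConflict(f"{key!r} is locked by {owner}")
        self.locked_by[key] = tx_id
        self.rows[key] = value

    def release(self, tx_id):
        # Release every lock held by the transaction (at commit or rollback).
        self.locked_by = {
            k: t for k, t in self.locked_by.items() if t != tx_id
        }


def dual_write(source, target, tx_id, key, value):
    """Apply one I/O change to both databases in lock step."""
    source.lock_and_write(tx_id, key, value)
    # The local change is not treated as complete until this call returns.
    target.lock_and_write(tx_id, key, value)
```

In this sketch, a second transaction that attempts to change a data item still locked by the first transaction receives a LockConflict instead of silently overwriting the item, which is the collision-preventing side effect noted above.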
The scheme of using two-phase commit with a technique such as dual writes (also referred to as “two-phase commit” in this document) relies on a transaction coordinator for both local and remote database updating. If there are a large number of nodes in the system, the transaction coordinator must actively manage the updating of all of the other nodes. The node coordination puts large processing demands on the transaction coordinator and requires a large amount of messaging to occur throughout the system. Because of this messaging overhead, the two-phase commit mechanism is not used for efficient replication of distributed databases.
Accordingly, there is an unmet need for a collision avoidance scheme in a database replication system that is relatively simple to implement, uses the communication medium efficiently, scales efficiently and easily, prevents all types of collisions, and does not place large demands on local application programs to perform complex node coordination duties.
There is also an unmet need for methods to determine when to switch replication systems that normally operate in a synchronous mode to an asynchronous mode, and subsequently back to a synchronous mode.
The present invention, also referred to as “coordinated commits” in the subsections below, fulfills these needs.