1. Field of the Invention
This invention relates to systems and methods for maintaining atomicity and reducing blocking in distributed systems.
2. Description of the Related Art
For a transaction to be atomic, a system executes either all of the operations in the transaction to completion or none of them. Atomicity allows multiple operations to be linked so that the final outcome of the overall transaction is known. System failures can prevent atomicity. For example, a device or communication failure in a distributed system executing a transaction can cause some of the parties participating in the transaction to execute the transaction to completion while other parties abort the transaction. This puts the parties in different states and can corrupt system information if the parties cannot roll back to a stable condition consistent with a known state before the transaction was initiated.
In a distributed system, an atomic commit protocol (ACP) resolves transactions between a number of different parties involved in the transaction. The ACP ensures that all parties to the transaction agree on a final outcome by either committing to the transaction or aborting the transaction. Several such protocols are described below.
I. Deterministic Atomic Commit Protocol
A plurality of nodes may participate in a transaction and then send messages to each other to indicate that they are each prepared to commit the transaction. Once a particular participant receives “prepared” messages from all other participating nodes, the participant commits to the transaction and sends a “committed” message to the other participating nodes. If the participant receives an “abort” message from another participating node, the participant also aborts. Thus, the protocol in this example is deterministic in that the outcome of the transaction is causally determined when the participating nodes are prepared to commit. The transaction eventually commits when all participants successfully send “prepared” messages to the other participants. Each participating node uses this rule to decide for itself how to resolve the transaction.
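By way of illustration only, the decision rule that each participating node applies may be sketched as follows. The function name `resolve`, the message strings, and the "wait" result are assumptions introduced for this example and are not part of any particular protocol implementation.

```python
from typing import Dict, Iterable


def resolve(self_id: str, participants: Iterable[str], received: Dict[str, str]) -> str:
    """Apply the deterministic rule from the point of view of one node.

    `received` maps the identifier of each other node to the last message
    received from it ("prepared" or "abort"). Nodes that have not yet sent
    a message are simply absent from the mapping.
    """
    others = [p for p in participants if p != self_id]

    # An "abort" message from any other participant aborts the transaction.
    if any(received.get(p) == "abort" for p in others):
        return "abort"

    # "prepared" messages from all other participants commit the transaction.
    if all(received.get(p) == "prepared" for p in others):
        return "commit"

    # Otherwise the outcome is not yet determined and the node keeps waiting.
    return "wait"
```

In this sketch, a node that is still missing a message from any other participant remains in the "wait" result for as long as that message is outstanding.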
However, failure of a participant can block the transaction until the participant recovers. If, for example, the participant prepares for the transaction but crashes before sending any “prepared” message, and all other participants send “prepared” messages, the transaction is blocked while the functioning participants wait to determine whether the failed participant prepared for or aborted the transaction. Further, the functioning participants do not know whether the failed participant committed to the transaction after receiving their “prepared” messages. Thus, the functioning participants block the transaction until the failed participant recovers. The transaction may block for an indeterminate amount of time, which may be forever in the case of a permanent failure.
II. Two-Phase Commit Protocol
Some ACPs are non-deterministic and use a coordinator to manage the ACP and reduce blocking when a participating node fails. For example, in a conventional two-phase commit protocol the participants send “prepared” messages or “abort” messages to the coordinator rather than to each other. In a first phase, the coordinator decides whether to commit or abort the transaction. If the coordinator receives “prepared” messages from all participants, the coordinator decides to commit the transaction. If the coordinator receives an “abort” message from at least one participant, the coordinator decides to abort the transaction. In a second phase, the coordinator logs its decision and sends messages to the participating nodes to notify them of the decision. The participants can then take appropriate action.
Since the coordinator makes a unilateral decision, failure of a single participant will not block the transaction. If a participant fails or loses communication with the coordinator before sending a “prepared” or “abort” message, the coordinator unilaterally decides to abort after a predetermined amount of time. However, the two-phase commit protocol can still block the transaction under certain circumstances. For example, if the coordinator fails and all participants send “prepared” messages, the participants will block until the coordinator recovers and resolves the protocol.
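By way of illustration only, the coordinator's decision and the resulting participant behavior in a two-phase commit protocol may be sketched as follows. The function names, the `timed_out` flag standing in for the predetermined amount of time, and the "blocked" result are assumptions introduced for this example.

```python
from typing import Dict, Iterable, Optional


def coordinator_decide(
    participants: Iterable[str], votes: Dict[str, str], timed_out: bool
) -> Optional[str]:
    """Return "commit", "abort", or None if the coordinator is still waiting.

    `votes` maps each participant that has responded to "prepared" or "abort".
    `timed_out` is True once the predetermined waiting period has elapsed.
    """
    # A single "abort" message is enough to abort.
    if any(v == "abort" for v in votes.values()):
        return "abort"

    # "prepared" messages from all participants lead to a commit decision.
    if all(votes.get(p) == "prepared" for p in participants):
        return "commit"

    # A silent participant does not block: the coordinator aborts on timeout.
    return "abort" if timed_out else None


def participant_state(own_vote: str, decision: Optional[str]) -> str:
    """State of a participant given its own vote and the decision, if any, it received."""
    if decision is not None:
        return decision
    # A participant that voted "abort" already knows the outcome.
    if own_vote == "abort":
        return "abort"
    # A prepared participant with no decision cannot resolve on its own.
    return "blocked"
```

In this sketch, `participant_state("prepared", None)` corresponds to the case in which the coordinator has failed after the participants sent their “prepared” messages: the participant can neither commit nor abort until a decision arrives.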
III. Three-Phase Commit Protocol
Conventional three-phase commit protocols attempt to solve the blocking problem of the two-phase commit protocol by adding an extra phase in which a preliminary decision of whether to commit or abort the transaction is communicated to the participating nodes. If the coordinator fails, the participating nodes select one of the participants to be a new coordinator that resumes the protocol. When the failed coordinator recovers, it does so as a participant and no longer acts in the role of the coordinator. However, in many applications it is not practical to implement the conventional three-phase commit protocol. Further, the three-phase commit protocol may block if multiple participants fail or if there is a communication failure.
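By way of illustration only, the extra phase and the replacement of a failed coordinator may be sketched as follows. The message names ("pre-commit", "pre-abort"), the function names, and the rule of selecting the lowest identifier as the new coordinator are assumptions introduced for this example; the description above does not specify how the new coordinator is selected.

```python
from typing import Dict, Iterable, List, Tuple


def coordinator_broadcasts(
    participants: Iterable[str], votes: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Messages the coordinator sends after collecting the votes.

    The preliminary decision is communicated first, followed by the final
    decision. The message names are hypothetical.
    """
    all_prepared = all(votes.get(p) == "prepared" for p in participants)
    preliminary = "pre-commit" if all_prepared else "pre-abort"
    final = "commit" if all_prepared else "abort"
    return [("preliminary", preliminary), ("final", final)]


def select_new_coordinator(live_participants: Iterable[str]) -> str:
    """Pick a new coordinator from the surviving participants.

    Assumption for this sketch: the lowest identifier is selected.
    """
    return min(live_participants)
```

For example, if a coordinator fails while participants "n2" and "n3" remain, `select_new_coordinator({"n3", "n2"})` selects "n2" under the assumed rule, and the failed coordinator would rejoin only as a participant when it recovers.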