Distributed transactions are often performed on distributed computing systems. A distributed transaction is a set of operations that update shared objects. Distributed transactions must satisfy the properties of Atomicity, Consistency, Isolation and Durability, known commonly as the ACID properties. According to the Atomicity property, either the transaction successfully executes to completion, and the effects of all operations are recorded, or the transaction fails. The Consistency property requires that the transaction does not violate integrity constraints of the shared objects. The Isolation property requires that intermediate effects of the transaction are not detectable to concurrent transactions. Finally, the Durability property requires that changes to shared objects due to the transaction are permanent.
To ensure the Atomicity property, all participants of the distributed transaction must coordinate their actions so that they either unanimously abort or unanimously commit to the transaction. A two-phase commit protocol is commonly used to ensure Atomicity. Under the two-phase commit protocol, the distributed system performs the commit operation in two phases. In the first phase, commonly known as the prepare phase or request phase, a coordinator node (a node in the distributed computing system managing the transaction) asks all participant nodes whether they are willing to commit to the transaction. During the second phase, commonly known as the commit phase, the coordinator node determines whether the transaction should be completed. If during the prepare phase all participant nodes committed to the transaction, the coordinator node successfully completes the transaction. If during the prepare phase one or more participant nodes failed to commit to the transaction, the coordinator node does not complete the transaction.
Typically, only the coordinator node has all the information necessary to determine whether a transaction should commit or roll back. Therefore, if the coordinator node fails during a distributed transaction, all participants in the transaction must wait for the coordinator to recover before completing the transaction. Thus, significant delays may be caused when a coordinator fails.
To minimize delays caused by a failed coordinator, some conventional transaction systems use clustering and/or group communication protocols to provide standby coordinators. However, clustering protocols and group communication protocols add complexity to distributed transactions, and require a change to the underlying distributed transaction protocol.