Distributed transactions are often performed on distributed computing systems. Herein, a distributed computing system that performs distributed transactions is referred to as a distributed transaction system. A distributed transaction is a set of operations that update shared objects. Distributed transactions must satisfy the properties of Atomicity, Consistency, Isolation and Durability, commonly known as the ACID properties. According to the Atomicity property, either the transaction successfully executes to completion and the effects of all of its operations are recorded, or the transaction fails and none of its effects are recorded. The Consistency property requires that the transaction does not violate integrity constraints of the shared objects. The Isolation property requires that intermediate effects of the transaction are not detectable by concurrent transactions. Finally, the Durability property requires that changes to shared objects due to the transaction are permanent.
To ensure the Atomicity property, all participants of the distributed transaction must coordinate their actions so that they either unanimously abort or unanimously commit the transaction. A two-phase commit protocol is commonly used to ensure Atomicity. Under the two-phase commit protocol, the distributed system performs the commit operation in two phases. In the first phase, commonly known as the prepare phase or request phase, a coordinator node (a node in the distributed transaction system that manages the transaction) asks all participant nodes whether they are willing to commit the transaction. During the second phase, commonly known as the commit phase, the coordinator node determines whether the transaction should be completed. If all participant nodes agreed during the prepare phase to commit, the coordinator node completes the transaction. If one or more participant nodes did not agree to commit, the coordinator node does not complete the transaction and instead directs the participant nodes to abort it.
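The two phases described above can be illustrated with a minimal Python sketch. The `Vote`, `Participant` and `two_phase_commit` names are hypothetical and introduced only for illustration. The sketch runs in a single process and omits networking, timeouts and logging, so it shows only the voting rule, not a complete protocol implementation.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


class Participant:
    """Hypothetical participant node that votes and then applies the decision."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: report whether this node is willing to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.COMMIT if self.can_commit else Vote.ABORT

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Run both phases and return the coordinator's decision."""
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit): commit only if every participant voted to commit.
    decision = Vote.COMMIT if all(v is Vote.COMMIT for v in votes) else Vote.ABORT
    for p in participants:
        if decision is Vote.COMMIT:
            p.commit()
        else:
            p.abort()
    return decision
```

In this sketch a single participant created with `can_commit=False` is enough to make `two_phase_commit` return `Vote.ABORT` and leave every participant in the aborted state.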
To accurately track distributed transactions, participants and coordinators log the distributed transactions. If a coordinator or participant fails during a transaction, its log provides a record that it was involved in the transaction.
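As an illustration of how such a log can reveal transactions that were in progress at the time of a failure, the following Python sketch models a log as an append-only list of records. The `TransactionLog` class and the event names "prepared", "committed" and "aborted" are assumptions made for this example and do not describe any particular system's log format.

```python
from dataclasses import dataclass, field


@dataclass
class TransactionLog:
    """Hypothetical append-only log kept by one coordinator or participant."""

    owner: str
    records: list = field(default_factory=list)

    def append(self, tx_id, event):
        # event is one of "prepared", "committed" or "aborted" (assumed names).
        self.records.append((tx_id, event))

    def unresolved(self):
        """Return IDs of transactions that were prepared but never resolved."""
        prepared, resolved = [], set()
        for tx_id, event in self.records:
            if event == "prepared" and tx_id not in prepared:
                prepared.append(tx_id)
            elif event in ("committed", "aborted"):
                resolved.add(tx_id)
        return [tx_id for tx_id in prepared if tx_id not in resolved]
```

A log that contains a "prepared" record for a transaction but no later "committed" or "aborted" record shows that its owner took part in the transaction and never learned the outcome.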
A distributed transaction system generally includes a recovery subsystem. The recovery subsystem includes multiple recovery managers, each of which scans logs of participants and coordinators that operate on a server on which the recovery manager runs. Whenever the recovery manager identifies an unresolved transaction in a participant's transaction log, it sends a message to a coordinator of the transaction to inquire as to whether the transaction was aborted or committed. It can then direct the participant to commit or abort the transaction based on a response that it receives from the coordinator. The recovery manager performs such recovery operations in a sequential manner. Therefore, it may separately find and resolve the same unresolved transaction for multiple participants, one at a time. For each participant, a separate message is sent to the coordinator, and a separate response is received.
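The message cost of this sequential, per-participant recovery can be sketched in Python as follows. The `Coordinator` class and the `recover_participants` function are hypothetical names, and each call to `query_outcome` stands in for one inquiry message and its response. The sketch assumes the coordinator already knows the outcome of every transaction it is asked about.

```python
class Coordinator:
    """Hypothetical coordinator that remembers each transaction's outcome."""

    def __init__(self, outcomes):
        self.outcomes = outcomes  # tx_id -> "committed" or "aborted"
        self.queries_received = 0

    def query_outcome(self, tx_id):
        # Each call stands in for one request/response message pair.
        self.queries_received += 1
        return self.outcomes[tx_id]


def recover_participants(participant_logs, coordinator):
    """Resolve unresolved transactions one participant at a time.

    participant_logs maps a participant name to the list of transaction IDs
    left unresolved in that participant's log.
    """
    resolved = {}
    for participant, tx_ids in participant_logs.items():
        for tx_id in tx_ids:
            # A separate inquiry is sent even if another participant on the
            # same server already asked about the same transaction.
            resolved[(participant, tx_id)] = coordinator.query_outcome(tx_id)
    return resolved
```

If three participants on one server each hold the same unresolved transaction, the sketch sends three inquiries to the coordinator for that single transaction.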
Whenever the recovery manager identifies an unresolved transaction in a coordinator's transaction log, it sends a separate message to each participant of the transaction to resolve the transaction. This is true even if multiple participants operate on the same server. Thus, recovery of a single transaction often includes multiple messages passed between a coordinator and participants of the transaction.
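The coordinator-side case can be sketched in the same way. In the Python example below, `recover_coordinator`, the `placements` mapping and the message tuple layout are all hypothetical, chosen only to show that the number of messages follows the number of participants rather than the number of servers.

```python
def recover_coordinator(unresolved, placements):
    """Return the messages sent for a coordinator's unresolved transactions.

    unresolved maps a transaction ID to (decision, participant names);
    placements maps each participant name to the server it runs on.
    """
    messages = []
    for tx_id, (decision, participants) in unresolved.items():
        for participant in participants:
            # One message per participant, even when several participants
            # operate on the same server.
            messages.append((placements[participant], participant, tx_id, decision))
    return messages
```

With participants `p1` and `p2` placed on one server and `p3` on another, recovering one transaction produces three messages although only two servers are involved.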