A distributed transaction is a transaction including operations that are performed by multiple nodes, often implemented at multiple networked computer systems. Distributed transactions arise in a number of different contexts. For example, processing an employee's paycheck may involve a distributed transaction with operations performed by a number of nodes at different individual computer systems. An employer payroll system node may request a transfer from its own account to the account of the employee. A system at the employer's bank may debit the employer's account by the amount of the transfer. A system at the employee's bank may credit the employee's account by the amount of the transfer. In another example, booking travel may involve a distributed transaction. A travel agent system may request reservations on various flights, hotels, etc. In response to the requests, one or more airline systems may book flight tickets for a traveler. One or more hotel systems may create reservations for the traveler, etc.
In distributed transactions, it is important to ensure coordination among the participating computer systems. For example, if one operation fails, then the other operations should be either prevented or reversed. Referring to the paycheck example, if the employer's bank system crashes or otherwise fails to debit the employer's account, then it is desirable to prevent the employee's bank system from completing the transfer to the employee's account. Referring to the travel example, if the travel agent system requests a Monday flight, but the airline indicates that the first available flight is not until Tuesday, then it is desirable to prevent the hotel system from booking a room for Monday night.
A number of existing techniques are used to ensure coordination in distributed transactions. For example, a transaction manager (e.g., a coordinator node) may coordinate a distributed transaction according to an atomic commit protocol, such as a two-phase or three-phase commit protocol. In a first phase, commonly known as a prepare phase, the coordinator node asks all other nodes participating in the transaction whether they will commit to the transaction. A node may determine whether it is able to commit to the transaction, for example, by attempting to obtain an appropriate lock for an object or resource to be manipulated in the transaction and attempting to execute its assigned transaction operation. In the paycheck example above, for instance, the employer's bank system may attempt to obtain a lock for a data unit indicating the employer's account and attempt to update the account. Obtaining the lock may prevent other processes from modifying, and sometimes even from reading, data describing the employer's account until the distributed transaction is complete. If a node succeeds in obtaining the necessary lock or locks and successfully executes the operation, it may commit to the transaction. On the other hand, if a node fails to obtain a necessary lock or fails to execute the operation, it may decline to commit to the transaction. A node that commits to a transaction has executed its operation or operations and continues to hold its lock or locks until the coordinator instructs it to either complete or abort the transaction.
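The participant side of the prepare phase described above might be sketched as follows. This is a minimal illustration in Python under stated assumptions, not a definitive implementation: the `AccountNode` class, its `prepare` and `is_locked` methods, and the use of an in-process `threading.Lock` to stand in for a lock on an account data unit are all hypothetical names chosen for the example.

```python
import threading


class AccountNode:
    """Hypothetical participant node that manages a single account balance."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()  # stands in for a lock on the account data unit
        self._pending = None           # tentative balance, kept until commit or abort

    def prepare(self, delta):
        """Try to lock the account and tentatively apply delta; return the vote."""
        # Decline to commit if another transaction already holds the lock.
        if not self._lock.acquire(blocking=False):
            return False
        new_balance = self.balance + delta
        if new_balance < 0:
            # The operation cannot be executed, so release the lock and decline.
            self._lock.release()
            return False
        # Vote to commit; the lock stays held until the coordinator decides.
        self._pending = new_balance
        return True

    def is_locked(self):
        """Report whether the account lock is currently held."""
        return self._lock.locked()
```

In this sketch, a node that votes to commit keeps its lock, so a second `prepare` call against the same account is declined until the first transaction is completed or aborted.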
In a subsequent phase of an atomic commit protocol, commonly known as the commit phase, the coordinator node determines whether the transaction should be completed, for example, based on whether any of the nodes failed to commit. For example, if during the prepare phase all participating transaction nodes committed to the transaction, the coordinator node successfully completes the transaction. When the transaction is successfully completed, the participating transaction nodes release their locks. If during the prepare phase one or more participating transaction nodes failed to commit to the transaction, the coordinator node aborts the transaction. When the transaction is aborted, the participating transaction nodes reverse their operations and release their locks.
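The coordinator's role across the prepare and commit phases can be sketched roughly as below. The `Participant` class, its `state` field, and the `run_two_phase_commit` function are hypothetical names introduced for illustration. The sketch assumes reliable, synchronous calls and omits the timeouts, logging, and failure recovery that a practical atomic commit protocol would need.

```python
class Participant:
    """Hypothetical participant that records its state for inspection."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # A real node would obtain its locks and execute its operation here.
        self.state = "prepared" if self.can_commit else "declined"
        return self.can_commit

    def commit(self):
        # Complete the transaction; a real node would release its locks.
        self.state = "committed"

    def abort(self):
        # A real node would reverse its operation and release its locks.
        self.state = "aborted"


def run_two_phase_commit(participants):
    """Return True if the transaction completed, False if it was aborted."""
    # Prepare phase: ask every participant whether it will commit.
    votes = [p.prepare() for p in participants]
    # Commit phase: complete only if every participant committed.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

For example, calling `run_two_phase_commit` with two participants where one is constructed with `can_commit=False` leaves both in the `"aborted"` state, mirroring the case in which the employer's bank cannot debit the account and the employee's bank must not complete the credit.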
An atomic commit protocol can be used only when the coordinator node and the various participating transaction nodes are specifically configured to support the protocol. When one or more participating transaction nodes are not so configured, the distributed transaction may be implemented according to a compensation transaction format. In a compensation transaction, participating transaction nodes execute their assigned operation or operations and release their locks before it is known whether the distributed transaction has succeeded or failed. A compensation action, which reverses the transaction node's assigned operation or operations, is generated for each participating transaction node. If the distributed transaction fails (e.g., if one of the participating transaction nodes cannot execute its assigned operation), then the coordinator node instructs any participating transaction nodes that have already executed operations to reverse those operations by executing their compensation actions.
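A compensation transaction of this kind might be sketched as follows. The function name `run_with_compensation` and the representation of each node's work as an (action, compensation) pair of zero-argument callables are assumptions made for illustration. Running the compensation actions in reverse order of execution is a design choice of this sketch, not something the description above requires.

```python
def run_with_compensation(steps):
    """Run (action, compensation) pairs; undo completed steps on failure.

    Each action and compensation is a hypothetical zero-argument callable.
    Returns True if every action succeeded, False otherwise.
    """
    completed = []  # compensations for actions that have already executed
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # Reverse the already-executed operations, most recent first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True
```

In the paycheck example, if the debit at the employer's bank succeeds but the credit at the employee's bank raises an error, only the debit's compensation action runs, restoring the employer's balance. Because each node releases its locks right after executing, other processes may observe the intermediate state before the compensation is applied.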