Distributed transactions are often performed on distributed computing systems. A distributed transaction is a set of operations that update shared objects. Distributed transactions must satisfy the properties of Atomicity, Consistency, Isolation and Durability, commonly known as the ACID properties. According to the Atomicity property, either the transaction executes successfully to completion and the effects of all of its operations are recorded, or the transaction fails and the effects of none of its operations are recorded. The Consistency property requires that the transaction does not violate integrity constraints of the shared objects. The Isolation property requires that intermediate effects of the transaction are not detectable by concurrent transactions. Finally, the Durability property requires that changes to shared objects due to the transaction are permanent.
To ensure the Atomicity property, all participants of the distributed transaction must coordinate their actions so that they either unanimously abort or unanimously commit the transaction. A two-phase commit protocol is commonly used to ensure Atomicity. Under the two-phase commit protocol, the distributed system performs the commit operation in two phases. In the first phase, commonly known as the prepare phase or request phase, a coordinator node (a node in the distributed computing system that manages the transaction) asks all participant nodes whether they are willing to commit to the transaction. During the second phase, commonly known as the commit phase, the coordinator node determines whether the transaction should be completed. If all participant nodes agreed to commit during the prepare phase, the coordinator node successfully completes the transaction. If one or more participant nodes failed to agree to commit during the prepare phase, the coordinator node does not complete the transaction.
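The two phases described above can be illustrated with a minimal, in-memory Python sketch. The names Participant, will_prepare and two_phase_commit are hypothetical and chosen only for illustration; real systems exchange these messages over a network and must also handle timeouts and logging, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    """Hypothetical participant node; `will_prepare` stands in for local checks."""
    name: str
    will_prepare: bool = True
    state: str = "active"

    def prepare(self) -> bool:
        # Phase 1: report whether this node is willing to commit.
        self.state = "prepared" if self.will_prepare else "aborted"
        return self.will_prepare

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> str:
    """Hypothetical coordinator logic: commit only on a unanimous yes."""
    # Phase 1 (prepare): ask every participant for its vote.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit): decide and tell every participant the outcome.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "aborted"
```

In this sketch, a single participant constructed with will_prepare=False is enough to make two_phase_commit roll back every participant, which mirrors the unanimity requirement described above.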
To accurately track distributed transactions, participants and coordinators log the distributed transactions. However, the number of distributed transactions in which a coordinator or participant takes part can be very high. Even though each log entry may not require a large amount of storage space, in the aggregate the logs can require enormous amounts of storage capacity. For this reason, once a distributed transaction is successfully completed, the log pertaining to the distributed transaction is typically deleted.
The two-phase commit protocol, although widely used, introduces substantial delay in transaction processing. This is because writing each log entry, in addition to consuming storage capacity, also consumes other resources (e.g., processor time, memory, network bandwidth, etc.). The basic two-phase commit protocol (often referred to as the presumed nothing optimization) requires information to be explicitly exchanged and logged regardless of whether the transaction is to be committed or aborted (rolled back). Therefore, for a system that conducts many distributed transactions, the presumed nothing optimization is undesirable. This applies even if logs for successful transactions are deleted.
Accordingly, a number of optimizations may be used to reduce the number of entries written to the log. A first optimization that may be used with the two-phase commit protocol is the presumed abort optimization, which is implemented in almost all distributed transactional systems. According to the presumed abort optimization, the coordinator only writes to the log when it decides to commit (i.e., to issue a command to all participants to commit). If there is a failure, or if the coordinator decides to roll back, it never writes to the log. Therefore, in the absence of information about a transaction, the coordinator presumes that the transaction has been aborted. Another optimization is the presumed commit optimization. According to the presumed commit optimization, in the absence of information about a transaction, the coordinator presumes that the transaction was successfully completed.
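As a rough illustration of the presumed abort optimization, the Python sketch below models a coordinator that writes a log entry only when it decides to commit, and that answers "aborted" for any transaction it has no record of. The class PresumedAbortCoordinator, its in-memory dictionary log and the log_writes counter are assumptions made for this example; an actual coordinator would force the entry to durable storage.

```python
from typing import Dict, List


class PresumedAbortCoordinator:
    """Hypothetical coordinator that logs commit decisions only."""

    def __init__(self) -> None:
        self.log: Dict[str, str] = {}  # stand-in for a durable log
        self.log_writes = 0            # counts entries written

    def decide(self, txn_id: str, votes: List[bool]) -> str:
        if votes and all(votes):
            # The only case in which the coordinator writes to the log.
            self.log[txn_id] = "committed"
            self.log_writes += 1
            return "committed"
        # Rollback or failure: nothing is written.
        return "aborted"

    def status(self, txn_id: str) -> str:
        # No record of the transaction means it is presumed aborted.
        return self.log.get(txn_id, "aborted")
```

Under these assumptions, a rolled-back transaction costs the coordinator no log writes at all, which is the saving the optimization aims for.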
In the presumed commit, presumed abort and presumed nothing optimizations, there is a small window of vulnerability between the time when a participant agrees to commit to a transaction and the time when the participant receives a response indicating whether it should roll back or commit the transaction. If the participant crashes within this window, then when it recovers there will be no record of whether or not the transaction was successfully completed (because the coordinator deletes its log upon completion of a transaction). Therefore, if the presumed abort optimization is used, the participant will be informed that the transaction was rolled back (even if the transaction was successfully completed). If the presumed commit optimization is used, the participant will be informed that the transaction was successfully completed (even if the transaction was rolled back). If the presumed nothing optimization is used, the participant will be informed that the status of the transaction is unknown.
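The window of vulnerability can be made concrete with a small Python sketch of how a coordinator might answer a status inquiry from a recovering participant. The Presumption enum and the answer_inquiry function are hypothetical names introduced here, and the log is modeled as a plain dictionary for simplicity.

```python
from enum import Enum
from typing import Dict


class Presumption(Enum):
    NOTHING = "presumed nothing"
    ABORT = "presumed abort"
    COMMIT = "presumed commit"


def answer_inquiry(log: Dict[str, str], txn_id: str, mode: Presumption) -> str:
    """Hypothetical reply to a recovering participant's status inquiry."""
    recorded = log.get(txn_id)
    if recorded is not None:
        # A log entry still exists, so the true outcome is known.
        return recorded
    # No entry: the reply depends only on the presumption in use.
    if mode is Presumption.ABORT:
        return "aborted"
    if mode is Presumption.COMMIT:
        return "committed"
    return "unknown"


# Scenario: the transaction committed, then the coordinator deleted its log.
log = {"t1": "committed"}
del log["t1"]
replies = {mode.name: answer_inquiry(log, "t1", mode) for mode in Presumption}
```

In this scenario the transaction actually committed, yet the presumed abort reply is "aborted" and the presumed nothing reply is "unknown"; only the presumed commit reply happens to match, and it would be the wrong one had the transaction been rolled back.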