This invention relates to operation of distributed computing systems, and more particularly to a commit protocol for a distributed transaction processing system.
Computer systems using transaction processing employ a commit protocol to ensure that no permanent change is made to a data item, and no change is made visible to other nodes of the system, until a specified "commit" is executed. One protocol commonly used in transaction processing is the so-called "two-phase commit" or "2PC" protocol, described in detail by Mohan & Lindsay, "Efficient Commit Protocols for the Tree of Processes Model of Distributed Transactions," Proc. 2nd ACM SIGACT/SIGOPS Symposium on Principles of Distributed Computing, Aug. 17, 1983. The two-phase commit protocol can be of the "presumed-abort" or the "presumed-commit" type. Current transaction processing systems commonly use the presumed-abort variant, rather than the presumed-commit variant, to coordinate the commitment of distributed transactions. Nevertheless, the presumed-commit protocol has clear advantages in many situations: a subordinate in a committed transaction does not need to send a final acknowledgment to the coordinator in response to the commit message, although it must acknowledge an abort message. With presumed-abort, the reverse holds: the final acknowledgment is not needed for abort messages from the coordinator, but is needed for commit messages. Transactions commit far more frequently than they abort; hence, the presumed-commit protocol saves this final acknowledgment much more frequently than the presumed-abort variant does. It is thus desirable to eliminate the current liability of the presumed-commit protocol, as will be described, so as to realize its performance improvement.
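The acknowledgment trade-off described above can be illustrated with a short Python sketch. The function names `final_acks` and `workload_acks` and the workload figures are hypothetical, chosen only for illustration. The sketch also assumes that every subordinate receives the outcome message and that only the final acknowledgment, not the earlier vote, is counted.

```python
def final_acks(protocol: str, outcome: str, subordinates: int) -> int:
    """Count final acknowledgments sent by subordinates for one transaction.

    Assumption (not from the source): every subordinate receives the
    outcome message, and only the final acknowledgment is counted.
    """
    if protocol not in ("presumed-abort", "presumed-commit"):
        raise ValueError(f"unknown protocol: {protocol}")
    if outcome not in ("commit", "abort"):
        raise ValueError(f"unknown outcome: {outcome}")
    # Presumed-abort: commit messages are acknowledged, abort messages are not.
    # Presumed-commit: abort messages are acknowledged, commit messages are not.
    acked_outcome = "commit" if protocol == "presumed-abort" else "abort"
    return subordinates if outcome == acked_outcome else 0


def workload_acks(protocol: str, commits: int, aborts: int, subordinates: int) -> int:
    """Total final acknowledgments over a workload of commits and aborts."""
    return (commits * final_acks(protocol, "commit", subordinates)
            + aborts * final_acks(protocol, "abort", subordinates))
```

For a hypothetical workload of 95 commits and 5 aborts with 3 subordinates each, `workload_acks("presumed-abort", 95, 5, 3)` gives 285 final acknowledgments, while `workload_acks("presumed-commit", 95, 5, 3)` gives 15.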
The presumed-abort protocol has been chosen over presumed-commit in prior work because of the activity required of the commit coordinator. With the presumed-abort protocol, whenever a subordinate (also called a cohort) inquires of the coordinator process about the status of a transaction, the coordinator answers that the transaction is committed if it has a record of it. Otherwise, in the absence of information, the coordinator process indicates that the transaction is aborted. This means that the coordinator process need not make information stable (write it to disk storage) until a transaction commits, since any transaction in progress at an earlier crash will be presumed to have aborted, which is consistent with what actually happened. The coordinator eventually writes a (non-forced) end-transaction record when all cohorts have acknowledged the final message. This permits the coordinator to garbage collect its remembered state, and to have that garbage collection information persist across system crashes.
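The coordinator behavior just described for presumed-abort can be sketched as follows. This is a minimal model under stated assumptions, not an implementation from the cited work: the class name `PresumedAbortCoordinator`, its method names, and the use of an in-memory list to stand in for the stable log are all hypothetical.

```python
class PresumedAbortCoordinator:
    """Toy model of a presumed-abort coordinator (illustrative only)."""

    def __init__(self):
        # Stands in for disk storage: contents survive crash().
        self.stable_log = []
        # Volatile state: cohorts that still owe a final acknowledgment.
        self.pending_acks = {}

    def commit(self, txn_id, cohorts):
        # Nothing is made stable before this point; the commit record is
        # the first information the coordinator writes for the transaction.
        self.stable_log.append(("commit", txn_id))
        self.pending_acks[txn_id] = set(cohorts)

    def acknowledge(self, txn_id, cohort):
        waiting = self.pending_acks.get(txn_id)
        if waiting is None:
            return
        waiting.discard(cohort)
        if not waiting:
            # All cohorts have acknowledged: write the (non-forced)
            # end-transaction record and garbage collect volatile state.
            self.stable_log.append(("end", txn_id))
            del self.pending_acks[txn_id]

    def crash(self):
        # A crash loses volatile state only (rebuilding it is omitted here).
        self.pending_acks.clear()

    def inquire(self, txn_id):
        # A commit record means committed; absence of information means aborted.
        if ("commit", txn_id) in self.stable_log:
            return "committed"
        return "aborted"
```

In this model a transaction that never reached `commit` before `crash()` leaves no trace in the log, so `inquire` reports it as aborted without the coordinator ever having written anything for it.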
For the presumed-commit protocol, the coordinator process needs to know explicitly which transactions have aborted. Traditionally, this has meant that, at the time that the two-phase commit protocol is initiated, the coordinator forces to the log a record of the protocol start. If the coordinator crashes before the transaction has successfully committed, it has a log record for protocol start, but none indicating completion. These incomplete protocol transactions are added to the list of aborted transactions. Further, the protocol start log record permits the garbage collection of the abort list information, which is held in volatile memory. Once all expected acknowledgments have been received, the coordinator knows that no further inquiries will be received. Hence the abort information for the transaction can be discarded. The end-transaction record indicates this stably.
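The traditional presumed-commit bookkeeping described above can be sketched in the same style. Again this is a hedged illustration: the class name `PresumedCommitCoordinator`, its methods, the record layout, and the choice to store the subordinate set in the protocol-start record are assumptions made for the sketch, and an in-memory list stands in for the stable log.

```python
class PresumedCommitCoordinator:
    """Toy model of a traditional presumed-commit coordinator (illustrative only)."""

    def __init__(self):
        # Stands in for disk storage: contents survive a crash.
        self.stable_log = []
        # Volatile abort list: txn_id -> subordinates yet to acknowledge the abort.
        self.abort_list = {}

    def start_protocol(self, txn_id, subordinates):
        # The forced write at protocol start, incurred for every transaction.
        self.stable_log.append(("start", txn_id, frozenset(subordinates)))

    def commit(self, txn_id):
        # Commit record; no final acknowledgments are expected afterwards.
        self.stable_log.append(("commit", txn_id, None))

    def abort(self, txn_id):
        started = {t: subs for kind, t, subs in self.stable_log if kind == "start"}
        self.abort_list[txn_id] = set(started[txn_id])

    def acknowledge_abort(self, txn_id, subordinate):
        waiting = self.abort_list.get(txn_id)
        if waiting is None:
            return
        waiting.discard(subordinate)
        if not waiting:
            # No further inquiries can arrive: discard the abort information
            # and record that fact with an end-transaction record.
            del self.abort_list[txn_id]
            self.stable_log.append(("end", txn_id, None))

    def recover(self):
        # After a crash, any transaction with a start record but no record of
        # completion is placed on the rebuilt abort list.
        finished = {t for kind, t, _ in self.stable_log if kind in ("commit", "end")}
        self.abort_list = {
            t: set(subs)
            for kind, t, subs in self.stable_log
            if kind == "start" and t not in finished
        }

    def inquire(self, txn_id):
        # Listed transactions aborted; anything else is presumed committed.
        return "aborted" if txn_id in self.abort_list else "committed"
```

The sketch makes the cost visible: `start_protocol` appends to the stable log for every transaction, including the large majority that go on to commit.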
It is the coordinator's forcing of a log record at the start of the two-phase commit protocol that is the added expense. This extra forced write is incurred for every transaction that completes via the two-phase commit protocol. It is therefore the objective of the present invention to eliminate this extra log force while preserving the benefits of the presumed-commit form of the two-phase commit protocol.