In transaction processing systems, accesses and updates to system resources are typically carried out by the execution of discrete transactions (or units of work). A transaction is a sequence of coordinated operations on system resources such that either all of the changes take effect or none of them does. These operations are typically changes made to data held in storage in the transaction processing system; system resources include databases, data tables, files, data records and so on. This characteristic of a transaction being accomplished as a whole or not at all is also known as atomicity.
In this way, resources are prevented from becoming inconsistent with each other. If any one of the set of update operations fails, then none of the others may take effect either. A transaction thus transforms a consistent state of resources into another consistent state, without necessarily preserving consistency at all intermediate points.
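The all-or-nothing property can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the resources are an in-memory dict and that each operation is a callable that mutates it and may raise; the names `run_atomically` and `TransactionAborted` are hypothetical. The sketch uses a snapshot-and-restore technique, which is only one simple way to obtain atomicity in a single process and not how a transaction processing system necessarily implements it.

```python
from copy import deepcopy


class TransactionAborted(Exception):
    """Raised when an operation inside the (hypothetical) transaction fails."""


def run_atomically(resources, operations):
    """Apply every operation to `resources`, or none of them.

    `resources` is a dict standing in for tables, files or records; each
    operation is a callable that mutates it and may raise. On any failure
    the pre-transaction snapshot is restored in place, so the caller never
    observes a partially applied set of changes.
    """
    snapshot = deepcopy(resources)  # consistent state before the transaction
    try:
        for op in operations:
            op(resources)
    except Exception as exc:
        # Back out: restore the consistent state that existed beforehand.
        resources.clear()
        resources.update(snapshot)
        raise TransactionAborted(str(exc)) from exc
    return resources
```

Intermediate states (for example, after a debit but before the matching credit) may be inconsistent, but once `run_atomically` returns or raises, the dict is in a consistent state again.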
The atomic nature of transactions is maintained by means of a transaction synchronization procedure commonly called a commit procedure. Logical points of consistency at which resource changes are synchronized within transaction execution are called commit points or syncpoints; an application ends a unit of work by declaring a syncpoint, or when the application terminates.
Atomicity of a transaction is achieved by holding the resource updates made within the transaction in-doubt (uncommitted) until a syncpoint is declared at completion of the transaction. If the transaction succeeds, the results of the transaction are made permanent (committed); if the transaction fails, all effects of the unsuccessful transaction are removed (backed out), and the resources are restored to the consistent state which existed before the transaction began.
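Holding updates in-doubt until a syncpoint can be sketched with a toy resource that keeps pending updates apart from committed data. This is a minimal illustration under assumptions: the `Resource` class, its `write`/`read` methods and the `syncpoint(success)` call are hypothetical names, and real resource managers also handle locking, logging and recovery, none of which is shown.

```python
class Resource:
    """Toy recoverable resource holding updates in-doubt until a syncpoint.

    Hypothetical class for illustration: `write` records a pending update
    that is not visible in `committed`; `syncpoint(success)` either makes
    all pending updates permanent or discards (backs out) all of them.
    """

    def __init__(self, data=None):
        self.committed = dict(data or {})
        self._pending = {}  # in-doubt (uncommitted) updates

    def write(self, key, value):
        # Held in-doubt: not reflected in `committed` until a syncpoint.
        self._pending[key] = value

    def read(self, key):
        # The transaction itself sees its own uncommitted updates.
        return self._pending.get(key, self.committed.get(key))

    def syncpoint(self, success=True):
        if success:
            self.committed.update(self._pending)  # commit: make permanent
        # Whether committed or backed out, nothing remains in-doubt.
        self._pending.clear()
```

Because committed data is never touched before the syncpoint, backing out simply discards the pending updates, leaving the state that existed before the transaction began.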
There are a number of different transaction processing systems commercially available; an example of an on-line transaction processing system is the CICS system developed by International Business Machines Corporation (IBM is a registered trademark and CICS is a trademark of International Business Machines Corporation).
In a transaction data processing system that either has a single site or node at which transaction operations are executed, or permits such operations to be executed at only one node during any given transaction, atomicity is enforced by a single-phase synchronization operation. In this regard, when the transaction completes, the node, in a single phase, either commits the changes to make them permanent or backs them out.
In distributed systems encompassing a multiplicity of nodes, a transaction may cause changes to be made to more than one of such nodes. In such a system, atomicity can be guaranteed only if all of the nodes involved in the transaction agree on its outcome. A simple example is a financial application to carry out a funds transfer from one account to another account in a different bank, thus involving two basic operations to critical resources: the debit of one account and the credit of the other. It is important to ensure that either both or neither of these operations take effect.
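The funds-transfer example shows why a single-phase commit at each node is not enough once two nodes are involved. The sketch below assumes hypothetical `Bank` objects that commit their own update immediately and a `fail` flag that simulates a local failure; it deliberately shows the incorrect, non-atomic approach.

```python
class Bank:
    """Hypothetical node that commits its own update unilaterally (one phase)."""

    def __init__(self, name, balances, fail=False):
        self.name = name
        self.balances = dict(balances)
        self.fail = fail  # simulate a local failure when the update is applied

    def apply_and_commit(self, account, delta):
        if self.fail:
            raise RuntimeError(f"{self.name} failed")
        self.balances[account] += delta  # permanent as soon as it is applied


def naive_transfer(src, dst, src_acct, dst_acct, amount):
    """Each node commits on its own: NOT atomic across the two nodes."""
    src.apply_and_commit(src_acct, -amount)  # debit is already permanent...
    dst.apply_and_commit(dst_acct, +amount)  # ...so a failure here loses money
```

If the destination bank fails, the debit has already been committed at the source while the credit never happens, so the two nodes disagree on the outcome of the transfer.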
Distributed systems typically use a transaction synchronization procedure called the two-phase commit (2PC) protocol to guarantee atomicity. In this regard, assume that a transaction ends successfully at an execution node and that all site resource managers (or agents) are requested to commit operations involved in the transaction. In the first phase of the protocol (prepare phase), all involved agents are requested to prepare to commit. In response, the agents individually decide, based upon local conditions, whether to commit or back out their operations. The decisions are communicated to a synchronization location, called the coordinator, where the votes are counted. In the second phase (commit phase), if all agents vote to commit, a request to commit is issued, in response to which all of the agents commit their operations. On the other hand, if any agent votes to back out its operation, all agents are instructed to back out their operations.
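The two phases can be sketched in Python as a minimal, single-process illustration. Assumptions: the `Agent`, `Vote` and `two_phase_commit` names are hypothetical, agents are local objects rather than remote resource managers, and no logging, messaging or failure handling is shown.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    BACKOUT = "backout"


class Agent:
    """Hypothetical participant (resource manager) in two-phase commit."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # stands in for "local conditions"
        self.state = "active"

    def prepare(self):
        # Phase 1: decide locally whether the operations can be committed.
        if self.can_commit:
            self.state = "prepared"
            return Vote.COMMIT
        self.state = "backed_out"  # a BACKOUT voter may back out at once
        return Vote.BACKOUT

    def commit(self):
        self.state = "committed"

    def backout(self):
        self.state = "backed_out"


def two_phase_commit(agents):
    """Coordinator: commit only if every agent votes COMMIT."""
    votes = [agent.prepare() for agent in agents]  # phase 1: collect votes
    if all(v is Vote.COMMIT for v in votes):        # phase 2: commit
        for agent in agents:
            agent.commit()
        return "committed"
    for agent in agents:                             # phase 2: back out
        agent.backout()
    return "backed_out"
```

For simplicity the coordinator polls every agent even after a BACKOUT vote; a practical coordinator could stop early, since a single BACKOUT vote already decides the outcome.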
Distributed systems are organized to be largely recoverable from system failures, whether communication failures or node failures. A communication failure and a failure in a remote node generally manifest themselves by the cessation of messages to one or more nodes. Each node affected by the failure can detect it by various mechanisms, including a timer in the node which detects when a unit of work has been active for longer than a preset maximum time. A node failure is typically due to a software failure requiring restarting of the node, or to a deadlock involving preemption of the transaction running on the node.
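The timer mechanism can be sketched as a table of start times checked against a preset maximum. This is a minimal illustration: the `UnitOfWorkTimer` class and its methods are hypothetical, the clock is injectable (defaulting to `time.monotonic`) so it can be tested, and what a node does with an overdue unit of work is left out.

```python
import time


class UnitOfWorkTimer:
    """Flags units of work active for longer than a preset maximum time.

    Hypothetical helper: `max_age` is the preset maximum in seconds and
    `clock` defaults to time.monotonic but can be replaced for testing.
    """

    def __init__(self, max_age, clock=time.monotonic):
        self.max_age = max_age
        self.clock = clock
        self._started = {}  # unit-of-work id -> start time

    def begin(self, uow_id):
        self._started[uow_id] = self.clock()

    def end(self, uow_id):
        self._started.pop(uow_id, None)

    def overdue(self):
        """Return ids of units of work active longer than `max_age`.

        These are candidates for failure handling; the timer alone cannot
        tell a communication failure from a remote node failure.
        """
        now = self.clock()
        return sorted(u for u, t in self._started.items() if now - t > self.max_age)
```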
System failures are managed by a recovery procedure requiring resynchronization of the nodes involved in the unit of work. Since a node failure normally results in the loss of information held in volatile storage, any node that becomes involved in a unit of work must record its state changes (checkpoints) in non-volatile storage as it exchanges messages during the two-phase commit protocol; this checkpoint data, or log, written to a stable storage medium as the protocol proceeds, allows the same protocol to be restarted if the node fails. Such writing to the stable storage medium may be synchronous or asynchronous. A synchronous write occurs when a state change is written to non-volatile storage in step with the transmission of the corresponding protocol message, so that the message is sent only after the write has completed. An asynchronous write occurs when the writing of a state change is initiated before the message is transmitted, but the protocol does not have to wait for the write to complete.
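The difference between synchronous and asynchronous log writes can be sketched with an append-only file. This is a minimal sketch under assumptions: `RecoveryLog`, `checkpoint_then_send` and `read_log` are hypothetical names, records are JSON lines, `force=True` stands for the synchronous write (flush plus `os.fsync` before the message is sent), and `force=False` leaves the record in the process's write buffer so the caller can proceed at once.

```python
import json
import os


class RecoveryLog:
    """Append-only checkpoint log on non-volatile storage (sketch).

    force=True:  the record is flushed and fsync'd before the call returns,
                 so a message sent afterwards never gets ahead of the log.
    force=False: the record stays in the process's write buffer and the call
                 returns at once; it reaches disk at the next forced write
                 or when the log is closed.
    """

    def __init__(self, path):
        self._f = open(path, "a", encoding="utf-8")

    def write(self, record, force):
        self._f.write(json.dumps(record) + "\n")
        if force:
            self._f.flush()             # push buffered records to the OS
            os.fsync(self._f.fileno())  # wait until they are on stable storage

    def close(self):
        self._f.close()


def checkpoint_then_send(log, record, send, force=True):
    """Write the checkpoint, then transmit the protocol message."""
    log.write(record, force=force)  # with force=True this blocks until durable
    send(record)


def read_log(path):
    """On restart, replay the checkpoints to find where the protocol stood."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]
```

The forced path pays the cost of `os.fsync` on every protocol step, which is the delay the following paragraphs are concerned with.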
The IBM Systems Network Architecture (SNA) LU 6.2 syncpoint architecture, developed by International Business Machines Corporation, is known to coordinate commits between two or more protected resources. The LU 6.2 architecture supports a syncpoint manager (SPM), which is responsible for resource coordination, syncpoint logging and recovery.
A problem with known protocols for two-phase commit across networks is that they do not cater adequately for the case where sites act as routing nodes, which distribute work to other parts of the system but have no resources of their own that require the property of atomicity. In the protocols known in the art, every node (including routing nodes) writes log data synchronously with message transmission. These checkpoints introduce a substantial delay in message transmission because of the time required to save data to non-volatile storage; the protocol can proceed only after the write has been performed, which greatly extends the time it takes. This is unnecessary if, as for a routing node, no updates are made to resources that depend on the atomic properties which the two-phase commit protocol provides. Instead, the protocol need only ensure that the end nodes which communicate through the routing node can contact each other in the event of a system failure.
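The contrast between end nodes and routing nodes can be sketched as a tree of participants in which only nodes owning protected resources perform a forced log write before voting. This is a minimal sketch, assuming, as the paragraph argues, that a node without protected resources can skip the forced write; the `Node` class and its fields are hypothetical, the `forced_writes` list merely stands in for synchronous log I/O, and the mechanism that lets end nodes reach each other after a failure is not shown.

```python
class Node:
    """Hypothetical 2PC participant that may also route work onward.

    A node with protected resources forces a log record before voting;
    a pure routing node (no resources) skips the forced write and only
    relays the prepare flow to the nodes it distributed work to.
    """

    def __init__(self, name, has_resources, children=(), can_commit=True):
        self.name = name
        self.has_resources = has_resources
        self.children = list(children)
        self.can_commit = can_commit
        self.forced_writes = []  # stands in for synchronous log I/O

    def prepare(self):
        # Relay "prepare" to every downstream node (no short-circuiting).
        ok = all([child.prepare() for child in self.children])
        ok = ok and self.can_commit
        if ok and self.has_resources:
            # Must be durable before voting: this node's resources depend on it.
            self.forced_writes.append(("prepared", self.name))
        return ok
```

In this sketch the routing node's vote is simply the combined vote of its subtree, and it never waits on stable storage itself.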
An optimization of the two-phase commit protocol is described in "Open Commit Protocols Tolerating Commission Failures", Kurt Rothermel and Stefan Pappe, ACM Transactions on Database Systems, Vol. 18, No. 2, June 1993; this document is mainly addressed to systems that include a disparate collection of nodes, some of which may be informally supported and lack rigorous operating procedures. It describes a protocol which tolerates the complete loss of certain nodes in the system without losing coordination of the remaining nodes. The protocol requires the addition of a method for determining which nodes are trusted, a method for transmitting the identity of the coordinator to each node, and a means for a node to contact the coordinator following a failure, even if it was not originally in contact with the coordinator during the syncpoint conversation. It should be noted that this protocol requires the identity of the coordinator to be recorded on non-volatile storage if there is any prospect of a node needing to start resynchronization protocols. In addition, this protocol is not immediately practical, because of the changes it requires to a well-established implementation.