Prior art two-phase syncpoint protocols control operations between resources in distributed transaction systems. These protocols are designed to ensure that database consistency is maintained even in the face of failures; that is, the protocols ensure that transactions either commit or are backed out by all resources at all nodes of a system.
The sequence of events in a typical two-phase protocol system to perform a syncpoint operation usually takes the following form, simplified for this preliminary discussion. A transaction program at one of the system nodes makes a request to its syncpoint manager to commit or back out a transaction. This syncpoint manager becomes the syncpoint initiator for this transaction. Assuming that the transaction program requested a commit operation, the syncpoint manager transmits a PREPARE TO COMMIT command to each of the resources known to it that is involved in the transaction. These include local resources, such as a database manager at the syncpoint initiating node, and resources such as local logical units (LUs) that represent child nodes adjacent to the initiating node that are also involved in the transaction. The PREPARE TO COMMIT command requests these resources to prepare to commit the transaction. The nodes receiving the PREPARE TO COMMIT command perform whatever operations are necessary to respond to the command. For example, updates in temporary buffers may be written out to stable disk storage. Eventually, the syncpoint initiator receives AGREE or NOT AGREE responses from the resources to the PREPARE TO COMMIT commands. These AGREE or NOT AGREE responses represent a composite answer from all other resources at the other nodes of the system involved in the transaction. If all resources agree to commit, the syncpoint initiator then transmits a COMMIT command to each of its resources, which include syncpoint managers in adjacent child nodes; otherwise, it transmits BACKOUT commands. Each syncpoint manager receiving a syncpoint command also propagates the command to its known resources, and so on throughout the network.
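The exchange described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the protocol of any particular system: the Node class, its agrees flag, and the syncpoint function are hypothetical names introduced here, and logging, timeouts, and failures are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical syncpoint manager (or leaf resource) in the transaction tree."""
    name: str
    agrees: bool = True  # local vote; an assumption made for this sketch
    children: List["Node"] = field(default_factory=list)
    state: str = "in-flight"

    def prepare(self) -> bool:
        """Phase 1: return a composite AGREE (True) or NOT AGREE (False)."""
        # Every child is polled so that each subtree sees PREPARE TO COMMIT.
        child_votes = [child.prepare() for child in self.children]
        return self.agrees and all(child_votes)

    def finish(self, commit: bool) -> None:
        """Phase 2: apply COMMIT or BACKOUT locally, then propagate it downward."""
        self.state = "committed" if commit else "backed-out"
        for child in self.children:
            child.finish(commit)


def syncpoint(initiator: Node) -> str:
    """Run both phases from the syncpoint initiator and return its final state."""
    decision = initiator.prepare()
    initiator.finish(decision)
    return initiator.state
```

In this sketch a single NOT AGREE anywhere in the tree turns the composite answer into NOT AGREE, so every node, including those that voted AGREE, receives BACKOUT.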
In most prior art systems, each syncpoint manager then collects FORGET messages from each of its resources and propagates a composite FORGET message upward in the network toward the syncpoint initiator. A FORGET message effectively tells a syncpoint manager that a syncpoint operation such as commit was performed satisfactorily by the resources that descend in the system from the syncpoint manager. Assuming that all FORGET messages are collected by an intermediate syncpoint manager in the network and that all indicate that a syncpoint operation was performed successfully, then the typical syncpoint manager informs its transaction program that the syncpoint operation was completed successfully. As a result, the transaction program releases resources that have been locked during the transaction and proceeds with new work. Eventually, the syncpoint initiator receives the composite FORGET messages from its resources and it also releases its transaction program as a result thereof.
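The upward flow of FORGET messages can be illustrated with a short sketch. It assumes a transaction tree given as a dictionary from each node to its children and a set of nodes that have not yet completed the syncpoint; the function name collect_forget and both data structures are hypothetical and serve only to show the order in which transaction programs are released.

```python
from typing import Dict, List, Set


def collect_forget(
    tree: Dict[str, List[str]],
    node: str,
    stalled: Set[str],
    released: List[str],
) -> bool:
    """Return True if `node` can send a composite FORGET to its parent.

    `released` records, in order, the nodes whose transaction programs
    are released to begin new work.
    """
    # A manager needs a FORGET from every child subtree before it can answer.
    child_results = [
        collect_forget(tree, child, stalled, released)
        for child in tree.get(node, [])
    ]
    done = node not in stalled and all(child_results)
    if done:
        released.append(node)  # this node's transaction program is released
    return done
```

With the tree {"A": ["B", "C"], "B": ["D"]} and no stalled nodes, the release order is D, B, C, A: the initiator A is released last. If D is stalled, only C is released, and both B and A remain waiting on D.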
When a syncpoint initiator has commanded all other agent resources to commit a transaction, it is still possible for an agent unilaterally to decide to back out its part of the transaction. This usually occurs in situations in which a failure of some type prevents the agent resource from actually completing the commit operation in a timely fashion. While the operation will eventually commit when the failure is cleared, if no unilateral decision intervenes, the time required for this to occur is indefinite. In the meantime, in the typical system, the syncpoint managers at each of the nodes are waiting for FORGET messages from their resources before releasing their transaction programs to begin new work. To prevent such intolerable delays at a node that is tied up because of another node or resource, some systems are designed simply never to wait for FORGET messages; such systems assume that a COMMIT will be satisfactorily completed, and the transaction programs are released as a matter of course immediately after a decision is made to commit. As an aside, the same problems of heuristic decisions apply to backout syncpoint commands. After a PREPARE TO COMMIT is sent, a NOT AGREE is received from some resource, and a BACKOUT command is transmitted as a result, there is still the possibility that an unreliable resource will unilaterally commit. The strategy of ignoring the possibility of a unilateral heuristic decision by a resource invites database corruption, because it allows transaction programs to begin new work even though a heuristic decision may still cause the transaction to be backed out at one or more nodes after a COMMIT command. A heuristic decision is a manual or automatic intervention at a node to force a transaction to commit or back out at that node, irrespective of the syncpoint operation at other nodes.
Such heuristic decisions usually occur because of some condition, such as a failure, that prevents the node or resource from completing a syncpoint in a timely fashion. The heuristic decision is unilateral and, in effect, says `commit` and go on, irrespective of anything else, or `backout` and go on, irrespective of anything else. While a heuristic decision forces a transaction to continue, it can also cause database corruption. Corruption can occur, for example, if a node unilaterally elects to back out while all other nodes commit or, vice versa, if a node unilaterally commits while all other nodes back out. Database corruption caused by such heuristic decisions refers to damage, such as data inconsistencies across the distributed database, that is not reported to the transaction programs and thus remains in the system indefinitely. Such database damage can be repaired only by application-specific repair programs or backup recovery operations.
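The mixed outcome described above can be made concrete with a small sketch. It assumes that the final state of each node is somehow known and compares it with the outcome commanded by the syncpoint initiator; the function name heuristic_damage and the string values "commit" and "backout" are hypothetical choices for this illustration.

```python
from typing import Dict, List


def heuristic_damage(commanded: str, outcomes: Dict[str, str]) -> List[str]:
    """Return the nodes whose final state differs from the commanded outcome."""
    return sorted(
        node for node, state in outcomes.items() if state != commanded
    )


# Node B unilaterally backed out although COMMIT was commanded everywhere.
damaged = heuristic_damage(
    "commit", {"A": "commit", "B": "backout", "C": "commit"}
)
```

Here damaged contains only B. The corruption discussed above arises when no such comparison is ever made or reported, so that the inconsistency stays in the distributed database unnoticed.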
To summarize the above, to prevent the possibility of database corruption, conventional practice in typical systems requires that each syncpoint manager wait until all of its agent resources complete a commit or backout and communicate this fact to the syncpoint manager. At that time, the syncpoint manager safely releases the transaction program that requested the syncpoint operation. However, as mentioned, this mode of operation limits throughput to that of the slowest or most overloaded system nodes. Moreover, on a given transaction, if a failure or excessive delay occurs at one of the nodes, the transaction will not be able to complete at other nodes, and resources at those nodes remain locked. Users suffer because their transaction requests back up waiting for resources that are locked on the present transaction. The results of these problems are reduced throughput in normal operation, due to wait time for commit or backout acknowledgements, and occasional transaction delays ranging from undesirable to intolerable. Some other systems simply ignore the possibility of heuristic decisions at a node and release their transaction programs without waiting for acknowledgements. These systems run the risk of undetected distributed database corruption.