1. Technical Field
The present invention generally relates to reliable distributed computing, and more particularly to transaction processing. Specifically, the present invention relates to a method of ensuring that a desired transaction either happens exactly once in its entirety or not at all, even when a partial system failure occurs while the transaction is being processed.
2. Description of Related Art
A desirable feature of a computing system is the ability to recover from partial system failures. A partial system failure occurs, for example, when the system "crashes" due to an infrequent software error in the operating system, and the operating system can be restarted. If an application program has a memory write operation in progress at the time of the system failure, it is most likely that the memory record will become erroneous. To enable the recovery of memory records after a partial system failure, it is necessary for the application program to keep backup copies of the records in nonvolatile memory. When the operating system is restarted, the memory records to be recovered are replaced with the backup copies. Then the application program must be restarted to repeat the operations that occurred after the backup copies were made.
To facilitate the making of backup copies and the recovery of memory records, the operating system typically provides an established set of memory management procedures that can be invoked or called from an application program. A typical example is a "recovery unit journaling" feature of Records Management Services (RMS) software sold by Digital Equipment Corporation, Maynard, MA 01754, for use with the VAX/VMS operating system. To provide for recovery of a memory record, an initial portion of an application program allocates nonvolatile memory for backup copies by invoking an RMS procedure called by the program statement "$SET FILE [FILENAME]/RU_JOURNAL" where FILENAME is the name of the file including the memory record to be recovered. To actually make a backup copy and to define the beginning of a "recovery unit," the application program invokes an RMS procedure called by the statement "$START_RU". To define the end of a "recovery unit," the application program invokes an RMS procedure called by the statement "$COMMIT_RU". If a partial system failure occurs during execution of the "recovery unit," then the memory record is recovered from the backup copy.
A "recovery unit" consists of the set of program statements between the "$START_RU" statement and the "$COMMIT_RU" statement. All of the statements in the "recovery unit" must be completed before the memory records modified by the statements in the recovery unit are made available for subsequent processing. In other words, the statements in the "recovery unit" specify operations in a single "transaction." The operations in a "transaction" are either all completed at once, or none of them is completed.
The operations in a transaction may modify multiple files in different data bases which could be accessed by multiple processors or nodes in a distributed computing system. In this case, when one processor or node is performing a transaction which modifies a respective set of files, none of the other processors may modify that set of files. Therefore, the application program can ensure internal consistency of the data stored in the files. By defining a group of related operations as a recovery unit, for example, a transfer of funds involving the debiting of a first account and the crediting of a second account, the programmer can ensure that all of the operations in the transaction will be complete before the updated records are made available for further use.
In the event of a partial system failure or other abnormal termination (such as a system "reset") of an application program, the files defined as recoverable will be recovered or updated only to the most recently completed recovery unit, and the data in the files will be consistent. For example, in a transfer of funds application, a system failure will not cause the first account to be debited without the crediting of the second account.
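The all-or-nothing behavior of a recovery unit can be illustrated with a minimal sketch. It is not the RMS interface: the `recovery_unit` context manager and the `transfer` function are hypothetical names, an in-memory dictionary stands in for the nonvolatile file, and a raised exception stands in for a partial system failure.

```python
import copy
from contextlib import contextmanager


@contextmanager
def recovery_unit(records):
    """Hypothetical stand-in for a START ... COMMIT bracket."""
    backup = copy.deepcopy(records)  # backup copy made at the start
    try:
        yield records
    except Exception:
        # Failure inside the unit: recover the records from the backup.
        records.clear()
        records.update(backup)
        raise


def transfer(accounts, src, dst, amount, fail_midway=False):
    """Debit src and credit dst as one recovery unit."""
    with recovery_unit(accounts):
        accounts[src] -= amount
        if fail_midway:
            raise RuntimeError("simulated partial system failure")
        accounts[dst] += amount


accounts = {"first": 100, "second": 50}
transfer(accounts, "first", "second", 30)  # completes normally

try:
    transfer(accounts, "first", "second", 30, fail_midway=True)
except RuntimeError:
    pass  # the debit made before the failure has been undone
```

After the failed second transfer, the balances are those left by the most recently completed recovery unit, so the first account is never debited without the second being credited.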
In a typical "transaction processing" system, the states of the "objects" manipulated by the transaction are deemed to be whatever is recorded in permanent memory. In a multiprocessor system, for example, a typical method of communication is by shared access to a common permanent memory holding the states of the objects. To restore the state of the system after a "crash" of any one of the processors, the memory management procedure for recovery unit journaling performs a well-known "state restoration method."
In the state restoration method, the "START" operation causes "write locks" to be put on the permanent memory records for all of the objects manipulated by the respective transaction. The "START" operation then causes the states of the objects to be saved in the permanent memory records which hold the states of the objects, and the records are copied so that a copy is kept for modification during execution of the recovery unit. Then the transaction defined by the recovery unit is performed upon the copy kept for modification. When the transaction is finished, the "COMMIT" operation causes the states in permanent memory of all of the objects manipulated by the recovery unit to be updated, in one "atomic" step, with the modifications made during the transaction. Finally, the write locks on the updated permanent memory records are released.
It should be apparent that the implementation of the state restoration method involves only one difficult step, which is the "atomic" step of updating the permanent memory states of all objects manipulated by the recovery unit. By definition, an "atomic" step is a step that is performed in its entirety, or not at all, regardless of a partial system failure or abnormal termination of the application program. Although such an "atomic step" could be performed directly by a specially designed memory unit, it can also be performed indirectly in any conventional computing system by allocating two permanent memory records for every object manipulated during a transaction, and allocating an additional permanent memory location as a flag or switch indicating which permanent memory record holds the permanent state of the object; the other permanent memory record is used whenever the volatile state of the object is written to permanent memory. Therefore, the updating of the permanent memory state of all objects manipulated by the recovery unit can be performed in one atomic step by execution of a single machine instruction that changes the flag or switch for the permanent memory records of the objects modified by the transaction.
In the above implementation, a "partial system failure" is any failure in which the single machine instruction is either completed in its entirety or not at all. The "START" operation write-locks the respective permanent memory files defined for a respective processor against access by other processors, and saves the respective files by causing any write operations to be performed upon copies in respective permanent memory flagged for use whenever writing to permanent memory. The "COMMIT" operation switches the flag for the respective files, and finally removes the write locks.
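The two-record-plus-flag scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the `ShadowStore` class and its method names are hypothetical, Python dictionaries stand in for permanent memory records, a boolean stands in for the write locks, and the assignment to `flag` plays the role of the single machine instruction.

```python
class ShadowStore:
    """Two record sets plus one flag selecting the committed set."""

    def __init__(self, initial):
        self.records = [dict(initial), dict(initial)]
        self.flag = 0        # index of the record set holding permanent state
        self.locked = False  # stands in for the write locks

    def read(self, key):
        """Read the committed (permanent) state of an object."""
        return self.records[self.flag][key]

    def start(self):
        """START: lock, then hand out a working copy of the committed set."""
        if self.locked:
            raise RuntimeError("records are write-locked")
        self.locked = True
        shadow = 1 - self.flag
        self.records[shadow] = dict(self.records[self.flag])
        return self.records[shadow]

    def commit(self):
        """COMMIT: switch the flag in one step, then release the locks."""
        self.flag = 1 - self.flag  # the single "atomic" switch
        self.locked = False

    def abort(self):
        """Recovery after a failure: the flag is untouched, so just unlock."""
        self.locked = False


store = ShadowStore({"first": 100, "second": 50})

work = store.start()
work["first"] -= 30              # a failure here leaves the flag unchanged
store.abort()
after_failure = store.read("first")

work = store.start()
work["first"] -= 30
work["second"] += 30
store.commit()
```

Because every write before `commit` goes to the record set not selected by the flag, an interrupted transaction leaves the committed state exactly as it was.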
As described above, the state restoration method ensures that the updates made by any given processor are made consistently, even when a crash may prevent all of the updates from being made at any given node. A more difficult problem is ensuring that after a crash, the state of the system can be automatically restored and processing may continue until completion with the assurance that the transactions interrupted by the crash are completed exactly once. For some transactions, at-least-once semantics is acceptable. For example, a transaction that updates the mailing addresses of newspaper subscribers could be performed more than once without any adverse consequences. In other transactions, however, exactly-once semantics is crucial. In a financial accounting system, for example, a transaction that debits one account and credits another must be performed exactly once for each real-world financial transaction.
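The difference between the two semantics can be shown with a small sketch in which each transaction is deliberately performed twice, as might happen if it were blindly repeated after a crash. The function names and data are hypothetical and chosen only to mirror the two examples above.

```python
def update_address(subscribers, name, address):
    """Repeating this leaves the same result: at-least-once is acceptable."""
    subscribers[name] = address


def debit_and_credit(accounts, src, dst, amount):
    """Repeating this changes the result: exactly-once is required."""
    accounts[src] -= amount
    accounts[dst] += amount


subscribers = {"alice": "1 Old Road"}
for _ in range(2):  # performed twice
    update_address(subscribers, "alice", "2 New Street")

accounts = {"first": 100, "second": 50}
for _ in range(2):  # performed twice
    debit_and_credit(accounts, "first", "second", 30)
```

The address ends up correct regardless of the repetition, whereas the accounts reflect a transfer of 60 rather than the intended 30.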
Exactly-once semantics has been assured by using procedures such as the "two-phase commit protocol" and its derivations. These procedures are described in J. Eliot B. Moss, Nested Transactions--An Approach to Reliable Distributed Computing, The MIT Press, Cambridge, Mass., 1985. The "two-phase commit protocol" permits a recovery with exactly-once semantics even though updates to files are performed by a number of different processors in a system. Typically in such a system each transaction is assigned a unique "transaction identification number" and each object is assigned a unique "object identification number" so that the respective operation to be performed or acknowledged for a transaction by any given processor can be signalled by the receipt or transmission of the transaction identification number, and the changes to the state of an object can be communicated along with the object identification number. One processor in the system is assigned the role of a coordinator which initiates a transaction. To begin a transaction, the coordinator sends a "prepare" command to all of the processors participating in the transaction.
Upon receipt of the "prepare" command, each processor participating in the transaction performs the "START" operation described above, writes the transaction ID into permanent memory to remember that it is prepared for the transaction, and then sends an acknowledgment back to the coordinator processor, but does not yet perform its part of the transaction. The coordinator waits for acknowledgments from all of the participants. When the coordinator receives acknowledgments from all of the participants, the coordinator records in permanent memory a list of the participants and a notation that the transaction is now being completed.
The coordinator then sends "commit" commands to all of the participants. Upon receipt of the "commit" command, each participant checks its permanent memory for the transaction ID to determine whether it has been prepared for the transaction, and if it has, it performs its part of the transaction, and then performs the "COMMIT" operation described above (which in the process clears the transaction ID from permanent memory when permanent memory is updated), and finally sends an acknowledgment back to the coordinator; if the transaction ID cannot be found in permanent memory, the participant just sends an acknowledgment back to the coordinator. When the coordinator receives acknowledgments from all of the participants, it erases the list of participants from permanent memory, and the transaction is finished.
If a crash occurs during the transaction, then the coordinator may use its list of participants to ensure completion of any transaction that was being completed but which did not finish. "Commit" commands are retransmitted to each of the participants included in the list. Any participant that did not complete its portion of the transaction because of the crash (as indicated by its permanent memory having a record of preparation for the transaction ID) will complete its portion for the first time. Any participant that had already completed its portion of the transaction (as indicated by its permanent memory having no record of preparation for the transaction ID) will not repeat its portion of the transaction. Therefore, the two-phase commit protocol ensures that all portions of an interrupted transaction are performed exactly once when the recovery is finished.
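The prepare, commit, and recovery steps described above can be traced in a single-process sketch. It is a simplification under stated assumptions: the `Participant` and `Coordinator` classes, the `crash_after` parameter, and the transaction ID "txn-1" are hypothetical, method calls stand in for messages between processors, in-memory attributes stand in for permanent memory, and a raised exception stands in for a crash of the coordinator while it is sending "commit" commands.

```python
class Participant:
    def __init__(self, state):
        self.state = state   # committed "permanent" state of one object
        self.prepared = {}   # transaction ID -> this participant's part

    def prepare(self, txn_id, operation):
        """Record preparation for the transaction, but do not perform it."""
        self.prepared[txn_id] = operation
        return "ack"

    def commit(self, txn_id):
        """Perform the part once if still prepared; acknowledge either way."""
        operation = self.prepared.pop(txn_id, None)  # clears the record
        if operation is not None:
            self.state = operation(self.state)
        return "ack"


class Coordinator:
    def __init__(self):
        self.in_progress = {}  # transaction ID -> list of participants

    def run(self, txn_id, work, crash_after=None):
        """work is a list of (participant, operation) pairs."""
        for participant, operation in work:
            participant.prepare(txn_id, operation)
        # All acknowledgments received: record the participant list.
        self.in_progress[txn_id] = [p for p, _ in work]
        self._send_commits(txn_id, crash_after)

    def _send_commits(self, txn_id, crash_after=None):
        for count, participant in enumerate(self.in_progress[txn_id]):
            if crash_after is not None and count == crash_after:
                raise RuntimeError("simulated crash")
            participant.commit(txn_id)
        del self.in_progress[txn_id]  # erase the list: transaction finished

    def recover(self):
        """Retransmit "commit" for every transaction that did not finish."""
        for txn_id in list(self.in_progress):
            self._send_commits(txn_id)


a, b = Participant(100), Participant(50)
coordinator = Coordinator()
work = [(a, lambda s: s - 30), (b, lambda s: s + 30)]

try:
    coordinator.run("txn-1", work, crash_after=1)  # a commits, b does not
except RuntimeError:
    pass
state_after_crash = (a.state, b.state)

coordinator.recover()  # a only acknowledges; b performs its part once
```

On recovery, the participant that had already committed finds no record of preparation and merely acknowledges, so the debit is not repeated, while the other participant completes its credit for the first time.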