Computer applications perform operations on data that reflect the effects of business events. For instance, computers are used to keep track of purchases, bank transactions and inventory depletion. Applications that handle business and financial data must provide deterministic results to users, who depend on the data presented on terminal screens and in batch reports, and on the data stored in the persistent databases used to record business events.
Applications typically perform update operations on multiple database entities during the course of a single business transaction. For instance, one business transaction may transfer funds between multiple accounts. In order for the user to experience deterministic and correct behavior, the updates to all affected accounts must be executed atomically; i.e., either all updates occur or none occur. Atomicity is one important aspect of what is referred to as ‘transactional’ behavior. The application will run a local transaction on the host platform in order to perform the multiple updates. The entire transaction either succeeds, resulting in a “commit”, or fails, resulting in a “rollback” to the state that existed before the transaction was invoked.
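By way of illustration only, the atomic behavior of a local transaction can be sketched as follows. The sketch uses Python's standard `sqlite3` module; the `accounts` table, the `transfer` function and the account identifiers are hypothetical names introduced for this example and are not drawn from any particular system.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` as one atomic unit (illustrative only)."""
    try:
        # The connection context manager commits on success and rolls back
        # if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if row is None or row[0] < 0:
                raise ValueError("insufficient funds")  # forces a rollback
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
        return True
    except ValueError:
        return False

# Hypothetical in-memory database with two accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()
```

In this sketch, a transfer that would overdraw the source account leaves both balances exactly as they were, which is the "all or none" property described above.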
When these database entities reside on different computing platforms, the application must execute a transaction on each platform. All of these transactions must cooperate in order to provide the illusion of a “global transaction” which exhibits global atomicity; i.e., either all of the transactions succeed or none of them succeed. This illusion is provided by the use of two-phase commit protocols.
Two-phase commit protocols require that a single participant in a commitment hierarchy of cooperating transactions serve as the commitment coordinator. This commitment coordinator examines the votes from all the other participants and makes a final decision to either commit or abort the global transaction. All the other participants must abide by the decision of the coordinator in order to be consistent with the rest of the group. This level of cooperation enables transactions which can safely perform operations such as transferring funds between machines.
During the execution of the protocol, all non-coordinator participants provide their votes. If a given local vote is to abort, then the local system can immediately proceed to roll back its part of the global transaction, since any vote to abort will prevent the coordinator from making a “commit” decision. However, any participant that votes to “commit” must enter an “in-doubt” state. That participant cannot know the final decision until it is distributed by the coordinator, and it has “agreed” to abide by that decision. While a participant is in the “in-doubt” state, however, it may experience a local failure; i.e., a crash.
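The voting and decision phases described above can be sketched, under simplifying assumptions, as follows. The sketch is a single-process model with no messaging, logging or failure handling; the names `State`, `Participant` and `run_global_transaction` are hypothetical and serve only to show where the “in-doubt” state arises.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    IN_DOUBT = "in-doubt"
    COMMITTED = "committed"
    ROLLED_BACK = "rolled-back"

class Participant:
    """Hypothetical non-coordinator participant in a global transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = State.ACTIVE

    def prepare(self):
        """Phase 1: cast a vote. Returns True for commit, False for abort."""
        if self.can_commit:
            self.state = State.IN_DOUBT      # must now wait for the coordinator
            return True
        self.state = State.ROLLED_BACK       # an abort vote may roll back at once
        return False

    def decide(self, commit):
        """Phase 2: apply the coordinator's decision if still in doubt."""
        if self.state is State.IN_DOUBT:
            self.state = State.COMMITTED if commit else State.ROLLED_BACK

def run_global_transaction(participants):
    """Coordinator: commit only if every participant voted to commit."""
    votes = [p.prepare() for p in participants]   # collect every vote
    decision = all(votes)
    for p in participants:
        p.decide(decision)                        # distribute the final decision
    return decision
```

A crash between `prepare` and `decide` would leave a participant in `State.IN_DOUBT` with no local means of determining the outcome, which is the situation addressed in the remainder of this discussion.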
In the event of failure, transaction processing protocols in some highly fault-tolerant operating systems allow the transactions on the failing machine, as well as its correspondents, to resort to ‘heuristic decisions’. This means that the recovery phase can be abandoned, and a predetermined decision can be employed to decide whether to commit or roll back those transactions which had been in an “in-doubt” state at the time of failure. While this protocol somewhat relieves pressure on database and system providers, it is not a satisfactory solution for users who, in many cases, must somehow justify discrepancies which may have occurred between the affected database entities. In a sense, heuristic decisions are excellent, but not infallible, guesses. They are sometimes applied, even in applications in which infallibility is highly desirable, in order to permit limited access to a database having files affected by unresolved transactions.
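The manner in which a heuristic decision can give rise to a discrepancy may be illustrated by the following minimal sketch. The function names, the transaction identifiers and the string outcomes `"commit"` and `"rollback"` are assumptions made for this example only.

```python
def apply_heuristic(in_doubt_txns, default="rollback"):
    """Assign one predetermined outcome to every in-doubt transaction."""
    return {txn: default for txn in in_doubt_txns}

def find_discrepancies(heuristic_outcomes, coordinator_decisions):
    """List transactions whose heuristic outcome differs from the coordinator's decision."""
    return sorted(
        txn
        for txn, outcome in heuristic_outcomes.items()
        if txn in coordinator_decisions and coordinator_decisions[txn] != outcome
    )

# Hypothetical example: two transactions were in doubt when the node failed.
outcomes = apply_heuristic(["t1", "t2"], default="rollback")
mismatched = find_discrepancies(outcomes, {"t1": "commit", "t2": "rollback"})
```

Here `t1` was rolled back locally although the coordinator decided to commit it, so the affected database entities no longer agree and the discrepancy is left for the user to reconcile.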
Therefore, those skilled in the art will appreciate that it would be highly desirable to provide transaction processing features and mechanisms which cooperate to rebuild the context of transactions in the “in-doubt” state at the time of a system failure so that the protocol driver can complete the recovery protocol and direct the transaction to be either committed or rolled back as determined by the commitment coordinator. Among the functional requirements for such features and mechanisms are that they:
Allow the transaction processing protocol driver to complete recovery under the protocol and resolve “in-doubt” transactions after a system interruption.
Provide a protocol-independent implementation solution for the resolution of “in-doubt” transactions after a system interruption.
Provide an administrator with the ability to locate nodes which have unresolved “in-doubt” transactions.
Provide the administrator with the ability to resolve “in-doubt” transactions manually or through heuristics at a delayed time after involved nodes have been restarted.
Prevent access to any file resources which may be required to resolve “in-doubt” transactions.
And, most directly relevant to the present invention:
Provide for recovery from multiple system interruptions during the resolution of “in-doubt” transactions which may occur at various points of a recovery; and
Allow access as soon as possible to all file resources which are not required to resolve “in-doubt” transactions.
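The two file-access requirements listed above, namely withholding the file resources needed to resolve “in-doubt” transactions while releasing all others as early as possible, can be sketched as a simple set partition. This is a hedged illustration only; the functions `partition_files` and `resolve`, and the mapping of transaction identifiers to file names, are hypothetical and do not describe any particular implementation.

```python
def partition_files(all_files, in_doubt):
    """Split files into (locked, available).

    `in_doubt` maps each unresolved transaction id to the set of files it touched.
    """
    locked = set().union(*in_doubt.values())   # files needed for resolution
    available = set(all_files) - locked        # everything else can be released
    return locked, available

def resolve(in_doubt, txn_id, all_files):
    """Remove a resolved transaction and recompute which files remain locked."""
    remaining = {t: f for t, f in in_doubt.items() if t != txn_id}
    return remaining, partition_files(all_files, remaining)

# Hypothetical state after a restart: two transactions are still in doubt.
files = {"f1", "f2", "f3", "f4"}
pending = {"t1": {"f1", "f2"}, "t2": {"f2", "f3"}}
locked, available = partition_files(files, pending)
```

In this example only `f4` is available immediately after restart; once `t1` is resolved, `f1` is released as well, while `f2` and `f3` remain withheld until `t2` is resolved.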