1. Field of the Invention
This invention relates to distributed transaction processing in a client/server environment, and more specifically, to a two-phase commit protocol that is used to ensure data consistency among participants in a transaction.
2. Description of the Related Art
Managing a database of substantial size with substantial functionality typically requires that a database management system (DBMS) run on a server machine with significant processing power, working memory, and storage capacity, e.g., the IBM AS/400, the IBM RISC System/6000, or a larger machine such as a mainframe, e.g., the IBM System/390. It is not uncommon for a database to be distributed amongst several machines, or for a transaction to require access to several databases running on one or more machines. Transactions against the database(s) can originate from the same machine that runs a DBMS or from another machine, i.e., a client machine, that is connected to the DBMS server machine via a network, e.g., in a distributed or client/server environment. Such a client machine may be a smaller computer, such as a personal computer, e.g., the IBM PS/2, without any database resources of its own. The client interacts with the data in the database, through updates and retrievals, by communicating over the network with the DBMS on the server machine.
A transaction (or logical unit of work, LUW) consists of a set of operations that are executed to perform a particular logical task, such as making changes to data resources such as databases or files. A distributed transaction is the execution of one or more statements that access data distributed on different systems. The changes to the resources must be committed or aborted before the next transaction in the series can be initiated. A distributed commit protocol is required to ensure that the effects of a distributed transaction are atomic, i.e., either all the effects of the transaction persist or none persist, whether or not failures occur.
A well-known commit protocol is the two-phase commit protocol. This protocol ensures that all participants commit if and only if all can commit successfully. In a client/server environment, the participants have fixed requester-server roles. Servers initiate no work unless the requester asks for it. Typically, the client is the coordinator, and the participants are the servers.
FIG. 1 shows a time sequence of the two-phase commit protocol for a coordinator 10 with one participant 12. During the first phase, i.e., the voting phase, the coordinator 10 asks all of the other participants 12 to prepare to commit 1. Each participant replies whether it can guarantee that it can perform the outcome requested by the coordinator; the reply 3 is "yes" if it can. If a participant is unable to prepare to commit for any reason, the reply is "no." During the second phase, i.e., the decision phase, the coordinator propagates the outcome 5 of the transaction to all participants. If all of the participants replied "yes" during the first phase, the commit outcome is propagated; if any participant replied "no," the abort outcome is propagated. Once the participant 12 either commits or aborts the transaction, it sends an acknowledgement 7 back to the coordinator.
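The message flow of FIG. 1 can be sketched in a single process as follows. This is a minimal illustration under stated assumptions, not an implementation of any product: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names, messages are plain method calls, and logging and failures are ignored.

```python
from __future__ import annotations

from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical in-memory participant; can_commit simulates its local readiness check."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.outcome: str | None = None

    def prepare(self) -> Vote:
        # Phase 1: reply "yes" only if the participant can guarantee the requested outcome.
        return Vote.YES if self.can_commit else Vote.NO

    def apply(self, outcome: str) -> str:
        # Phase 2: perform the coordinator's outcome, then acknowledge it.
        self.outcome = outcome
        return "ack"


def two_phase_commit(participants: list[Participant]) -> str:
    # Phase 1 (voting): ask every participant to prepare to commit.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit only if every participant voted "yes".
    outcome = "commit" if all(v is Vote.YES for v in votes) else "abort"
    for p in participants:
        p.apply(outcome)  # each participant performs the outcome and acknowledges
    return outcome
```

For example, `two_phase_commit([Participant("A"), Participant("B", can_commit=False)])` returns `"abort"`, and both participants end with `outcome == "abort"`.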
During this two-phase commit, each participant writes to a log that it maintains in order to be able to back out any action if the action has to be aborted. A log is a file in nonvolatile storage that maintains two-phase commit state information. The log can be used to determine how to return resources to consistent states after a failure. The log is read after a failure in order to perform resynchronization. Information that is vital for correct processing after a system failure must be the subject of a forced log write. However, forced writes are not required when the logged information can be recreated after a failure by recovery processing.
Forced write and nonforced write are two ways to write information to nonvolatile storage. A forced write operation does not complete until the information is written to nonvolatile storage. A nonforced write completes when the information is put into volatile storage. Forced write operations take longer to complete, but the information is guaranteed to be available following a failure. Nonforced writes complete more quickly than forced writes, but the information is not guaranteed to be available after a failure, since the failure could have caused the information to be lost while it was still in volatile storage. Since log data still in volatile storage is written to nonvolatile storage when a later log force occurs, completion of a forced write implies that data from all previous nonforced writes have been moved to nonvolatile storage. Nonforced data may also be written to nonvolatile storage when the log manager needs to move log records out of volatile storage for an implementation-defined reason, such as when the volatile buffer pool is full.
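The relationship between forced and nonforced writes can be modeled with a toy log manager. In this sketch, which is an illustrative assumption rather than a real log manager API, the `stable` list stands in for nonvolatile storage, the `buffer` list for volatile storage, and `buffer_capacity` for the implementation-defined condition (here, a full buffer) that triggers a flush.

```python
class Log:
    """Toy log manager; 'stable' models nonvolatile storage, 'buffer' volatile storage."""

    def __init__(self, buffer_capacity: int = 4) -> None:
        self.stable: list = []
        self.buffer: list = []
        self.buffer_capacity = buffer_capacity

    def write(self, record, force: bool = False) -> None:
        self.buffer.append(record)
        if force:
            # A forced write completes only after everything buffered, including
            # earlier nonforced records, has reached nonvolatile storage.
            self.flush()
        elif len(self.buffer) >= self.buffer_capacity:
            # Implementation-defined flush: the volatile buffer pool is full.
            self.flush()

    def flush(self) -> None:
        self.stable.extend(self.buffer)
        self.buffer.clear()

    def crash(self) -> list:
        # A failure loses whatever was still in volatile storage.
        self.buffer.clear()
        return list(self.stable)
```

Writing `"A"` without forcing and then forcing `"B"` leaves both records in `stable`, so a subsequent `crash()` returns `["A", "B"]`; a later nonforced `"C"` that is still buffered is lost.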
Again, with reference to FIG. 1, during the first phase of the two-phase commit protocol, i.e., the voting phase, after a coordinator asks all of the participants to prepare to commit 1, a participant 12 force-writes a prepared log record 2 that ensures that it can successfully commit or abort the transaction, even if a system failure causes it to lose working memory of the transaction. Thus a DBMS acting as a participant forces enough information so that it can either recreate or undo the changes made during the transaction. A client force-writes enough information so that it can initiate recovery processing following a failure, including the identity of the coordinator and the state of the two-phase commit operation.
A "yes" vote 3 puts the participant in an in-doubt state, implying that it cannot commit or abort the transaction without an explicit order from the coordinator. If a participant 12 decides to abort the transaction, it force-writes an abort log record and sends a "no" vote to the coordinator. A "no" vote defines the outcome of the transaction so that the participant voting "no" does not have to wait for an explicit order from the coordinator. The participant aborts the transaction, releases all of its locks, and forgets the transaction.
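A participant's handling of the prepare request can be sketched as below. The `LoggingParticipant` class, its `forced_log` list (a stand-in for forced writes to nonvolatile storage), and the `holds_locks` flag are hypothetical names chosen for illustration.

```python
from __future__ import annotations


class LoggingParticipant:
    """Hypothetical participant that records its forced log writes in a list."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.forced_log: list[tuple[str, str]] = []  # stands in for nonvolatile storage
        self.state = "active"
        self.holds_locks = True

    def on_prepare(self, coordinator_id: str) -> str:
        if self.can_commit:
            # Force the prepared record (including the coordinator's identity) before
            # voting, so the outcome can still be carried out after a crash.
            self.forced_log.append(("prepared", coordinator_id))
            self.state = "in-doubt"  # cannot decide alone; waits for the coordinator
            return "yes"
        # A "no" vote fixes the outcome locally: force an abort record,
        # release locks, and forget the transaction.
        self.forced_log.append(("abort", coordinator_id))
        self.holds_locks = False
        self.state = "forgotten"
        return "no"
```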
The second phase of the two-phase commit, the decision phase, begins after the coordinator receives all of the votes from the participants. If all the votes received are "yes", the coordinator must decide whether to commit or abort. The coordinator propagates its decision to the participants. If at least one of the votes is "no", the decision to abort can be propagated by the coordinator to only those participants that voted "yes" since the participants voting "no" already know the outcome.
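The choice of which participants must receive the decision can be written as a small function. This sketch assumes that the coordinator chooses commit whenever every vote is "yes" (the text leaves that choice to the coordinator); `decision_recipients` and the `votes` dictionary layout are hypothetical.

```python
from __future__ import annotations


def decision_recipients(votes: dict[str, str]) -> tuple[str, list[str]]:
    """Map participant name -> "yes"/"no" to (outcome, participants to notify)."""
    if all(v == "yes" for v in votes.values()):
        # Every participant is in doubt and must be told to commit.
        return "commit", sorted(votes)
    # "No" voters already know the outcome; only the in-doubt "yes" voters need it.
    return "abort", sorted(name for name, v in votes.items() if v == "yes")
```

With votes `{"A": "yes", "B": "no", "C": "yes"}`, the result is `("abort", ["A", "C"])`: participant B is not sent the abort decision.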
Because the coordinator's decision needs to survive failures, a commit or abort log record is force-logged 4 (FIG. 1) before the decision is propagated to the participants. The completion of the force-write takes the transaction to the committing or aborting state. After receiving the commit or abort decision, each participant moves into the committing/aborting state, force-writes a commit/abort log record 6 to ensure that the transaction will be committed/aborted, and then sends an acknowledgment (ACK) message 7 back to the coordinator indicating that the participant will commit/abort as the coordinator requested. The participant then commits/aborts and forgets about the transaction. After receiving acknowledgment messages from all participants that voted "yes," the coordinator writes a nonforced END log record 8 and forgets the transaction. The END log record indicates that all participants have successfully completed commit processing and thus that no recovery processing is required if a failure occurs.
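The ordering of log writes and messages in the decision phase can be traced with the following sketch. It is illustrative only: `decision_phase` is a hypothetical function, events are (actor, action) tuples, and messages to different participants are serialized for readability although they could be sent in parallel.

```python
from __future__ import annotations


def decision_phase(outcome: str, yes_voters: list[str]) -> list[tuple[str, str]]:
    """Return the ordered (actor, action) events of the decision phase."""
    # The decision must survive failures, so it is forced before it is propagated.
    events = [("coordinator", f"force-write {outcome} record")]
    for p in yes_voters:
        events.append(("coordinator", f"send {outcome} to {p}"))
        # The participant forces the outcome, acknowledges, then acts and forgets.
        events.append((p, f"force-write {outcome} record"))
        events.append((p, "send ACK"))
        events.append((p, f"{outcome} and forget"))
    # Once every "yes" voter has acknowledged, END is written without forcing: if it is
    # lost, the coordinator merely repeats recovery processing that is no longer needed.
    events.append(("coordinator", "write END record (nonforced) and forget"))
    return events
```

For a commit with two "yes" voters, the trace contains three forced writes: one by the coordinator and one by each participant.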
There are many problems that can occur with the above-described two-phase commit protocol that can lead to inefficient and unreliable transaction processing. Some of the problems are a result of the amount of network traffic, including the number of messages involved across the network amongst the participants and the coordinator and the delays in the messages across the network. Other problems arise because of the resource locking during the two-phase commit procedure, the number of log writes, and the failures or power outages at one or more of the participants or within the network. As a result, various techniques have been used to optimize different aspects of the two-phase commit procedure in order to minimize these problems.
Two basic optimization approaches have been taken. One approach focuses on improving performance in failure cases by streamlining recovery processing at the expense of extra processing in the normal case. Another approach focuses on improving performance in the normal case, because networks and systems are becoming increasingly reliable and there is a need to support high-volume transactions with a protocol that is streamlined for the normal case. See "Two-Phase Commit Optimizations in a Commercial Distributed Environment," George Samaras, Kathryn Britton, Andrew Citron, C. Mohan, Distributed and Parallel Databases, 3, pages 325-360, Kluwer Academic Publishers, Boston, Manufactured in The Netherlands, 1995. Some examples of streamlining the two-phase commit protocol for the normal case are given below.
For example, the two-phase commit protocol involves network traffic to convey responses between the participants and the coordinator. Any message that is sent over the network slows down the commit protocol because it adds a network transit delay. It is known to shorten commit time by reducing the number of messages sent or by sending messages to different participants in parallel.
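To see why sending messages in parallel shortens commit time, the following sketch uses `asyncio.sleep` as a stand-in for network transit delay. The functions `send_prepare`, `collect_votes_sequential`, `collect_votes_parallel`, and `run_timed` are hypothetical, and every simulated participant votes "yes".

```python
from __future__ import annotations

import asyncio
import time


async def send_prepare(name: str, delay: float) -> tuple[str, str]:
    # Hypothetical Prepare/vote round trip; the sleep models network transit delay.
    await asyncio.sleep(delay)
    return name, "yes"


async def collect_votes_sequential(delays: dict[str, float]) -> dict[str, str]:
    # One round trip at a time: elapsed time is roughly the sum of the delays.
    votes = {}
    for name, delay in delays.items():
        _, vote = await send_prepare(name, delay)
        votes[name] = vote
    return votes


async def collect_votes_parallel(delays: dict[str, float]) -> dict[str, str]:
    # All Prepare messages in flight at once: elapsed time is roughly the largest delay.
    results = await asyncio.gather(*(send_prepare(n, d) for n, d in delays.items()))
    return dict(results)


def run_timed(coro) -> tuple[dict[str, str], float]:
    # Run a coroutine to completion and report the wall-clock time it took.
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start
```

With three participants each 0.1 s away, `run_timed(collect_votes_parallel(...))` should take about 0.1 s, while the sequential version takes about 0.3 s; exact timings depend on the machine.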
Another optimization technique is to minimize the number of times a log write is forced. A forced log entry slows down commit processing because the system waits until the entry is written to nonvolatile storage. Minimizing forced log writes and conducting extra recovery processing to regain the lost information is one way to optimize the normal, non-failure case rather than the failure case.
Another optimization technique is called Presumed Abort (PA). Presumed Abort is an extension of the basic two-phase commit protocol and is now part of the ISO-OSI and X/Open distributed transaction processing standards. Presumed Abort reduces the number of forced log writes and provides optimizations that reduce the number of network message flows. FIG. 2 shows a transaction that aborts 5, followed by a participant failure 18. Unlike the standard two-phase commit, a Presumed Abort two-phase commit does not log before sending the Prepare message, and a participant does not have to force-write an abort record 6 (FIG. 1) before acknowledging 7 (FIG. 1) an abort command. If a prepared record 2 is found on its log after a crash 18, the participant initiates recovery processing with its coordinator 10. Similarly, the coordinator does not have to force-write the abort record 4 (FIG. 1). If the coordinator has no information about the transaction on its log, it presumes that the transaction aborted and tells the subordinate to abort 5; hence the name Presumed Abort. The server 12 initiates recovery processing 21 when it finds itself in doubt after a failure. This is necessary because the coordinator may have no memory of the transaction if it also failed. This differs from the basic two-phase commit coordinator, which is responsible for initiating recovery and therefore must force an abort log record 4 (FIG. 1) before sending the abort message to the subordinate. In this case, the Presumed Abort coordinator performs no logging at all, since the participant can initiate recovery. Presumed Abort also includes the read-only and leave-inactive-partners-out optimizations discussed below.
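The presumption that gives Presumed Abort its name can be expressed in a few lines. In this hypothetical sketch, the coordinator's log is a dictionary from transaction id to `"commit"` (abort decisions are never logged), and the participant's log maps transaction ids to their last recorded state; `coordinator_answer` and `participant_restart` are illustrative names only.

```python
from __future__ import annotations


def coordinator_answer(coordinator_log: dict[str, str], txn_id: str) -> str:
    """Answer an in-doubt participant's recovery inquiry under Presumed Abort."""
    # No information about the transaction means it is presumed to have aborted.
    return coordinator_log.get(txn_id, "abort")


def participant_restart(participant_log: dict[str, str],
                        coordinator_log: dict[str, str]) -> dict[str, str]:
    """Resolve every in-doubt transaction found on the participant's log after a crash."""
    # The participant, not the coordinator, initiates recovery: the coordinator
    # may have no memory of the transaction if it also failed.
    return {txn: coordinator_answer(coordinator_log, txn)
            for txn, state in participant_log.items() if state == "prepared"}
```

A participant restarting with `T1` and `T3` in the prepared state, against a coordinator log that records only `T1` as committed, resolves `T1` to commit and `T3` to abort.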
In a "read-only" optimization technique, a participant in a transaction that has not performed any updates is allowed to vote "read-only" in the voting phase. The vote implies that the effects of commit and abort outcomes would be identical for that participant. The participant is left out of the second phase of commit processing and avoids any log writes. A participant is allowed to vote "read-only" if and only if all of its subordinates have voted "read-only"; otherwise, it needs to learn the outcome in order to propagate it to the subordinates that did not vote "read-only." For environments that are dominated by read-only transactions, this optimization provides substantial savings, since it reduces the commit operation to a one-phase commit operation. Under Presumed Abort, no logging at all is performed if all participants vote "read-only." FIG. 3 illustrates the read-only optimization with the Presumed Abort protocol: participant B 14 does not force-write a prepared log record before sending its read-only vote 33.
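The read-only voting rule can be sketched as follows. As a simplifying assumption, a participant that has made updates is taken to be able to commit and so votes "yes"; the functions `vote` and `phase_two_participants` are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable


def vote(performed_updates: bool, subordinate_votes: Iterable[str] = ()) -> str:
    """Vote of a participant that is able to commit (simplifying assumption)."""
    # "read-only" only if this participant made no updates AND every subordinate
    # voted "read-only"; otherwise it must learn the outcome to pass it on.
    if not performed_updates and all(v == "read-only" for v in subordinate_votes):
        return "read-only"
    return "yes"  # after force-writing a prepared record (not modeled here)


def phase_two_participants(votes: dict[str, str]) -> list[str]:
    # Read-only voters are left out of the decision phase and write no log records.
    return [name for name, v in votes.items() if v != "read-only"]
```

A participant with no updates but one updating subordinate, `vote(False, ["read-only", "yes"])`, must still vote `"yes"`.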
In environments where servers respond only to requests and do not initiate any work of their own, such as a client/server environment, commit processing can be optimized by leaving members that have not participated in a transaction out of the two-phase commit protocol. This optimization technique is referred to as "leaving inactive partners out" or "OK-TO-LEAVE-OUT." During a normal two-phase commit operation, the coordinator includes all members, whether or not they have exchanged data during the transaction. With the OK-TO-LEAVE-OUT optimization technique, a coordinator leaves out any member with which it has exchanged no data during the transaction. When a member is left out, the coordinator does not send it the Prepare to Commit message, nor does it have to wait for its VOTE or ACK replies. This optimization technique is easy to include in Presumed Abort, since the Presumed Abort technique is based on a requester-server, i.e., client/server, model.
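The effect of OK-TO-LEAVE-OUT on message traffic can be estimated with a short sketch. The function `plan_commit`, the `exchanged_data` set, and the `flows_per_member` parameter are hypothetical; the default of four assumes the four flows of FIG. 1 (Prepare, vote, outcome, ACK) for each included member.

```python
from __future__ import annotations


def plan_commit(members: list[str], exchanged_data: set[str],
                flows_per_member: int = 4) -> tuple[list[str], int]:
    """Return the members included in two-phase commit and the message flows saved."""
    # Only members that exchanged data during the transaction take part.
    included = [m for m in members if m in exchanged_data]
    left_out = len(members) - len(included)
    # Each left-out member is never sent Prepare and never waited on for VOTE or ACK.
    return included, left_out * flows_per_member
```

With three members of which only B exchanged data, `plan_commit(["A", "B", "C"], {"B"})` returns `(["B"], 8)`.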
Several of the optimization techniques discussed above have limited application; that is, certain conditions must be met before they become effective. For example, some of the techniques are useful only if a participant is inactive or has performed only read-only operations. Also, these techniques may reduce the amount of logging that a participant has to do, but they do not eliminate logging by the coordinator in all circumstances. For example, in the Presumed Abort technique, the coordinator avoids logging 20 (FIG. 2), i.e., avoids force-writing the abort record, only when it is going to send an abort decision. The coordinator still has to log when it sends a commit decision.
However, in some cases, such as in a client/server environment, a database client acting as coordinator may not have the resources to maintain a log, or may not provide a secure environment in which one can be maintained. For example, such database clients can reside on personal computers that are neither maintained by an administrator nor secure. In this environment, the probability of a log being broken, deleted, or unavailable is high. Such local log failures can result from, among other causes, the client being powered off, a failure of the client's hard disk, network failures between the client and a DBMS server, intermittent outages caused by mobile computing, and accidental erasure of the log. When a two-phase commit client with a local log fails, DBMS servers are exposed to data outages under the current two-phase commit protocol.
If one or more systems involved in a transaction fail and/or a log is deleted or broken during a two-phase commit operation, database integrity is compromised. There can be substantial delays before the operation completes, and the affected resources are not available for use by other transactions. If resynchronization cannot be performed, database resources could be locked indefinitely.
In systems having a high frequency of transactions, such delays are unacceptable. If such a failure does occur, some systems give a database administrator a way to determine the outcome of the commit manually and to force a blocked transaction to commit or roll back. The administrator has to choose whether to commit or abort the changes to the affected data resources. Once a two-phase commit operation has started, either choice risks heuristic damage, that is, making the local resources abort when the rest of the transaction commits, or vice versa. This can leave the database resources in an inconsistent state. In other systems, such as those where the client is a personal computer, there may be no administrator available even to attempt a manual correction.