The invention relates generally to a clustered processing system formed from multiple processor units with fault-tolerant capability. More particularly, the invention relates to a method, and an apparatus for implementing that method, for handling the failure of a resource manager in a fault-tolerant manner in the context of a transaction executing on the system.
A useful definition of a transaction is that it is an explicitly delimited operation, or set of related operations, that changes or otherwise modifies the content of an information collection or database from one consistent state to another. The changes are treated as a single unit: either all changes of a transaction are made permanent (i.e., the transaction is "committed") or none of them are (i.e., the transaction is "aborted"). If a failure occurs during the execution of a transaction, the transaction can be aborted and whatever partial changes were made to the collection can be undone to leave it in a consistent state.
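The all-or-nothing behavior described above can be illustrated with a minimal sketch. It assumes an in-memory dictionary stands in for the information collection and uses one common technique: each change is written in place while the value it overwrites (its "before-image") is recorded in an undo log, and the before-images are restored on abort. The names `transaction`, `_Txn`, and `write` are hypothetical and do not come from the source.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator

_MISSING = object()  # marks keys that did not exist before the transaction


class _Txn:
    """Hypothetical transaction handle that records before-images for undo."""

    def __init__(self, db: Dict[str, Any]) -> None:
        self.db = db
        self.before: Dict[str, Any] = {}  # key -> value before the first write

    def write(self, key: str, value: Any) -> None:
        # Save the before-image only on the first write to this key.
        if key not in self.before:
            self.before[key] = self.db.get(key, _MISSING)
        self.db[key] = value


@contextmanager
def transaction(db: Dict[str, Any]) -> Iterator[_Txn]:
    txn = _Txn(db)
    try:
        yield txn  # commit: changes remain if the block exits normally
    except Exception:
        # Abort: undo every partial change so db returns to its prior state.
        for key, old in txn.before.items():
            if old is _MISSING:
                del db[key]
            else:
                db[key] = old
        raise
```

Usage: a `with transaction(accounts) as t:` block that raises partway through leaves `accounts` exactly as it was before the block began. A block that completes normally keeps all of its writes.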
Typically, transactions are performed under the supervision of a transaction manager facility (TMF). In geographically distributed systems, such as multiple-processor-unit systems or "clusters" (i.e., groups of independent processor units managed as a single system), the TMF is "distributed" in the sense that each processor unit has its own TMF component to coordinate the operations of a transaction conducted on that processor unit. The processor unit at which a transaction begins is sometimes called the "beginner" processor unit, and its TMF component coordinates the transactional resources remote from its resident processor unit (i.e., resources managed by other processor units). The TMF components running on processor units that manage resources enlisted in a transaction are "participants" in the transaction, and it is the TMF component of the beginner processor unit that initiates the steps taken to conclude the transaction.
A preferred approach to concluding the transaction, and to confirming that all participant resources employed in it are able to take part in that conclusion, is the Two-Phase Commit ("2PC") protocol. Under this approach, the beginner TMF component receives an "End Transaction" request from the application process that requested the transaction and then broadcasts a "Prepare" signal to all processor units of the cluster. On receipt of the Prepare signal, each processor unit notifies its local participant resources to do whatever is necessary (e.g., completing writes to disk storage, clearing memory) to effect the change in state of the database. If that operation succeeds, the processor unit responds with a "Ready" signal. If every participant in the transaction responds affirmatively with a "Ready" signal, and every processor unit not participating in the transaction responds with a "Not Involved" signal, the beginner TMF component notifies a transaction monitor process (TMP), running on one of the processor units, to "commit" the change to an audit log. The TMP tells the beginner TMF component that the transaction is committed, and the beginner TMF component then broadcasts a "Commit" signal to the participant processor units. At this point the change is considered permanent.
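The message flow above can be sketched as a small single-process simulation. It assumes that signals are plain method calls and that the TMP's audit log is a Python list. The source describes only the commit path, so the abort path (any reply other than Ready or Not Involved causes "Abort" to be sent to the participants) is an assumption added for completeness. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Vote(Enum):
    READY = "Ready"
    NOT_INVOLVED = "Not Involved"
    NO = "No"  # hypothetical: any reply other than Ready / Not Involved


@dataclass
class ProcessorUnit:
    name: str
    involved: bool               # manages a resource enlisted in the transaction?
    can_prepare: bool = True     # simulates whether local work can be made durable
    outcome: Optional[str] = None

    def on_prepare(self, txn_id: str) -> Vote:
        if not self.involved:
            return Vote.NOT_INVOLVED
        # A real participant would complete its disk writes here before replying.
        return Vote.READY if self.can_prepare else Vote.NO

    def on_outcome(self, txn_id: str, outcome: str) -> None:
        self.outcome = outcome


@dataclass
class TransactionMonitor:
    audit_log: List[Tuple[str, str]] = field(default_factory=list)

    def commit(self, txn_id: str) -> None:
        # Writing this record is the commit point of the transaction.
        self.audit_log.append((txn_id, "committed"))


def end_transaction(txn_id: str, units: List[ProcessorUnit],
                    tmp: TransactionMonitor) -> str:
    # Phase 1: the beginner broadcasts Prepare to every processor unit.
    votes = [u.on_prepare(txn_id) for u in units]
    if all(v in (Vote.READY, Vote.NOT_INVOLVED) for v in votes):
        tmp.commit(txn_id)
        outcome = "Commit"
    else:
        outcome = "Abort"  # assumed abort path, not described in the text
    # Phase 2: send the outcome to the participant processor units only.
    for u, v in zip(units, votes):
        if v is not Vote.NOT_INVOLVED:
            u.on_outcome(txn_id, outcome)
    return outcome
```

The commit decision is recorded by the TMP before any Commit signal goes out. A participant that later misses the signal can therefore still learn the outcome from the audit log.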
Fault tolerance is another important feature of transaction processing. The ability to detect and tolerate faults protects the integrity of the collection managed by the system. Although a number of different methods and facilities exist, one particularly effective fault-tolerant technique is the so-called "process-pair" technique, also sometimes referred to as "fail-over" capability. According to this technique, an application program is instantiated as two separate processes: a primary process resident on one processor unit of the cluster, and a backup process resident on another processor unit. If the primary process, or the processor unit on which it is running, fails, the backup process is brought into operation to take over from the lost primary process. If the failure occurs during a transaction in which the lost process was a participant, the backup decides whether to notify the beginner processor unit to abort the transaction and start it over. In this way the state of the collection managed by the system remains consistent. An example of the process-pair or fail-over technique can be found in U.S. Pat. No. 4,817,091.
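The process-pair arrangement can be sketched as follows. The sketch makes two assumptions that the text does not state. First, the primary mirrors ("checkpoints") each transaction's phase to its backup as it runs. Second, after takeover the backup asks the beginner to abort every transaction that had not yet reached the hypothetical "ready" phase. The class names, phase strings, and that decision rule are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Process:
    role: str
    cpu: int
    txn_phase: Dict[str, str] = field(default_factory=dict)  # txn id -> phase


class ProcessPair:
    def __init__(self, primary_cpu: int, backup_cpu: int) -> None:
        if primary_cpu == backup_cpu:
            raise ValueError("primary and backup must run on different processor units")
        self.primary: Optional[Process] = Process("primary", primary_cpu)
        self.backup: Optional[Process] = Process("backup", backup_cpu)

    def checkpoint(self, txn_id: str, phase: str) -> None:
        # Assumed: every state change is mirrored to the backup as it happens.
        self.primary.txn_phase[txn_id] = phase
        if self.backup is not None:
            self.backup.txn_phase[txn_id] = phase

    def cpu_failed(self, cpu: int) -> List[str]:
        """Handle loss of a processor unit; return txn ids the beginner should abort."""
        if self.primary is not None and self.primary.cpu == cpu:
            if self.backup is None:
                raise RuntimeError("no backup available: process lost")
            # The backup takes over as the new primary and runs without a backup for now.
            self.primary, self.backup = self.backup, None
            self.primary.role = "primary"
            # Hypothetical decision rule: only transactions already "ready" continue.
            return [t for t, p in self.primary.txn_phase.items() if p != "ready"]
        if self.backup is not None and self.backup.cpu == cpu:
            self.backup = None  # the primary continues without a backup
        return []
```

Usage: create a `ProcessPair(primary_cpu=0, backup_cpu=1)`, checkpoint "T1" as "ready" and "T2" as "active", then call `cpu_failed(0)`. The pair returns `["T2"]` and leaves the former backup on processor unit 1 running as the primary.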
An alternative approach, used for example by software applications that use object linking and embedding (OLE), is to create a backup process only after the primary process has been detected as failed. The state needed by the newly created backup is transferred to it after creation. One problem with this approach is that the state needed by the backup is often retained by the node or processor unit on which the primary was running. If the primary process failed because its processor unit failed, or if the primary has lost the ability to communicate with the transaction manager, that state can be lost.
Also, the failure of a process, and the subsequent fail-over of the failed process to another processor unit (i.e., to the backup process), can at times impede transactions. For example, a transaction may reach a stage at which the participants are no longer able to abort it. If a participant process, its processor unit, or some other facility related to the participant process then fails, the transaction will not be committed, and the state used by the failed process will be left to clutter the system.
These problems normally do not occur in a coordinated system having component parts designed to work together. They most often appear when porting an application from one platform to another.
Accordingly, there is a need to provide full fail-over capability in a transaction processing system in order to maintain fault tolerance. It follows that the state created and maintained by the primary process should be placed where the backup process can reach it when necessary, regardless of how the primary process fails or is lost.
According to the present invention, in a transaction processing system that uses a transaction management facility (TMF) and a two-phase commit (2PC) operation, certain state is written to the audit log maintained by the TMF when a transaction reaches a point beyond which participation of the resources used in the transaction will be required. Typically, that point is reached when the Ready signal is received in response to the Prepare signal broadcast by the beginner TMF. According to the invention, the Ready signal is accompanied by state information from which the state needed by that participant can be recreated, and the beginner TMF writes this information to an audit log. If the participant fails, or is otherwise made unavailable, a backup participant is created, preferably on another node, and given the same identifier as the failed participant. The backup participant queries the TMF to determine whether any outstanding transactions are associated with that identifier. In response, the TMF supplies the backup participant with the retained state information previously stored in the audit log. The backup participant can then complete, as necessary, the transaction that previously involved the failed participant.
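The recovery sequence described above can be sketched as a single-process simulation. It assumes that a Python list stands in for the durable audit log, that each Ready reply carries a copy of the participant's working state as a dictionary, and that a participant "fails" after sending Ready but before receiving Commit. `Participant`, `BeginnerTMF`, `query_outstanding`, `resolved`, and `fail_over` are hypothetical names chosen for illustration, not an API from the source.

```python
from typing import Dict, List, Set, Tuple


class Participant:
    """A resource manager. Its identifier survives fail-over; its memory does not."""

    def __init__(self, ident: str) -> None:
        self.ident = ident
        self.pending: Dict[str, dict] = {}  # txn_id -> in-memory working state
        self.completed: List[Tuple[str, str, dict]] = []
        self.alive = True

    def do_work(self, txn_id: str, data: dict) -> None:
        self.pending[txn_id] = dict(data)

    def prepare(self, txn_id: str) -> Tuple[str, dict]:
        # The Ready reply carries the state needed to recreate this participant's work.
        return "Ready", dict(self.pending[txn_id])

    def complete(self, txn_id: str, outcome: str) -> None:
        state = self.pending.pop(txn_id)
        self.completed.append((txn_id, outcome, state))

    def fail(self) -> None:
        self.alive = False
        self.pending.clear()  # in-memory state is lost along with the node


class BeginnerTMF:
    def __init__(self) -> None:
        # (txn_id, participant ident, state); stands in for the durable audit log.
        self.audit_log: List[Tuple[str, str, dict]] = []
        self.outcome: Dict[str, str] = {}
        self.unresolved: Dict[str, Set[str]] = {}  # txn_id -> idents not yet completed

    def prepare(self, txn_id: str, participants: List[Participant]) -> bool:
        all_ready = True
        for p in participants:
            vote, state = p.prepare(txn_id)
            if vote == "Ready":
                self.audit_log.append((txn_id, p.ident, state))  # retain state
            else:
                all_ready = False
        return all_ready

    def commit(self, txn_id: str, participants: List[Participant]) -> None:
        self.outcome[txn_id] = "Commit"  # stand-in for the TMP's commit record
        self.unresolved[txn_id] = {p.ident for p in participants}
        for p in participants:
            if p.alive:
                p.complete(txn_id, "Commit")
                self.unresolved[txn_id].discard(p.ident)

    def query_outstanding(self, ident: str) -> List[Tuple[str, str, dict]]:
        # Unresolved transactions for this identifier, with their logged state.
        return [(t, self.outcome[t], dict(s)) for (t, i, s) in self.audit_log
                if i == ident and ident in self.unresolved.get(t, set())]

    def resolved(self, txn_id: str, ident: str) -> None:
        self.unresolved[txn_id].discard(ident)


def fail_over(ident: str, tmf: BeginnerTMF) -> Participant:
    """Create a backup with the failed participant's identifier and finish its work."""
    backup = Participant(ident)  # assumed to be started on another node
    for txn_id, outcome, state in tmf.query_outstanding(ident):
        backup.pending[txn_id] = state  # recreate state from the audit log
        backup.complete(txn_id, outcome)
        tmf.resolved(txn_id, ident)
    return backup
```

Usage: two participants, "RM-1" and "RM-2", do work under "T1" and both reply Ready. "RM-2" then fails and the beginner commits. Calling `fail_over("RM-2", tmf)` creates a backup that recovers `{"row": 2}` from the audit log and completes "T1" with "Commit", even though the original participant's memory was lost. The key design choice is that the state travels with the Ready reply to the beginner's log, so it no longer resides only on the failed node.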
A significant advantage of the invention is that it allows OLE-compliant applications, such as Microsoft SQL Server (e.g., Microsoft SQL Server 6.5) or Microsoft Message Queue Server, to be ported to a foreign platform while retaining their fault-tolerant capability, which relies on detecting the failure of a process before a backup of that process is created.