1. Field
The present invention relates to a method, transaction manager, transaction processing computer system and computer program for transaction recovery in a multiple transaction manager system and in particular to the prevention of conflict between a recovery transaction manager and another apparently unavailable transaction manager.
2. Description of the Related Art
Transaction processing computer systems in which operations are carried out on resources, for example customer databases, are extremely well known. Such operations are generally implemented using a so-called two-phase commit process in which the resources involved are polled by a transaction coordinator (or manager) to see if they are ready to commit the changes defined by the transaction. If they are ready, they enter a “prepared” phase and reply in the affirmative. If all resources are prepared, the coordinator issues “commit” instructions, but until the changes to the resources have been made and the resources have confirmed this to the coordinator, the transaction remains “in-doubt”. In the event that there is a failure in the transaction coordinator while the transaction is in-doubt, the coordinator enters a recovery phase upon restart, in which it can either complete in-doubt transactions for which it has actually issued a commit instruction or else roll back the transaction, meaning that instructions are issued to restore the system to the state it was in immediately prior to the transaction starting. Thus, the transactions may be said to be “atomic” in that changes to resources are either all committed or all rolled back.
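By way of illustration only, the two-phase commit process described above may be sketched as follows. The sketch is a minimal in-memory model and not an implementation of any particular product; the names Resource, Vote and two_phase_commit are hypothetical and are introduced solely for the example.

```python
from enum import Enum


class Vote(Enum):
    PREPARED = "prepared"
    REFUSED = "refused"


class Resource:
    """Hypothetical in-memory resource taking part in a transaction."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.value = "old"    # last committed state
        self.pending = None   # change held while the resource is prepared

    def prepare(self, new_value):
        # Phase 1: promise to commit if asked, or refuse.
        if not self.can_prepare:
            return Vote.REFUSED
        self.pending = new_value
        return Vote.PREPARED

    def commit(self):
        # Phase 2: make the prepared change permanent.
        self.value, self.pending = self.pending, None

    def rollback(self):
        # Discard any prepared change, leaving the prior state intact.
        self.pending = None


def two_phase_commit(resources, new_value):
    """Return "committed" or "rolled back" for the transaction as a whole."""
    # Phase 1: poll every resource.
    votes = [r.prepare(new_value) for r in resources]
    # Phase 2: commit only if every resource is prepared.
    if all(v is Vote.PREPARED for v in votes):
        for r in resources:
            r.commit()
        return "committed"
    for r in resources:
        r.rollback()
    return "rolled back"
```

In this sketch a single refusal causes every resource to be rolled back, which is the atomic, all-or-nothing behaviour referred to above. Logging and failure handling are deliberately omitted.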
In order to be able to complete prepared transactions following a failure, both coordinators and resources log their status during the two phase commit process in transaction recovery logs. These logs are stored persistently and can be referred to as necessary to complete outstanding in-doubt transactions. This activity is referred to as transaction recovery processing and is carried out separately from mainstream forward processing of transactions.
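A simplified view of transaction recovery processing is given below, assuming that the recovery log can be read as a sequence of (transaction identifier, record type) pairs and that a transaction with no COMMIT record is rolled back. The record format and the function name recover are assumptions made for the example; real recovery logs carry considerably more state.

```python
def recover(log_records, in_doubt):
    """Decide the outcome of each in-doubt transaction.

    log_records: iterable of (transaction_id, record_type) tuples
                 read from the persistent recovery log (assumed format).
    in_doubt:    iterable of transaction ids that resources report
                 as prepared but not yet completed.
    """
    # Transactions for which a commit decision was logged.
    committed = {txid for txid, kind in log_records if kind == "COMMIT"}
    # Complete logged commits; roll back everything else.
    return {
        txid: ("commit" if txid in committed else "rollback")
        for txid in in_doubt
    }
```

For example, recover([("t1", "COMMIT")], ["t1", "t2"]) directs t1 to commit and t2 to roll back.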
Another aspect of modern transaction processing systems is the need for high availability (HA) under heavy transactional workloads such as may be found in a banking or reservation system. A well known approach to support high availability is to distribute transactions between multiple servers in parallel so as to balance the workload. The servers may be separate computers or separate server instances of a multi-processing computer. Each server acts as a transaction coordinator for transactions routed to it and maintains its own transaction log.
Such a group of servers may be configured as a peer group known as a cluster in which each server is aware of the other servers in the cluster. Should one of the servers fail in the course of a transaction, a high availability system needs to provide a means of rapid recovery in addition to normal attempts to restart the failed server. One such technique is that of peer recovery processing, whereby one of the peer group can be configured to take over and complete the transaction by accessing the failed server's transaction log. To achieve this, transaction recovery logs need to provide shared access to transaction coordinator peers.
For example, if servers A and B perform independent transactional work and server A fails, then in an HA configuration server B should continue its own independent work but may also act as a recovery server for server A's transactions if so directed by a high-availability management component of the server. If the recovery log medium is a logical file (which may consist of a number of physical files) on a shared file system, then server B's HA configuration must allow it to access server A's recovery log.
A well-known complication with this scenario is that peer recovery can be triggered as the result of a partial network partition in which both server A and server B can access the shared file system but can no longer see one another on the network. The high availability management software, when it considers a server to be unavailable, may direct a server peer to recover but often has no real way of knowing that the “failed” server really has failed. Problems will occur if a peer-recovery server takes over a “failed” server's log when the “failed” server is actually still healthy and writing to it. Various hardware techniques exist to prevent this, including redundant networks and “quorum” facilities that switch off the power to any servers that appear to have failed. For the case of a recovery log hosted in a shared file system, the use of an exclusive lease-based file lock provides a simple solution to this problem, with the exclusive lock determining ownership of the file. Modern file servers, such as those offering the Internet Engineering Task Force's open NFSv4 protocol for distributed file sharing, provide such lease-based locks.
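The general idea of an exclusive lease-based lock may be sketched as follows. This is an illustrative in-memory model only and does not represent the NFSv4 locking interface; the class LeaseLock, its methods and the injected clock function are hypothetical names chosen for the example.

```python
class LeaseLock:
    """Exclusive lock whose ownership lapses unless the holder renews it."""

    def __init__(self, lease_seconds, clock):
        self.lease_seconds = lease_seconds
        self.clock = clock        # callable returning the current time in seconds
        self.owner = None
        self.expires_at = 0.0

    def acquire(self, server):
        now = self.clock()
        free = self.owner is None or now >= self.expires_at
        if free or self.owner == server:
            # Grant (or re-grant) the lease to this server.
            self.owner = server
            self.expires_at = now + self.lease_seconds
            return True
        return False              # another server holds a live lease

    def renew(self, server):
        now = self.clock()
        if self.owner == server and now < self.expires_at:
            self.expires_at = now + self.lease_seconds
            return True
        return False              # lease lost; the caller must stop writing
```

Under this model a healthy server A that keeps renewing its lease prevents server B from taking over the log, whereas a server that has genuinely failed stops renewing, so that the lease expires and a peer can acquire the lock.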
One product that supports such a file system is the IBM WebSphere Application Server Version 6 (“IBM” and “WebSphere” are trademarks of International Business Machines Corporation). A discussion of the problem and its solution by means of exclusive lease-based locking may be found in a paper entitled “Transactional high availability and deployment in WebSphere Application Server V6” by J. Beaven and I. Robinson published online in the IBM WebSphere Developer Technical Journal on 6 Apr. 2005 at:
http://www.ibm.com/developerworks/websphere/techjournal/0504_beaven/0504_beaven.html.
In general transaction processing, it is known that transaction recovery logs may be stored in a database and that such a database may also hold persistent data on which an application operates within the scope of a transaction. Suitable databases include IBM's DB2 on z/OS or DB2 HADR and Oracle RAC (“IBM”, “DB2” and “z/OS” are trademarks of International Business Machines Corporation; “Oracle” is a trademark of Oracle Corporation). One such system is shown in US Published Patent Application 2008/0250272 A1 to T. E. Barnes et al. entitled “Logging Last Resource System”, assigned to BEA Systems Incorporated.
It might therefore be contemplated that, instead of investing in the separate infrastructure and management of a highly available file system, it would be possible to base high-availability solutions around such databases. Such an arrangement cannot, however, solve the partial network partition problem in a high availability system employing peer recovery in the same manner as with a network file system since, while the database can serialize access to a table representing a transaction recovery log, it does not provide a reliable lease-based mechanism for revoking that lock in the event that an application server really has failed. A database may notice that a remote connection is no longer available if a remote process has failed, but may receive no timely indication from the communications stack in the case of server node outage.
If a recovery peer server is assigned to complete in-doubt work while a “failed” server is actually still healthy, unatomic outcomes may result, as in the following scenario:
Server A is part of a partitioned network. An HA manager believes it to have failed although it has not. Server A is processing a transaction and prepares resources XA1 and XA2, for example. It makes a commit decision which requires a COMMIT record to be written to the log.
Some time previously, server B has been assigned as a peer recovery coordinator for server A and contacts all configured resource managers to obtain a list of their in-doubt transactions. It determines that there are resources XA1 and XA2 for which it has no COMMIT record (because server A has not yet written it to the log) and so directs the appropriate resource managers to roll back XA1 and XA2.
Server A then forces its COMMIT record by writing it to the transaction recovery log and then directs both XA1 and XA2 to commit.
This creates a race condition: if server A reaches XA1 first and commits it while server B reaches XA2 first and rolls it back, the outcome will be mixed and thus unatomic.
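The scenario above may be reproduced with a small deterministic model, given here purely as an illustration. It assumes that a prepared resource acts on the first outcome instruction it receives and ignores any later, conflicting instruction; the names PreparedResource, run_interleaving and interleaving are hypothetical.

```python
class PreparedResource:
    """Hypothetical prepared resource awaiting an outcome."""

    def __init__(self, name):
        self.name = name
        self.outcome = None   # None while the resource is in-doubt

    def resolve(self, outcome):
        # Assumption: the first instruction wins; later ones are ignored.
        if self.outcome is None:
            self.outcome = outcome
        return self.outcome


def run_interleaving(steps):
    """Apply (server, resource_name, outcome) steps in the given order."""
    resources = {n: PreparedResource(n) for n in ("XA1", "XA2")}
    for _server, name, outcome in steps:
        resources[name].resolve(outcome)
    return {name: r.outcome for name, r in resources.items()}


# Server A commits while peer recovery server B rolls back.
interleaving = [
    ("A", "XA1", "commit"),     # A reaches XA1 first
    ("B", "XA2", "rollback"),   # B reaches XA2 first
    ("A", "XA2", "commit"),     # too late, XA2 already rolled back
    ("B", "XA1", "rollback"),   # too late, XA1 already committed
]
result = run_interleaving(interleaving)
```

With this ordering XA1 is committed and XA2 is rolled back, so the single transaction has a mixed outcome. Other orderings of the same four steps yield a uniform outcome, which is why the defect presents as a race.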