The invention relates to the field of client/server (also known as "distributed") computing, where one computing device ("the client") requests another computing device ("the server") to perform part of the client's work. The client and server can also both be located on the same physical computing device.
Client/server computing has become more and more important over the past few years in the information technology world. This type of distributed computing allows one machine to delegate some of its work to another machine that might be, for example, better suited to perform that work. For example, the server could be a high-powered computer running a database program managing the storage of a vast amount of data, while the client is simply a desktop personal computer (PC) which requests information from the database to use in one of its local programs.
The benefits of client/server computing have been even further enhanced by the use of a well-known computer programming technology called object-oriented programming (OOP), which allows the client and server to be located on different (heterogeneous) "platforms". A platform is a combination of the specific hardware/software/operating system/communication protocol which a machine uses to do its work. OOP allows the client application program and server application program to operate on their own platforms without worrying about how the client application's work requests will be communicated and accepted by the server application. Likewise, the server application does not have to worry about how the OOP system will receive, translate and send the server application's processing results back to the requesting client application.
Details of how OOP techniques have been integrated with heterogeneous client/server systems are explained in U.S. Pat. No. 5,440,744 and European Patent Published Application No. EP 0 677 943 A2. These latter two publications are hereby incorporated by reference. However, an example of the basic architecture will be given below for contextual understanding of the invention's environment.
As shown in FIG. 1, the client computer 10 (which could, for example, be a personal computer having the IBM OS/2 operating system installed thereon) has an application program 40 running on its operating system ("IBM" and "OS/2" are trademarks of International Business Machines Corporation). The application program 40 will periodically require work to be performed on the server computer 20 and/or data to be returned from the server 20 for subsequent use by the application program 40. The server computer 20 can be, for example, a high-powered mainframe computer running on IBM's MVS operating system ("MVS" is also a trademark of IBM Corp.). For the purposes of the present invention it is irrelevant whether the requests for communications services to be carried out by the server are instigated by user interaction with the first application program 40, or whether the application program 40 operates independently of user interaction and makes the requests automatically during the running of the program.
When the client computer 10 wishes to make a request for the server computer 20's services, the first application program 40 informs the first logic means 50 of the service required. It may for example do this by sending the first logic means the name of a remote procedure along with a list of input and output parameters. The first logic means 50 then handles the task of establishing the necessary communications with the second computer 20 with reference to definitions of the available communications services stored in the storage device 60. All the possible services are defined as a cohesive framework of object classes 70, these classes being derived from a single object class. Defining the services in this way gives rise to a great number of advantages in terms of performance and reusability.
To establish the necessary communication with the server 20, the first logic means 50 determines which object class in the framework needs to be used, and then creates an instance of that object at the server, a message being sent to that object so as to cause that object to invoke one of its methods. This gives rise to the establishment of the connection with the server computer 20 via the connection means 80, and the subsequent sending of a request to the second logic means 90.
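The lookup-and-invoke step described above can be sketched in Python as follows. This is a hypothetical, purely local illustration: the class names `CommunicationsService` and `RemoteProcedureService`, the `FRAMEWORK` dictionary and the `first_logic_means` function are assumptions, not names from the referenced patents or from any ORB product, and the server is simulated by a dictionary of callables rather than reached over the connection means 80.

```python
from typing import Any, Callable, Dict, List, Type

# Hypothetical names throughout; a local illustration, not an ORB API.


class CommunicationsService:
    """Single root class from which every service class in the framework derives."""

    def __init__(self, server: Dict[str, Callable[..., Any]]) -> None:
        # Stands in for the connection (80) to the second logic means (90).
        self.server = server

    def invoke(self, procedure: str, params: List[Any]) -> Any:
        raise NotImplementedError


class RemoteProcedureService(CommunicationsService):
    """One service class: asks the simulated server to run a named procedure."""

    def invoke(self, procedure: str, params: List[Any]) -> Any:
        return self.server[procedure](*params)


# The framework of object classes (70), keyed by the kind of service required.
FRAMEWORK: Dict[str, Type[CommunicationsService]] = {
    "remote_procedure": RemoteProcedureService,
}


def first_logic_means(kind: str, procedure: str, params: List[Any],
                      server: Dict[str, Callable[..., Any]]) -> Any:
    """Choose a class from the framework, create an instance, and send it a message."""
    service = FRAMEWORK[kind](server)
    return service.invoke(procedure, params)
```

For example, `first_logic_means("remote_procedure", "get_balance", ["acct-1"], {"get_balance": lambda acct: {"acct-1": 100}[acct]})` returns `100`. Because every service class shares one root class, the first logic means can treat all of them uniformly, which is the reuse benefit the text attributes to the framework.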
The second logic means 90 then passes the request on to the second application program 100 (hereafter called the service application) running on the server computer 20 so that the service application 100 can perform the specific task required by that request, such as running a data retrieval procedure. Once this task has been completed the service application may need to send results back to the first computer 10. The service application 100 interacts with the second logic means 90 during the performance of the requested tasks and when results are to be sent back to the first computer 10. The second logic means 90 establishes instances of objects, and invokes appropriate methods of those objects, as and when required by the service application 100, the object instances being created from the cohesive framework of object classes stored in the storage device 110.
Using the above technique, the client application program 40 is not exposed to the communications architecture. Further, the service application 100 is invoked through the standard mechanism for its environment; it does not know that it is being invoked remotely.
The Object Management Group (OMG) is an international consortium of organizations involved in various aspects of client/server computing on heterogeneous platforms with distributed objects, as shown in FIG. 1. The OMG has set forth published standards by which client computers (e.g., 10) communicate (in OOP form) with server machines (e.g., 20). As part of these standards, an Object Request Broker architecture (CORBA, the Common Object Request Broker Architecture) has been defined, which provides the object-oriented bridge between the client and the server machines. The ORB decouples the client and server applications from the object-oriented implementation details, performing at least part of the work of the first and second logic means 50 and 90 as well as the connection means 80.
As part of the CORBA software structure, the OMG has set forth standards related to "transactions" and these standards are known as the OTS or Object Transaction Service. See, e.g., CORBA Object Transaction Service Specification 1.0, OMG Document 94.8.4. Computer implemented transaction processing systems are used for critical business tasks in a number of industries. A transaction defines a single unit of work that must either be fully completed or fully purged without action. For example, in the case of a bank automated teller machine from which a customer seeks to withdraw money, the actions of issuing the money, reducing the balance of money on hand in the machine and reducing the customer's bank balance must all occur or none of them must occur. Failure of one of the subordinate actions would lead to inconsistency between the records and the actual occurrences.
Distributed transaction processing involves a transaction that affects resources at more than one physical or logical location. In the above example, a transaction affects resources managed at the local automated teller device as well as bank balances managed by a bank's main computer. Such transactions involve one particular client computer (e.g., 10) communicating with one particular server computer (e.g., 20) over a series of client requests which are processed by the server. The OMG's OTS is responsible for co-ordinating these distributed transactions.
Usually, an application running on a client process begins a transaction which may involve calling a plurality of different servers, each of which will initiate a server process to make changes to its local database according to the instructions contained in the transaction. The transaction finishes by either committing the transaction (and thus all servers finalize the changes to their local databases) or aborting the transaction (and thus all servers "roll back" or ignore the changes to their local databases). To communicate with the servers during the transaction (e.g., instructing them to either commit or abort their part in the transaction), one of the processes involved must maintain state data for the transaction. This usually involves that process setting up a series of transaction state objects, one of which is a Coordinator object which coordinates the transaction with respect to the various server processes.
The basic software architecture involved in providing an implementation of the OTS is shown in FIG. 2. A client process 21 which wants to begin a transaction (e.g., to withdraw money from a bank account) locates a process which is capable of creating and holding the transaction objects that will maintain the state of the transaction. As the modern tendency is to create clients that are "thin" (and thus have only the minimum functionality), the client process 21 will usually not be able to maintain the transaction objects locally and must look for a server process for this purpose.
The OTS (or another service, such as the CORBA Lifecycle service) selects server A process 22 on which to create the transaction state objects 221 (which include the Coordinator object, Control object and Terminator object). Upon locating the server A process 22, client process 21 sends (arrow with encircled number 1) a message to server A process 22 to instruct server A process 22 to create the transaction state objects 221. The Control object (known in the OTS as CosTransactions::Control) provides access to the other two transaction state objects. The Terminator object (known in the OTS as CosTransactions::Terminator) is used to end the transaction. The Coordinator object (known in the OTS as CosTransactions::Coordinator) maintains a list, in local storage 222, of resource objects (known in the OTS as CosTransactions::Resource) that have made updates to their respective data during the transaction. This list is required so that the Coordinator object can consistently call the resource objects at the end of the transaction to command them to commit their transactional changes (make their local data changes final) or to rollback such changes (bring the local data back to the state it was in before the transaction started). A rollback would be necessary, for example, where the transaction could not finish because one of the resources was not working properly.
Server A process 22 then creates the transaction state objects 221 and sends a reply (arrow with encircled number 2) containing the transaction context to client 21. Client 21 then sends, for example, a debit bank account command (arrow with encircled number 3) to server B process 23 (the process containing the resource, for example, bank account, object 231 which the client process 21 wishes to withdraw money from). This latter command carries with it the transaction context supplied to the client 21 by the server A process 22. In this way, the resource object 231 in process 23 can register itself (arrow with encircled number 4) with the transaction objects 221 in process 22 so that the resource object 231 can be commanded (arrow with encircled number 5) to commit or rollback by the transaction state objects 221 at the end of the transaction.
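Arrows 1 to 4 of FIG. 2 can be illustrated with the minimal Python sketch below. It assumes in-process objects instead of remote CORBA references, and it simplifies the transaction context to the `Control` object itself (the OTS defines its own propagation context structure). `Coordinator`, `Terminator` and `Control` are hypothetical stand-ins for the `CosTransactions` interfaces named above, and `BankAccount` plays the role of resource object 231.

```python
import itertools
from dataclasses import dataclass, field
from typing import List

# Hypothetical Python stand-ins for the CosTransactions objects in FIG. 2.
_next_tx_id = itertools.count(1)


@dataclass
class Coordinator:
    """Keeps the list of registered resources (held in local storage 222)."""
    tx_id: int
    resources: List[object] = field(default_factory=list)

    def register_resource(self, resource: object) -> None:
        if resource not in self.resources:  # register each resource only once
            self.resources.append(resource)


@dataclass
class Terminator:
    """Used to end the transaction (commit/rollback omitted in this sketch)."""
    coordinator: Coordinator


@dataclass
class Control:
    """Gives access to the other two transaction state objects."""
    coordinator: Coordinator
    terminator: Terminator


def create_transaction_state_objects() -> Control:
    """Server A process 22 handling arrow 1; the returned Control is the reply (arrow 2)."""
    coordinator = Coordinator(tx_id=next(_next_tx_id))
    return Control(coordinator=coordinator, terminator=Terminator(coordinator))


class BankAccount:
    """Resource object 231 in server B process 23."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.pending_change = 0  # not applied until the transaction commits

    def debit(self, amount: int, context: Control) -> None:
        # Arrow 3: the request carries the transaction context.
        self.pending_change -= amount
        # Arrow 4: the resource uses the context to register with the Coordinator.
        context.coordinator.register_resource(self)
```

Keeping the debit as a pending change until the Coordinator's final command (arrow 5) is one simple way to make the update undoable; a real resource manager would use its database's own logging and locking.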
In the above operation, when the transaction state objects 221 are created, they must log information about themselves and the transaction they represent in local storage 222, so that the transaction will be recoverable in case of a server failure which temporarily prevents the server A process 22 from continuing with the transaction.
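One common way to make such state recoverable is write-ahead logging: each record is forced to disk before the transaction proceeds, and a restarted process replays the log. The Python sketch below assumes a hypothetical log format (one JSON object per line, with `type`, `tx` and `resource` keys) and records only identifiers; what an OTS implementation actually logs, and when, is implementation-specific.

```python
import json
import os
from pathlib import Path
from typing import Dict, List

# Hypothetical log format: one JSON record per line in a local file.


def write_log_record(log_path: Path, record: Dict[str, str]) -> None:
    """Append a record and force it to disk before the transaction proceeds."""
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())


def recover(log_path: Path) -> Dict[str, List[str]]:
    """After a restart, rebuild each transaction's list of registered resources."""
    transactions: Dict[str, List[str]] = {}
    if not log_path.exists():
        return transactions
    for line in log_path.read_text(encoding="utf-8").splitlines():
        record = json.loads(line)
        if record["type"] == "begin":
            transactions[record["tx"]] = []
        elif record["type"] == "register":
            transactions.setdefault(record["tx"], []).append(record["resource"])
    return transactions
```

The `os.fsync` call matters: without it, a record could still be sitting in an operating-system buffer when server A process 22 fails, and the transaction would not be recoverable.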
As part of the transaction, the client process 21 then makes similar calls to server C process 24 (to access the resource object 241) and server D process 25 (to access the resource object 251). Server B process 23, in carrying out its part of the transaction, may need to call another server process, such as server E process 26, to access the resource objects 261, 262 and 263 located in process 26.
Since the number of server processes and resources involved in FIG. 2 is becoming large, the need for careful synchronization of all of the database changes involved becomes readily apparent. The usual way to go about achieving this synchronization is to carry out a two-phase commit process when the client 21 issues a command to end the transaction. The transaction objects 221 first command (phase 1) each of their directly registered resources (231, 241 and 251 in the FIG. 2 example) to prepare to commit their database changes. Phase 1 is also known as the prepare stage of a transaction, as the resources are being prepared for the finalization of their data changes, which will take place in phase 2. Each of these resources then responds to the transaction objects 221 to indicate that it has prepared to commit its changes, and the resources will not allow any more changes to be made to the databases. This response is, in effect, a vote, signifying that this particular resource is voting that the transaction should be committed. After issuing their votes, the resources are then said to be sitting "in doubt" (also known as being in a "prepared" state) waiting for the transaction objects 221 to give a synchronized final command (phase 2) to commit all database changes made during the transaction. This latter final command is only given if all resources have voted that the transaction should be committed. Server B process 23, which has called another server process 26, would carry out its own two-phase commit protocol with respect to the resource objects 261, 262, and 263, as part of its participation in the main two-phase commit protocol discussed above. That is, server B process 23 would send a prepare command to its directly registered resources 261, 262 and 263, and receive a vote from each of them, before server B process 23 sends a consolidated reply to server A process 22.
Rather than voting that a transaction be committed, a resource can also vote that a transaction should be rolled back. A rollback vote would be issued by a resource if that resource had a problem while making its data changes during the transaction (e.g., some type of write error had occurred while a resource was making a local data change). The receipt of a rollback vote from at least one resource will cause the transaction objects 221 to rollback the entire transaction. This is in keeping with the fact that a transaction is an all or nothing prospect: either all resource changes in a transaction are committed or none are.
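The two-phase commit described in the last two paragraphs, including the nested protocol run by server B process 23, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: all participants are local objects, every participant is asked to prepare even after a rollback vote, a rollback command is sent to every participant (in the OTS a resource that voted rollback need not be told again), and the OTS `VoteReadOnly` vote is omitted. `LeafResource`, `SubordinateCoordinator` and `two_phase_commit` are hypothetical names.

```python
from enum import Enum
from typing import List, Optional


class Vote(Enum):
    COMMIT = "VoteCommit"
    ROLLBACK = "VoteRollback"


class LeafResource:
    """Hypothetical resource that returns a preset vote and records the final command."""

    def __init__(self, name: str, vote: Vote) -> None:
        self.name = name
        self.vote = vote
        self.outcome: Optional[str] = None

    def prepare(self) -> Vote:
        return self.vote

    def commit(self) -> None:
        self.outcome = "committed"

    def rollback(self) -> None:
        self.outcome = "rolled back"


class SubordinateCoordinator:
    """Like server B process 23: prepares its own resources and sends one consolidated vote."""

    def __init__(self, resources: List) -> None:
        self.resources = resources

    def prepare(self) -> Vote:
        votes = [r.prepare() for r in self.resources]
        return Vote.COMMIT if all(v is Vote.COMMIT for v in votes) else Vote.ROLLBACK

    def commit(self) -> None:
        for r in self.resources:
            r.commit()

    def rollback(self) -> None:
        for r in self.resources:
            r.rollback()


def two_phase_commit(participants: List) -> str:
    """Run by the transaction objects 221 over their directly registered resources."""
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if every vote was commit; otherwise roll everything back.
    if all(v is Vote.COMMIT for v in votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "rolled back"
```

Because `SubordinateCoordinator` exposes the same `prepare`/`commit`/`rollback` methods as a leaf resource, the superior coordinator cannot tell whether a participant is a single resource or a whole subtree, which is how the nested protocol slots into the main one.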
The newer version of the CORBA specification, CORBA Services: Common Object Services Specification Transaction Service v1.1, December 1996, introduces two new responses that a resource can provide to the transaction objects 221 during the prepare stage. These two new responses are not votes but are instead called "exceptions". When a resource issues an exception in place of a vote, the resource is said to have "thrown an exception". The first exception is:
CosTransactions::HeuristicHazard
The second exception is:
CosTransactions::HeuristicMixed
The nature of these two exceptions will now be described.
While the transaction outcome is unknown, resource objects are holding locks on their respective database resources, so no other transaction can access such resources until the two-phase commit protocol is completed in the server process where the resources reside. Should this take an unreasonably long time to complete for some reason, the locked data will be unavailable for a long time. As locking data for an extended period of time can often work a severe hardship in many servers (due to the fact that no other transaction can have access to the server resources), a systems administrator of a server holding resource objects is often given the authority to make a guess (called an heuristic decision) as to how the transaction will turn out, should the transaction be left incomplete for a long period of time. This way, the resource can release its locks before the second phase of the two-phase commit has completed so that another transaction can have access to the resource's associated data. If the guess turns out to be right, then everything is fine; however, if the guess turns out to be wrong, then a condition known as heuristic damage has occurred.
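The life cycle of an in-doubt resource that is allowed to guess can be sketched in Python as below. The class `InDoubtResource`, its `timeout_s` threshold and the string outcomes are hypothetical; the clock is passed in explicitly (`now`) so the behaviour is deterministic, whereas a real server would use a system clock and would report heuristic damage through the exceptions the OTS defines rather than a return value.

```python
from typing import Optional


class InDoubtResource:
    """Hypothetical prepared resource that may take a heuristic decision."""

    def __init__(self, heuristic_guess: str, timeout_s: float) -> None:
        if heuristic_guess not in ("commit", "rollback"):
            raise ValueError("heuristic_guess must be 'commit' or 'rollback'")
        self.heuristic_guess = heuristic_guess  # set in advance by an administrator
        self.timeout_s = timeout_s
        self.prepared_at: Optional[float] = None
        self.locked = False
        self.heuristic_outcome: Optional[str] = None

    def prepare(self, now: float) -> str:
        self.prepared_at = now
        self.locked = True  # data stays locked while the resource is in doubt
        return "VoteCommit"

    def check_timeout(self, now: float) -> None:
        """Called periodically; guesses the outcome if phase 2 is overdue."""
        if self.locked and self.prepared_at is not None and now - self.prepared_at > self.timeout_s:
            self.heuristic_outcome = self.heuristic_guess
            self.locked = False  # release locks so other transactions can proceed

    def complete(self, outcome: str) -> str:
        """Phase 2 finally arrives with the real outcome ('commit' or 'rollback')."""
        self.locked = False
        if self.heuristic_outcome is None or self.heuristic_outcome == outcome:
            return "ok"
        return "heuristic damage"
```

For instance, a resource configured to guess "rollback" that times out and is later told to commit reports heuristic damage, while one whose guess matches the real outcome reports "ok".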
For example, assume that resource objects 262 and 263 have taken heuristic decisions to roll back a transaction and resource object 261 has taken an heuristic decision to commit the transaction because the transaction has been running for an unreasonably long period of time. This situation causes resource object 231 to throw the exception HeuristicMixed, since some of resource object 231's subordinate resource objects (i.e., 262 and 263) have heuristically rolled back, while another (i.e., 261) has heuristically committed.
Resource object 231 would instead throw the exception HeuristicHazard back to the transaction objects 221 in a situation where the resource object 231 does not know what its subordinates (i.e., 261, 262 and 263) have done, because they are not answering when the resource object 231 calls them to determine their status. In this case, the server 26 containing these resources may have gone down (as a result of, for example, a thunderstorm).
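How resource object 231 might choose between the two exceptions can be sketched in Python as follows. The exception classes are local stand-ins for `CosTransactions::HeuristicMixed` and `CosTransactions::HeuristicHazard`, and the status strings are hypothetical. The text covers only the two scenarios above; the precedence given to HeuristicMixed when both conditions hold, and the votes returned in the remaining cases, are assumptions of this sketch.

```python
from typing import Iterable, Optional


class HeuristicMixed(Exception):
    """Python stand-in for CosTransactions::HeuristicMixed."""


class HeuristicHazard(Exception):
    """Python stand-in for CosTransactions::HeuristicHazard."""


def consolidated_reply(statuses: Iterable[Optional[str]]) -> str:
    """Reply of resource 231 to its superior, given its subordinates' statuses.

    Each status is "prepared", "heuristic_commit", "heuristic_rollback",
    or None when the subordinate did not answer (e.g. server 26 is down).
    """
    status_list = list(statuses)
    if "heuristic_commit" in status_list and "heuristic_rollback" in status_list:
        raise HeuristicMixed("subordinates decided heuristically in both directions")
    if None in status_list:
        raise HeuristicHazard("status of at least one subordinate is unknown")
    if "heuristic_rollback" in status_list:
        return "VoteRollback"
    return "VoteCommit"
```

With statuses `["heuristic_commit", "heuristic_rollback", "heuristic_rollback"]` (resources 261, 262 and 263 in the example above) the function raises `HeuristicMixed`; with `[None, None, None]` it raises `HeuristicHazard`.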
When the transaction objects 221 in the server A process 22 receive one of these exceptions from their directly registered resource object 231 during the prepare stage, it is now clear that the transaction cannot proceed successfully. However, the CORBA specification provides no guidance or suggestion as to how the transaction objects 221 should respond with respect to the other directly registered resource objects 241 and 251 when a directly registered resource object 231 throws one of these two new exceptions. Resource objects 241 and 251 are still holding locks. The transaction objects 221 must send some command to them so that they can release their locks and thus free themselves up for use by another transaction. However, it is not apparent how the transaction objects 221 should go about deciding how to deal with these resource objects 241 and 251 when an exception has been thrown by resource object 231.
According to a first aspect, the present invention provides a server for use in a client/server computing system which coordinates the processing of distributed transactions in the client/server computing system, the server has: a means for sending requests for votes to each resource which has been called by the server to take part in a distributed transaction; a means for receiving votes from each resource in response to having sent requests for votes; a means for determining whether any of the resources has thrown an exception instead of returning a vote; and a means for assigning a programmed direction to a resource which has thrown an exception as a vote to complete the transaction if it is determined that a resource has thrown an exception instead of returning a vote.
Preferably, the following of the programmed direction involves one of two options, with a corresponding option being chosen depending on the value of the programmed direction: (1) assigning a vote of commit if the programmed direction is commit and (2) assigning a vote of rollback if the programmed direction is rollback.
Further preferably, the means for following consults a configurable variable to determine the programmed direction.
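A minimal Python sketch of the claimed behaviour follows: during the prepare stage, a heuristic exception is caught and replaced by a vote equal to the programmed direction, and the outcome is then decided from the complete set of votes as usual. The configurable variable is assumed here to be an environment variable named `TX_EXCEPTION_DIRECTION`; that name, the default of rollback, and the functions `programmed_direction`, `collect_votes` and `decide` are all hypothetical, since the text states only that a configurable variable is consulted.

```python
import os
from typing import Callable, List


class HeuristicMixed(Exception):
    """Python stand-in for CosTransactions::HeuristicMixed."""


class HeuristicHazard(Exception):
    """Python stand-in for CosTransactions::HeuristicHazard."""


def programmed_direction() -> str:
    """Read the configurable variable (hypothetical name), defaulting to rollback."""
    direction = os.environ.get("TX_EXCEPTION_DIRECTION", "rollback").lower()
    if direction not in ("commit", "rollback"):
        raise ValueError(f"unsupported programmed direction: {direction!r}")
    return direction


def collect_votes(prepare_calls: List[Callable[[], str]], direction: str) -> List[str]:
    """Phase 1: a thrown heuristic exception is replaced by the programmed direction."""
    votes = []
    for prepare in prepare_calls:
        try:
            votes.append(prepare())
        except (HeuristicMixed, HeuristicHazard):
            votes.append("VoteCommit" if direction == "commit" else "VoteRollback")
    return votes


def decide(votes: List[str]) -> str:
    """Phase 2 decision sent to every directly registered resource (e.g. 241 and 251)."""
    return "commit" if all(v == "VoteCommit" for v in votes) else "rollback"
```

Following the banking and order-entry examples given at the end of this section, a banking server would set the variable to rollback so that any exception rolls the whole transaction back, while an order-entry server might set it to commit. Either way, resource objects 241 and 251 receive a definite phase-2 command and can release their locks.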
Further preferably, the exception indicates that a resource cannot determine the status of its subordinate resources or the exception indicates that at least one of a resource's subordinate resources has taken an heuristic decision to commit and at least one of a resource's subordinate resources has taken an heuristic decision to rollback.
In the preferred embodiment, the client/server computing system is heterogeneous, object-oriented and conforms to the Common Object Request Broker Architecture's Object Transaction Service standard.
According to a second aspect, the invention provides a method of carrying out the functionality of the server described above in the first aspect.
According to a third aspect, the invention provides a computer program product stored on a computer readable storage medium for, when run on a computer, carrying out the functionality of the first aspect.
Thus, with the present invention, transaction objects in a server process coordinating a distributed transaction are given efficient and effective guidance concerning the steps which should be taken in the event of receiving a thrown exception from a resource object while the transaction is in the prepare state. Further, since the server process coordinating the distributed transaction can be programmed to follow a certain course when faced with an exception, the course can be tailored to best suit the type of transaction. For example, for an on-line banking transaction, it may be better to rollback the entire transaction if an exception is thrown (to avoid incorrectly debiting/crediting a customer's bank account balance), while for an order entry transaction, it may be better to commit (to avoid making customers re-enter data).