The present invention generally relates to distributed computing systems, and more specifically to using a resource manager to coordinate the committing of a distributed transaction.
One of the long-standing challenges in distributed computing has been to maintain data consistency across all of the nodes in a network. Perhaps nowhere is data consistency more important than in a distributed transaction system where distributed transactions may specify updates to related data residing on different resource managers. In this context, a distributed transaction is a transaction that includes a set of operations that need to be performed by multiple resource managers. A resource manager, in turn, is any entity that manages access to a resource. Examples of resource managers include queues, file server systems and database systems.
To accomplish a distributed transaction that involves multiple resource managers, each of the resource managers is assigned to do a set of operations. The set of operations that need to be performed by a given resource manager is generally referred to as a child transaction. For example, a particular distributed transaction may include a first set of operations that need to be performed by a first resource manager and a second set of operations that need to be performed by a second resource manager. In distributed systems, the first and second sets of operations are generally referred to as first and second child transactions.
One approach for ensuring data consistency during distributed transactions involves processing distributed transactions using a two-phase commit mechanism. Two-phase commit requires that the transaction first be prepared and then committed. During the prepare phase, the changes specified by the transaction are made durable at each of the participating resource managers. If all of the changes are made without error at each of the participating resource managers, then the changes are committed (made permanent). On the other hand, if any errors occur during the prepare phase, indicating that at least one of the participating resource managers could not make the changes specified by the transaction, then all of the changes at each of the participating resource managers are retracted, restoring each participating resource manager to its state prior to the changes. This approach ensures data consistency while providing simultaneous processing of the changes.
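The prepare-then-commit sequence described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the mechanism of any particular product: the `ResourceManager` class, its `fail_on_prepare` flag, and the `two_phase_commit` function are hypothetical names introduced here, and a real resource manager would write the prepared changes to a durable log rather than hold them in memory.

```python
class ResourceManager:
    """Toy in-memory resource manager (hypothetical, for illustration only)."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare  # simulates a prepare-phase error
        self.committed = {}   # state that is visible after a commit
        self.pending = None   # changes held while in the prepared state

    def prepare(self, changes):
        # A real resource manager would durably log the changes here.
        if self.fail_on_prepare:
            return False
        self.pending = dict(changes)
        return True

    def commit(self):
        # Make the prepared changes permanent.
        if self.pending is not None:
            self.committed.update(self.pending)
        self.pending = None

    def rollback(self):
        # Retract the prepared changes, restoring the prior state.
        self.pending = None


def two_phase_commit(work):
    """Run two-phase commit; `work` maps each resource manager to its changes."""
    # Phase 1: ask every participant to prepare (the list avoids short-circuiting).
    votes = [rm.prepare(changes) for rm, changes in work.items()]
    all_prepared = all(votes)
    # Phase 2: commit everywhere only if every participant prepared.
    for rm in work:
        if all_prepared:
            rm.commit()
        else:
            rm.rollback()
    return all_prepared
```

If any participant reports a failure during the prepare phase, every participant is rolled back, so no participant ends up holding changes that the others have discarded.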
In certain distributed computer systems, an application program or a separate transaction processing (TP) monitor is used to coordinate the processing of a two-phase commit for distributed transactions. For the purpose of explanation, the processing of distributed transactions shall be described in the context of a distributed transaction in which the resource managers involved in the distributed transaction are database systems. For example, FIG. 1A illustrates a distributed database system 100 in which distributed transactions can be performed. As depicted, distributed database system 100 includes an application program 108 and a plurality of database systems 104 and 106. Application program 108 interacts with database systems 104 and 106 to perform distributed transactions that involve access to data managed by database systems 104 and 106.
Database systems 104 and 106 respectively include database server processes 110 and 112, and nonvolatile memory areas 114 and 116. Nonvolatile memories 114 and 116 represent nonvolatile storage, such as a magnetic or optical disk, which can be used to durably store information. In this example, nonvolatile memories 114 and 116 respectively include databases 130 and 132. Database 130 includes a log 118 and an employee table 126. Database 132 includes a log 120 and a department table 128.
Database servers 110 and 112 respectively manage the resources of database systems 104 and 106. Database systems 104 and 106 may be either homogeneous or heterogeneous systems. For example, database systems 104 and 106 may both be Oracle database server systems. Alternatively, database system 104 may be an Oracle database server system while database system 106 may be an IBM database server system such as DB2. Although not shown, database systems 104 and 106 generally include an application program interface (API) that allows them to communicate with application program 108 using their native protocol language.
Application program 108 includes a set of one or more processes that are used to coordinate the execution of distributed transactions on database systems 104 and 106. In coordinating the execution of a distributed transaction, application program 108 communicates with database systems 104 and 106 using the native language of each of the respective database systems. For example, if database system 104 is an Oracle database system, application program 108 may communicate with database system 104 using a communication protocol such as the Oracle Call Interface (OCI) protocol. Similarly, if database system 106 is an IBM DB2 database system, application program 108 may communicate with database system 106 using a communication protocol such as the SQL/DS protocol.
To coordinate a two-phase commit sequence, application program 108 first prepares the various child transactions of the distributed transaction at the database servers that are responsible for performing the child transactions. After application program 108 has determined that all of the database servers have prepared their respective child transactions, the application program informs all of the database servers to commit the child transactions. If any database server is unable to complete its child transaction, then the application program informs all of the database servers to roll back their respective child transactions.
Because application program 108 is responsible for coordinating the processing of distributed transactions between database systems 104 and 106, application program 108 is typically required to store "participation" information in nonvolatile memory. In general, the participation information includes the list of resource managers that are participating in the distributed transaction ("participants") and a set of identifiers for identifying the child transactions. This participation information is stored before the application program sends the prepare commands to the participants in the distributed transaction. To maintain the participation information, application program 108 includes a log 124 within a nonvolatile memory area 122. If the application program fails before sending the prepare commands, the participants will roll back their changes since they were never in the prepared state.
However, if the application program fails after sending the prepare commands, but before sending the commit commands, the application program can use the participation information in its log to query each participant to determine, depending on the outcome of the distributed transaction, whether a commit or rollback command should be sent to the participants.
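The role of the participation log in recovery can be sketched as follows. This is a rough illustration under stated assumptions: the JSON-lines file layout, the function names `log_participation` and `recover`, the `query_state` callback, and the state strings are all hypothetical. The decision rule (commit only if every participant reports being prepared or already committed, otherwise roll back) is likewise an assumption chosen for the sketch; the text above says only that the outcome is determined by querying the participants.

```python
import json
import os


def log_participation(log_path, txn_id, participants):
    """Durably record the participants before any prepare command is sent.

    `participants` maps a resource manager name to its child transaction id.
    """
    record = {"txn": txn_id, "participants": participants}
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())  # force the record to stable storage


def recover(log_path, query_state):
    """After a restart, decide which command to send for each logged transaction.

    `query_state(name, child_id)` is a caller-supplied function that asks a
    participant for the state of its child transaction (assumed to return
    "prepared", "committed", or "rolled_back").
    """
    decisions = {}
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            states = [query_state(name, child_id)
                      for name, child_id in record["participants"].items()]
            # Assumed rule: commit only if nobody has rolled back.
            if all(state in ("prepared", "committed") for state in states):
                decisions[record["txn"]] = "commit"
            else:
                decisions[record["txn"]] = "rollback"
    return decisions
```

The `os.fsync` call is what makes this write expensive: the record must reach stable storage before the prepare commands go out, so the write sits on the critical path of every distributed commit.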
For example, a user may submit a command through application 108 to add a new employee record into distributed database system 100 for company "A". In this example, it is assumed that employee table 126 stores personal employee information that needs to be stored for each employee of company A. It is also assumed that department table 128 stores departmental information that needs to be stored for each employee that is currently working at company A.
To add a new employee record, a user submits a command through application 108 to insert the new employee information into distributed database system 100. Upon receiving the command, application 108 coordinates the execution of a distributed transaction to insert the personal employee information into employee table 126 and the departmental information into department table 128. For example, the new employee's name and home address may be inserted into database system 104 using a first child transaction while the employee's name and assigned department number may be inserted into database system 106 using a second child transaction. Once the changes for the distributed transaction are to be committed, application program 108 coordinates a two-phase commit to cause the changes to be durably stored in employee table 126 and department table 128.
Because the first and second child transactions are part of the same distributed transaction, their corresponding changes must either both be committed or both be rolled back in nonvolatile memories 114 and 116 respectively. Thus, as part of the two-phase commit sequence, application program 108 is required to durably store participation information in log 124. By durably storing the participation information in log 124, application program 108 guarantees that even if a failure occurs, all changes associated with the distributed transaction will either be committed or rolled back.
However, a drawback to performing a two-phase commit in this manner is that application program 108 must durably store information in nonvolatile memory during the two-phase commit sequence. Typically, the storage of this information is a time-consuming process. Thus, the committing of the changes for the distributed transaction is not only delayed by the time that is required to write redo information in logs 118 and 120, but also by the time that is required to write participation information in log 124. For many systems, such as systems in which distributed transactions are continually being processed, there is a need to reduce the amount of time that is required for committing a distributed transaction ("commit latency").
One method of reducing the commit latency, as well as the administrative overhead of managing the application program log, is to have a database system, one that is itself currently committing changes for the distributed transaction, act as the coordinator for the two-phase commit sequence. For example, FIG. 1B illustrates a distributed computer system 150 in which database system 104 coordinates all two-phase commit sequences that are required for distributed transactions that are initiated through application program 108, and which require changes to be performed at both database systems 104 and 106.
For example, to add information about a new employee, as previously described for FIG. 1A, application program 108 communicates the new employee information to database system 104. In general, changes that are associated with a different database system typically include a connection qualifier that indicates the database system for which the changes are to be made. For example, changes for department table 128 will typically include a connection qualifier that indicates that department table 128 is stored in database system 106. In certain systems, such as Oracle database systems, these connection qualifiers are called database links. Other types of database systems that support distributed transactions provide similar mechanisms to identify and access remote tables.
In this example, when database server 110 detects that one of the changes is to a table in database system 106, database server 110 creates a second child transaction for database server 112. Database system 104 then forwards the modifications to database system 106 for storing in department table 128.
Once the changes specified in the first child transaction have been made to employee table 126, and the changes specified in the second child transaction have been made to department table 128, the distributed transaction is ready to commit. Database system 104 then coordinates a two-phase commit to cause the changes to be durably stored in employee table 126 and department table 128.
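The arrangement of FIG. 1B, in which one database system both forwards remote changes and coordinates the commit, can be sketched as follows. This is an illustrative model only: the `DatabaseServer` class, its method names, and the use of a plain string as the connection qualifier are hypothetical and do not reflect how database links are implemented in any real database system.

```python
class DatabaseServer:
    """Toy database server that can coordinate a commit (hypothetical sketch)."""

    def __init__(self, name):
        self.name = name
        self.links = {}            # connection qualifier -> remote DatabaseServer
        self.tables = {}           # committed rows, keyed by table name
        self.pending = []          # rows belonging to the local child transaction
        self.remote_children = []  # remote servers holding child transactions
        self.can_prepare = True    # set to False to simulate a prepare failure

    def add_link(self, qualifier, remote):
        self.links[qualifier] = remote

    def apply(self, table, row, qualifier=None):
        if qualifier is None:
            # Local change: part of this server's own child transaction.
            self.pending.append((table, row))
            return
        # Remote change: create a child transaction there and forward the row.
        remote = self.links[qualifier]
        if remote not in self.remote_children:
            self.remote_children.append(remote)
        remote.apply(table, row)

    def prepare(self):
        return self.can_prepare

    def commit_local(self):
        for table, row in self.pending:
            self.tables.setdefault(table, []).append(row)
        self.pending = []

    def rollback_local(self):
        self.pending = []

    def commit_distributed(self):
        """Coordinate a two-phase commit across this server and its remote children."""
        participants = [self] + self.remote_children
        votes = [p.prepare() for p in participants]
        for p in participants:
            if all(votes):
                p.commit_local()
            else:
                p.rollback_local()
        self.remote_children = []
        return all(votes)
```

In this model the application talks only to the coordinating server, so every change destined for the remote server makes an extra hop through `apply` on the coordinator before reaching its destination.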
Because a separate application program is not used to coordinate the two-phase commit, the committing of the changes is not delayed by the time that is normally required for an application program to durably store participation information in a log. Thus, relative to a system that requires an application program to coordinate the two-phase commits, the commit latency of systems in which one of the participating database systems coordinates the two-phase commit is reduced as fewer logs must be generated and durably stored before committing the distributed transaction.
However, because all communications between application program 108 and database system 106 are required to travel through database system 104, the access time for data residing on database system 106 may be significantly increased. Thus, in certain cases, the time that is required to complete the changes for a distributed transaction coordinated by one of the resource managers involved in the distributed transaction may actually be increased relative to systems in which the distributed transaction is coordinated by the application itself.
Based on the foregoing, there is a need to provide a mechanism that can reduce the amount of commit latency incurred when an application coordinates its own distributed transaction, but which does not increase the data access times.
The foregoing needs, and other needs and objects that will become apparent from the following description, are achieved in the present invention, which comprises, in one aspect, a method for using a resource manager to coordinate the committing of a distributed transaction. The method comprises the following computer-implemented steps. A first set of changes is communicated to a first resource manager. The first set of changes is communicated directly to the first resource manager without being received at a second resource manager. A second set of changes is communicated to the second resource manager. The second set of changes is communicated directly to the second resource manager without being received at the first resource manager. Either the first resource manager or the second resource manager is selected as a committing coordinator. A commit request message is transmitted to the committing coordinator to request that the first set of changes be committed at the first resource manager and that the second set of changes be committed at the second resource manager. In response to receiving the commit request message, the committing coordinator causes, as an atomic unit of work, the first set of changes to be committed at the first resource manager and the second set of changes to be committed at the second resource manager.
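The steps of this method can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: the `ResourceManager` class, the `handle_commit_request` method, and the policy of always picking the first resource manager as the committing coordinator are hypothetical choices made for the sketch, as is the idea that the commit request carries references to the other participants.

```python
class ResourceManager:
    """Toy resource manager that can act as committing coordinator (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.pending = {}     # changes received but not yet committed
        self.committed = {}   # changes made permanent
        self.can_prepare = True

    def apply(self, changes):
        # Changes arrive directly from the application.
        self.pending.update(changes)

    def prepare(self):
        return self.can_prepare

    def commit(self):
        self.committed.update(self.pending)
        self.pending = {}

    def rollback(self):
        self.pending = {}

    def handle_commit_request(self, others):
        """As committing coordinator, commit all participants atomically."""
        participants = [self, *others]
        votes = [p.prepare() for p in participants]
        for p in participants:
            if all(votes):
                p.commit()
            else:
                p.rollback()
        return all(votes)


def run_distributed_transaction(first, second, first_changes, second_changes):
    # Each set of changes goes straight to its own resource manager;
    # neither set passes through the other resource manager.
    first.apply(first_changes)
    second.apply(second_changes)
    # Select a committing coordinator (assumed policy: always the first).
    coordinator, other = first, second
    # A single commit request asks the coordinator to commit both sets.
    return coordinator.handle_commit_request([other])
```

In this sketch the application never writes a participation log of its own and never routes data through the coordinator; the only message it sends to the coordinator beyond its own changes is the commit request.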
According to another feature of the invention, the distributed transaction includes a first and second child transaction. The first set of changes are communicated to the first resource manager by transmitting the first child transaction to the first resource manager and the second set of changes are communicated to the second resource manager by transmitting the second child transaction to the second resource manager.
In yet another feature, the first set of changes and the second set of changes are committed as an atomic unit of work by performing a two-phase commit between the first resource manager and the second resource manager.
In still another feature, the first resource manager uses a first protocol to communicate with other components while the second resource manager uses a second protocol to communicate with other components. To cause the first set of changes and the second set of changes to be committed as an atomic unit of work, the first resource manager and the second resource manager communicate with each other through the use of a gateway device.
The invention also encompasses a computer-readable medium, a computer system, and a computer data signal embodied in a carrier wave, configured to carry out the foregoing steps.