One of the long-standing challenges in distributed computing has been to maintain data consistency across all of the nodes in a network. Perhaps nowhere is data consistency more important than in distributed database systems, where a distributed transaction may specify updates to related data residing on different database systems. To maintain data consistency, the distributed transaction must be either committed or, in the event of an error, "rolled back". When a transaction is committed, all of the changes to data specified by the transaction are made permanent. On the other hand, when a transaction is rolled back, all of the changes specified by the transaction that had already been made when the error occurred are retracted or undone, as if the changes to the data were never made.
One approach for maintaining data consistency for distributed transactions involves processing data on one database system at a time. All changes specified by the transaction to data residing in a particular database system are applied and confirmed before changes specified by the transaction are applied within another database system. Although this approach ensures data consistency across the distributed database system, database changes specified by the transaction are made serially, requiring more time to complete the transaction.
Consider the simple distributed database system 100 depicted in FIG. 1, which includes database systems 102, 104 and 106. Database system 102 includes a server process 108 and a database 110. Database system 104 includes a server process 112 and a database 114. Finally, database system 106 includes a server process 116 and a database 118.
Suppose a mail order transaction, which requires updates to product inventory data contained in database 110 and credit information contained in database 114, is being processed by server process 108 according to a conventional non-two-phase commit approach. The mail order transaction may be initiated by a database application, for example, that is in communication with database system 102.
In order to process the mail order transaction, server process 108 must first determine from database 110 whether the desired product is available in inventory and from server process 112 associated with database 114 whether sufficient credit is available to pay for the desired product. Having determined that the product and credit availability requirements are met, server process 108 processes the transaction (order) by updating database 110, to reduce the available inventory, and then causing server process 112 to update database 114 to reduce the amount of available credit.
An error preventing the update of either the product inventory on database 110 or the credit information on database 114 would leave the distributed database system 100 in an inconsistent state. For example, if an error prevented an update to the product inventory contained in database 110, a subsequent transaction for an order of the same product may erroneously determine that the previously ordered product is still available, even though it has been removed from inventory.
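The failure mode described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the described system: the dictionaries standing in for databases 110 and 114, the function `process_order_serially`, the `UpdateError` exception, and the `fail_credit_update` flag used to simulate an error are all hypothetical names introduced here.

```python
class UpdateError(Exception):
    """Raised when a simulated database update fails (hypothetical)."""


def process_order_serially(inventory_db, credit_db, product, customer, price,
                           fail_credit_update=False):
    """Apply each update as an independent, immediately permanent change."""
    # Availability checks are performed before any update is made.
    if inventory_db[product] < 1 or credit_db[customer] < price:
        return False
    # First update: permanent as soon as it is applied.
    inventory_db[product] -= 1
    # Second update: may fail after the first is already permanent.
    if fail_credit_update:
        raise UpdateError("credit update failed")
    credit_db[customer] -= price
    return True


inventory_db = {"widget": 5}   # stands in for database 110
credit_db = {"alice": 100}     # stands in for database 114

try:
    process_order_serially(inventory_db, credit_db, "widget", "alice", 30,
                           fail_credit_update=True)
except UpdateError:
    pass

# Inventory was reduced but credit was not: the two stores now disagree.
print(inventory_db, credit_db)
```

Because nothing ties the two updates together, the error leaves the inventory reduced while the credit balance is unchanged, which is the inconsistent state the text describes.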
Another approach for ensuring data consistency during distributed transactions involves processing distributed transactions using a two-phase commit mechanism. Two-phase commit requires that the transaction first be prepared and then committed. During the prepare phase, the changes specified by the transaction are made, but not yet made permanent, at each of the participating database systems. If all of the changes are made without error at each of the participating database systems, then the changes are committed (made permanent). On the other hand, if any errors occur during the prepare phase, indicating that at least one of the participating database systems could not make the changes specified by the transaction, then all of the changes at each of the participating database systems are retracted, restoring each participating database system to its state prior to the changes. Unlike the serial approach described above, this approach ensures data consistency while allowing the changes to be processed simultaneously.
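The prepare-then-commit sequence can be sketched as follows. This is a simplified, single-process illustration rather than an actual protocol implementation: the `Participant` class, its `prepare`, `commit` and `rollback` methods, and the `two_phase_commit` function are hypothetical names, the participants are prepared one after another rather than simultaneously, and failures during the commit phase itself are not modeled.

```python
class Participant:
    """In-memory stand-in for one participating database system."""

    def __init__(self, data):
        self.committed = dict(data)  # permanent state
        self.pending = None          # changes made but not yet permanent

    def prepare(self, changes):
        """Make the changes tentatively; raise ValueError if one is invalid."""
        tentative = dict(self.committed)
        for key, delta in changes.items():
            new_value = tentative.get(key, 0) + delta
            if new_value < 0:
                raise ValueError(f"cannot apply {delta} to {key!r}")
            tentative[key] = new_value
        self.pending = tentative

    def commit(self):
        """Make the prepared changes permanent."""
        if self.pending is not None:
            self.committed = self.pending
            self.pending = None

    def rollback(self):
        """Discard the prepared changes, restoring the prior state."""
        self.pending = None


def two_phase_commit(work):
    """work is a list of (participant, changes) pairs; True means committed."""
    try:
        # Phase 1: every participant must prepare without error.
        for participant, changes in work:
            participant.prepare(changes)
    except ValueError:
        # Any error during prepare retracts the changes everywhere.
        for participant, _ in work:
            participant.rollback()
        return False
    # Phase 2: all participants prepared, so make the changes permanent.
    for participant, _ in work:
        participant.commit()
    return True


a = Participant({"x": 1})
b = Participant({"y": 0})
failed = two_phase_commit([(a, {"x": -1}), (b, {"y": -1})])  # b cannot prepare
succeeded = two_phase_commit([(a, {"x": -1}), (b, {"y": 2})])
```

In the first call the second participant cannot prepare, so neither participant's permanent state changes; in the second call both prepare and both commit.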
Assume that database systems 102, 104 are homogeneous and support compatible communication protocols and the processing of transactions by two-phase commit. Processing the same mail order transaction using two-phase commit maintains data consistency while providing simultaneous processing of the changes on database 110 and database 114. Processing begins when server process 108 initiates the prepare phase by commanding server process 112 to reduce the amount of available credit by updating database 114. Server process 108 concurrently reduces the available inventory by updating database 110. At this point in the processing of the transaction, although the changes to available inventory and available credit may have actually been applied to database 110 and database 114, respectively, neither change has been made permanent. Once both server process 108 and server process 112 have made their respective changes as specified by the transaction without error, the prepare phase is complete.
Upon being informed that the transaction is successfully prepared in database system 104, server process 108 then initiates the commit phase by commanding server process 112 to make the reduction in available credit in database 114 permanent, while server process 108 makes the reduction in available inventory in database 110 permanent. Once the changes have been made permanent in databases 110, 114, the commit phase and the processing of the transaction are complete. However, if any errors occur during the prepare phase, that is, while server process 108 is updating the available inventory in database 110 or while server process 112 is updating the available credit in database 114, then the updates to both databases 110, 114 are rolled back. This approach ensures that all of the changes specified by the transaction will either be completed successfully and made permanent in databases 110, 114 or, in the event of an error, rolled back.
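The mail order example can be sketched in Python with the two prepare requests issued concurrently. This is an illustrative model only: the `ServerProcess` class, the `process_mail_order` function, and the product and customer names are hypothetical, threads stand in for the separate database systems, and a prepared change is held in a pending slot rather than applied to the database and later undone.

```python
from concurrent.futures import ThreadPoolExecutor


class ServerProcess:
    """Stand-in for a server process and the database it manages."""

    def __init__(self, balances):
        self.database = dict(balances)  # permanent data
        self.prepared = None            # change awaiting commit or rollback

    def prepare(self, key, amount):
        """Tentatively reduce `key` by `amount`; True if prepared without error."""
        if self.database.get(key, 0) < amount:
            return False
        self.prepared = (key, amount)
        return True

    def commit(self):
        """Make the prepared reduction permanent."""
        key, amount = self.prepared
        self.database[key] -= amount
        self.prepared = None

    def rollback(self):
        """Discard any prepared reduction."""
        self.prepared = None


def process_mail_order(inventory_server, credit_server, product, customer, price):
    # Prepare phase: both reductions are requested at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        inventory_ok = pool.submit(inventory_server.prepare, product, 1)
        credit_ok = pool.submit(credit_server.prepare, customer, price)
        results = [inventory_ok.result(), credit_ok.result()]
    servers = (inventory_server, credit_server)
    if all(results):
        # Commit phase: both changes are made permanent.
        for server in servers:
            server.commit()
        return True
    # An error in either prepare rolls back both servers.
    for server in servers:
        server.rollback()
    return False


server_108 = ServerProcess({"widget": 5})  # product inventory (database 110)
server_112 = ServerProcess({"alice": 20})  # available credit (database 114)

too_expensive = process_mail_order(server_108, server_112, "widget", "alice", 30)
affordable = process_mail_order(server_108, server_112, "widget", "alice", 15)
```

The first order fails to prepare on the credit side, so the inventory is left untouched even though its own prepare succeeded; the second order prepares on both sides and both reductions become permanent.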
As illustrated by the preceding example, the two-phase commit mechanism allows full use of the power and flexibility offered by a distributed database system while maintaining data consistency. However, the two-phase commit mechanism can only be implemented for distributed transactions involving homogeneous database systems, such as database systems 102, 104, which support the same two-phase commit communication protocols. For distributed transactions involving heterogeneous database systems, which do not support a common communication protocol, the transactions must be processed serially, which requires more time to complete the transactions. For example, as illustrated in FIG. 1, database system 102 and database system 106 do not support a common communication protocol. Therefore, distributed transactions that involve changes to be made in databases 110, 118 cannot be implemented with two-phase commit and must be processed serially.
In view of the inefficiencies associated with processing distributed database transactions serially, a method and apparatus providing for the processing of distributed transactions in heterogeneous computing environments using two-phase commit is highly desirable.