The present invention relates generally to a fault-tolerant method and system for processing global transactions in a distributed database system. In particular, a transaction management method and system are provided that renew processing of global transactions interrupted by a fault in the distributed database system after the fault has been remedied.
A distributed database system provides a user with a uniform method for accessing one or more homogeneous or heterogeneous database management systems that can be located at different computing sites or at one computing site. Typically, such distributed database systems include a centralized database management system for accessing the various local database management systems. When a global data model and a global transaction language are used, the user of a distributed database system is not required to know the location or the characteristics of the data needed for a particular transaction. Universality of access to distributed database systems is a feature of Amoco Production Company's distributed database system (ADDS), as described in "ADDS--Heterogeneous Distributed Database System" by Y. J. Breitbart and L. R. Tieman, Distributed Data Sharing Systems, North-Holland Publ. 1985. Data update concurrency control in ADDS is described in U.S. Pat. No. 4,881,166, which is incorporated by reference herein.
In current distributed database systems, including ADDS, if a fault occurs in the distributed database system, such as a communication network failure or equipment inoperability or unavailability, the processing of all transactions affected by the fault ceases, all data updated as a result of the transactions are returned to their original status, and the user is provided with an abort message. The cessation of update transactions is needed to ensure the integrity of all data affected by the update transactions. However, in some applications, it is necessary to perform the database operations in a transaction until the operations complete successfully. For example, it is very costly to abort transactions that retrieve large amounts of data. Moreover, a fault anywhere in the centralized database management system causes all transactions to cease regardless of the states of the various distributed database management subsystems. Such centralized database management systems provide the user with neither an indication of the availability of resources in the distributed database system nor a suspended or recovery status of global transactions. In addition, the user is provided with no method for intervening in a transaction, and the centralized database management system provides no method for dynamic site switching to locate data for a particular transaction.
There is a need for a fault-tolerant distributed database system that overcomes the foregoing deficiencies and meets the above-described needs.