1. Field of the Invention
This invention relates to a method and apparatus for completing transactions in a fault tolerant distributed database system.
2. Description of the Prior Art
A distributed database system typically refers to a system with two or more separate intercommunicating databases. At least part of the stored data is identical in two or more database copies. Therefore, when common data is changed in one of the database copies, the same change must be made in all the other database copies in order to keep the databases uniform throughout the database system. Under normal circumstances, database changes are made by a master database controller. The master database controller makes changes to its own copy of the database and is responsible for controlling updates to the other copies of the database that comprise the network. Problems arise, however, when faults occurring either in the database copies or in the links coupling the copies to the master database controller prevent a change from being transmitted or applied to all or part of the databases.
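The failure mode described above can be illustrated with a minimal sketch. The names `Copy`, `reachable`, and `master_update` are hypothetical and chosen only for illustration; a dictionary stands in for each database copy, and a `ConnectionError` stands in for a faulty copy or link.

```python
class Copy:
    """One copy of the database (hypothetical stand-in: a dict of key/value data)."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable  # False models a failed copy or a failed link
        self.data = {}

    def apply(self, key, value):
        """Apply a change pushed by the master, failing if the copy is unreachable."""
        if not self.reachable:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value


def master_update(master, copies, key, value):
    """Change the master's own copy, then push the same change to every other copy.

    Returns the names of the copies that missed the update and are therefore
    no longer uniform with the master.
    """
    master.data[key] = value
    missed = []
    for copy in copies:
        try:
            copy.apply(key, value)
        except ConnectionError:
            missed.append(copy.name)
    return missed
```

With one unreachable copy, `master_update` returns that copy's name while the master and the reachable copies hold the new value, which is the inconsistency a fault tolerant scheme has to repair.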
Within a distributed database network, information is entered into the individual databases by a transaction. A database "transaction" is a sequence of user controlled (or master database controlled) database actions which is marked by both a "begin step" and an "end step." The database actions between a begin step and an end step comprise the steps or actions by which a database is updated. The end step can be either a commit or an abort. A commit is an instruction to carry out the preceding updating actions, effectively changing the database. An abort is an instruction to void the preceding updating actions. There are two types of transactions that may occur at each processor: cold transactions and hot transactions. A cold transaction is a transaction that has already been completed and is used only during the recovery period of a failed database processor. A hot transaction is an ongoing transaction that has neither committed nor aborted.
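The begin step, commit, and abort described above can be sketched as follows. This is an illustrative model only, not the implementation of any particular system: the `Transaction` class, its method names, and the state labels "hot", "committed", and "aborted" are assumptions introduced here, and a dictionary stands in for one database copy.

```python
class Transaction:
    """Buffers database actions between a begin step and an end step."""

    def __init__(self, database):
        # Creating the object plays the role of the begin step.
        self.database = database  # dict standing in for one database copy
        self.pending = []         # updating actions recorded since the begin step
        self.state = "hot"        # ongoing: neither committed nor aborted

    def update(self, key, value):
        """Record an updating action; the database itself is not changed yet."""
        if self.state != "hot":
            raise RuntimeError("transaction has already ended")
        self.pending.append((key, value))

    def commit(self):
        """End step: carry out the recorded actions, changing the database."""
        if self.state != "hot":
            raise RuntimeError("transaction has already ended")
        for key, value in self.pending:
            self.database[key] = value
        self.pending.clear()
        self.state = "committed"

    def abort(self):
        """End step: void the recorded actions, leaving the database unchanged."""
        if self.state != "hot":
            raise RuntimeError("transaction has already ended")
        self.pending.clear()
        self.state = "aborted"
```

In this sketch a transaction stays "hot" until one of the two end steps is called. Buffering the actions until commit is one possible design; a system could equally apply them immediately and undo them on abort.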
Distributed databases in the telecommunications industry need to be reliable and highly available. These systems also need to be fault tolerant: the failure of one database copy should not bring down the entire system. The telecommunications industry is very demanding in this regard, as seen in the example of access to information such as 800 numbers. When a call is placed, the response to the database query, which returns the number associated with the query, needs to be immediate and reliable. Any delay in the response delays completion of the call, resulting in customer dissatisfaction.
In a distributed database system, database synchronization is usually provided by an algorithm called "two-phase commit". A two-phase commit is executed with one copy of the database designated as the "coordinator" (also called the "master" or controller) and all the other copies in the distributed database system designated as the "participants" (also called the "slave" nodes or copies). The two-phase commit algorithm operates as follows: