The present invention relates in general to replicated database systems, and, more particularly, to a system and method for updating the records in such replicated database systems.
As the name implies, a replicated database system is a database system that comprises a plurality of databases each having an identical set of records. Replicated databases improve reliability as each database generally includes the same data as the other databases. Processing time is also improved as each of the databases may be accessed simultaneously by different users. Database operations themselves are transparent to the user as any of the databases may supply the desired data. Further, if any one database should fail, there are a number of other databases available to perform the identical function of the failed database.
There are three main principles that should be considered in all replicated database systems. The first principle, atomic transaction processing across multiple database sites, requires that a sequence of database operations be performed in its entirety or not at all. The second principle, database synchronicity, requires that database users have a consistent view of the data independent of which of the replicated databases may be accessed. The third principle, disaster avoidance, requires that the same transaction never be sent to all of the databases simultaneously. Accordingly, any replicated database system must address these three principles.
Referring now to FIG. 1, a typical database system 10 is illustrated. The database system 10 comprises a plurality of replicated databases 11-13, a database provisioning system 14 and a database querying system 16. The database provisioning system 14 is configured to assure that the data in the replicated databases 11-13 is accurate and accessible. The database provisioning system 14 is also configured to update the records in each of the replicated databases 11-13 as needed, primarily by performing write operations on the respective records. The update operation must be accurate so that the records in all of the replicated databases 11-13 are consistent with each other.
The database querying system 16 is configured to retrieve specific records from the databases 11-13 as requested by one of a number of database users 18 accessing the database system 10. The actual database accessed by the database querying system 16 is transparent to the database user because the database querying system 16 determines the replicated database to which it sends the data request/query. As with most replicated database systems, the database querying system 16 may choose a different database for subsequent requests of the same data such that there is a need for database synchronicity.
The Two-Phase Commit (2PC) Protocol is currently used to update records in a number of replicated database systems. However, while this protocol addresses database atomicity, it only partially addresses data synchronicity and does not address disaster avoidance. In the first phase of database provisioning using the 2PC protocol, the database provisioning system 14 sends an update transaction, i.e., update data, to all of the replicated databases 11-13. The databases 11-13 process the transaction by placing the update data in an inactive state. Thus, at this point, the old data is still available for access, but the update data is in an inactive state and thus not available for access. If the database querying system 16 requests data from any of the databases 11-13, the accessed database will return the old data as the update data is in an inactive state and not accessible. Thus, data synchronicity is assured during this phase of processing.
After the update data has been placed in an inactive state, each database transmits a ready-to-commit (RTC) acknowledgment signal to the provisioning system 14 indicating that the database is ready to update the record and complete the transaction in the second phase of provisioning. If all of the replicated databases 11-13 respond with an RTC acknowledgment, the provisioning system 14 will send a commit transaction to all of the databases 11-13 instructing them to update the record with the update data in the second phase of provisioning. After the commit transaction is processed, the update data is available for access while the old data is not, as the old data is deleted by the update. If one of the databases 11-13 fails to place the update data in the inactive state, the provisioning system 14 will request that all of the other databases roll back the transaction. That is, the databases delete the update data such that the old data remains accessible. Thus, transaction atomicity is assured by the 2PC protocol.
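By way of illustration only, the two phases described above may be sketched as follows. This is a minimal in-memory sketch, not the implementation of any particular database product: the class `Replica`, its methods `prepare`, `commit`, `rollback` and `query`, the flag `fail_prepare` and the function `two_phase_commit` are all hypothetical names introduced here, and a Boolean return value stands in for the RTC acknowledgment signal.

```python
class Replica:
    """Hypothetical in-memory stand-in for one replicated database."""

    def __init__(self, name, records, fail_prepare=False):
        self.name = name
        self.active = dict(records)       # data visible to queries
        self.pending = {}                 # update data held in the inactive state
        self.fail_prepare = fail_prepare  # simulate a failure in phase one

    def prepare(self, key, value):
        """Phase one: stage the update; True stands in for the RTC signal."""
        if self.fail_prepare:
            return False
        self.pending[key] = value
        return True

    def commit(self, key):
        """Phase two: replace the old data with the staged update data."""
        self.active[key] = self.pending.pop(key)

    def rollback(self, key):
        """Delete the staged update data so the old data remains accessible."""
        self.pending.pop(key, None)

    def query(self, key):
        """Queries only ever see the active data."""
        return self.active[key]


def two_phase_commit(replicas, key, value):
    """Return True if every replica committed, False if all rolled back."""
    # Phase one: send the update transaction to all replicas.
    acks = [replica.prepare(key, value) for replica in replicas]
    if all(acks):
        # Phase two: every replica answered RTC, so send the commit.
        for replica in replicas:
            replica.commit(key)
        return True
    # At least one replica failed to stage the update: roll back everywhere.
    for replica in replicas:
        replica.rollback(key)
    return False
```

In this sketch, calling `two_phase_commit` on three healthy replicas leaves the new value active on each of them, whereas a single replica constructed with `fail_prepare=True` causes every replica to keep the old value, which is the all-or-nothing behavior described above.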
If the database querying system 16 requests data from any of the databases 11-13 after the commit transaction has been processed, the accessed database will return the new data. However, if the database querying system 16 requests data from any of the databases 11-13 prior to the commit transaction being processed, the accessed database will return the old data. Accordingly, there is a "window" of time in which the data may be inconsistent (i.e., new data is returned in response to a request to a database having performed the commit transaction and then old data is returned in response to a subsequent request to a database which has yet to perform the commit transaction) such that data synchronicity is not assured for the duration of the 2PC protocol. The "window" arises because the time-to-commit may take several seconds to several minutes. With today's fast processors, several thousand queries can be processed in that time, leading to inconsistent views of the data relative to the other queries. Further, as the same transaction is transmitted to all databases simultaneously, the 2PC protocol violates the principle of disaster avoidance since each of the databases could fail if the transaction itself is defective. Thus, the 2PC protocol assures database atomicity, but has a window of uncertainty during processing with regard to data synchronicity, and totally ignores the principle of disaster avoidance.
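The "window" described above may be illustrated with a small simulation. The sketch below assumes, purely for illustration, that the commit transaction reaches the replicas one after another and that a query is routed to every replica between successive commits; the function name `commit_with_interleaved_queries` and the use of plain dictionaries to stand in for the databases 11-13 are hypothetical.

```python
def commit_with_interleaved_queries(replicas, key, new_value):
    """Commit on one replica at a time and record what queries would see.

    `replicas` is a list of dicts standing in for the replicated databases.
    After each individual commit, the same record is read from every replica,
    mimicking a querying system that may select any replica for a request.
    """
    observations = []
    for replica in replicas:
        replica[key] = new_value                          # this replica commits
        observations.append([r[key] for r in replicas])   # reads from all replicas
    return observations


replicas = [{"rec": "old"} for _ in range(3)]
views = commit_with_interleaved_queries(replicas, "rec", "new")

# A view is inconsistent when different replicas return different data.
inconsistent = [view for view in views if len(set(view)) > 1]
```

Under these assumptions, every view taken before the last replica has committed contains both old and new data, so two successive requests for the same record may return different results depending on which replica is selected.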
Accordingly, there is a need for a replicated database system and a method for updating records in such a system that assures data atomicity, data synchronicity and disaster avoidance during all phases of processing. There is another need for a replicated database system in which provisioning of the database does not affect database availability to the user. Preferably, such a system is relatively easy to implement and cost effective.