1. Field of the Invention
The present invention relates to the design of distributed data systems and methods, and more particularly, to the design of data replication systems and methods.
2. Description of the Related Art
Distributed data systems (and methods) use a central database with copies of that database distributed to client computers within the system. For example, in a conventional distributed data system having a central server computer and one or more client computers, each client computer uses a copy of the central database or data repository that is located on the server computer. Each client computer performs computer application functions and operations using its copy of the database. To keep the copy of the database at each client computer matched with the central database located at the server computer, conventional distributed data systems use conventional data replication systems.
Conventional data replication systems provide high data availability and enhance performance by allowing a copy of the database to be moved from the server computer to the client computers, thereby removing system bottlenecks such as numerous input/output operations with the server computer. Conventional data replication systems, however, have a number of drawbacks.
First, many conventional replication systems only allow computer applications to "read" the copy of the database at the client computer. To ensure consistency and agreement, these conventional replication systems do not permit both a "read" and a "write" on the copy of the server database. Specifically, these conventional replication systems are designed to avoid compromising the data integrity of the server database, which can occur if, for example, a copy of the database on the client computer is updated but the central database on the server computer is not properly updated. Read-only data replication systems, therefore, are not well suited for computer applications that perform transactional functions and operations at the client computer.
Other conventional replication systems allow both "reads" and "writes" to the copy of the server database at the client computer. These conventional replication systems, however, cannot guarantee agreement and consistency between the server database itself and the copy of the server database. In particular, conventional replication systems are unable to correctly serialize transactions that are applied to the various copies of the server database. Moreover, transactions cannot be serialized in such systems without adversely affecting overall system performance.
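The serialization problem described above can be illustrated with a minimal sketch. The dictionaries, the "stock" field, and the naive_write_back function are hypothetical names introduced only for this illustration and do not describe any particular conventional system. The sketch assumes two client copies that each write locally and are then copied back to the server without any ordering of the transactions.

```python
# Hypothetical illustration: two client copies update the same value independently.
server = {"stock": 10}

# Each client computer takes its own copy of the server database.
client_a = dict(server)
client_b = dict(server)

# Each client "writes" to its local copy only.
client_a["stock"] -= 3  # client A records a sale of 3 units
client_b["stock"] -= 4  # client B records a sale of 4 units


def naive_write_back(server_db, client_db):
    """Overwrite the server values with the client's values (no serialization)."""
    server_db.update(client_db)


# The copies are written back one after another, with no transaction ordering.
naive_write_back(server, client_a)
naive_write_back(server, client_b)

# Had both transactions been serialized, both decrements would have applied.
serialized_result = 10 - 3 - 4
lost_update = server["stock"] != serialized_result
```

In this sketch the server ends with the value written by client B alone, so the change made by client A is silently lost and the server database no longer reflects both transactions.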
It is noted that a database is considered "consistent" if it satisfies all applicable user-defined consistency rules, so that the source database also remains consistent. Further, "agreement" refers to having all copies of a database agree despite minor differences between the copies resulting from latency. The copies of a database in a correctly functioning replication system must be in agreement, although they may never actually match.
Second, data replication systems that do allow both "read" and "write" transactions lack a protocol that ensures that each client database is in agreement with both the server database and the other client databases.
A third problem with conventional data replication systems arises from the use of locks to prevent conflicts between transactions that access different copies. Such locks are not practical for a number of reasons. For example, a lock must be visible to every transaction that accesses a copy of the database. This is not possible for copies of the database on client computers that are disconnected from the network. Even in a connected environment, the cost of acquiring a lock that is visible to all copies of the database is prohibitive because making a lock usable across a network requires the passing of messages.
Another problem with using locks to serialize transactions against different copies of a database is that when a lock must be visible over an unreliable network, failure situations that are very difficult to handle, such as network partitions, arise. Moreover, if the server database is no longer in agreement with the copies of the database at the client computers, there is an increased probability that the data in the distributed data system may become compromised. Once the data is compromised, the system fails. Thus, conventional data replication systems allowing both "read" and "write" transactions are not suitable for mission-critical distributed data systems where maintaining data integrity is essential.
With conventional data replication systems, it is difficult to build an automatic mechanism that guarantees agreement when transactions that update different copies of the database at different computers conflict. One aspect of the problem that contributes to the difficulty is that the mechanism must respect the consistency rules for the database. These rules may be complex. Often no declarative form of these rules exists. In fact, for many applications the only practical way to specify the consistency rules for a database is to write complex procedural logic, specifically, triggers.
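As a minimal sketch of why procedural consistency rules complicate automatic conflict handling, consider the following example. The credit-limit rule, the field names, and the respects_credit_limit function are hypothetical and are introduced only for illustration. The sketch assumes two copies that each accept a local transaction satisfying the rule, and a combined state that simply contains both transactions.

```python
def respects_credit_limit(db):
    """Hypothetical procedural consistency rule: the total of a customer's
    open orders may not exceed the customer's credit limit."""
    return sum(db["orders"]) <= db["credit_limit"]


# State of the source database before either client transacts.
source = {"credit_limit": 100, "orders": [40]}

# Each copy accepts one new order that is valid against its own local state.
copy_a = {"credit_limit": 100, "orders": [40, 50]}
copy_b = {"credit_limit": 100, "orders": [40, 30]}

# Combining both transactions yields a state that neither copy ever checked.
merged = {"credit_limit": 100, "orders": [40, 50, 30]}

# Evaluate the rule against each copy and against the combined state.
checks = [respects_credit_limit(db) for db in (copy_a, copy_b, merged)]
```

Each copy satisfies the rule on its own, yet the combined state violates it. Because the rule exists only as procedural logic, a mechanism that merges the copies cannot detect the violation without executing that logic against the combined data.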
A fourth problem with conventional data replication systems occurs when transactions at two different copies conflict. Here, it is possible for an arbitrary number of additional transactions that depend on changes made by the conflicting transactions to occur. Conventional replication systems do not assure that, after one or both of the conflicting transactions are corrected, the changes made by these dependent transactions do not corrupt the server database, i.e., that they still make sense and still respect the database's consistency rules.
A fifth problem with conventional data replication systems is that there is no guarantee of data recovery if a server database loses a transaction. For example, if a server database fails, e.g., crashes, and loses a portion of its recovery log, it may be unable to recover transactions from the damaged section of the log, and the server database will lose such transactions. When the target database holds a transaction that the server database has lost, conventional replication systems become confused.
A sixth problem with conventional data replication systems is that they do not typically automate the distribution aspects of a computer software upgrade such that the client computer remains operable and the database usable. Many existing data replication systems require new software or upgrade utility installations on every client computer. During the installation process, the client computer must remain out of use so that the database is not corrupted. To be successful, the installation must be well planned, including recovery plans in the event that the upgrade fails. The installation must also be well tested and run at times when the client computers are least used so that the process is least disruptive. Thus, existing data replication systems are unwieldy, especially in large installations, for example, those having 10,000 client computers, and are not well suited to environments requiring a high degree of client computer availability.
Therefore, there is a need for a data replication system and method that (1) allows for replicating or copying a source database across multiple client computers and (2) allows each client computer to freely transact, i.e., both "read" and "write," with the copy of the database while (3) providing database consistency to ensure data integrity and (4) allowing for complete disaster recovery.