a. Technical Field
The present invention relates to the field of database management for distributed processing systems and, more particularly, to maintaining consistency of database replicas within a distributed processing system via an improved epidemic protocol.
b. Description of the Relevant Art
Distributed processing systems typically involve the preservation and maintenance of replica databases throughout a system. Data replication is often used in distributed systems to improve availability and performance. Examples of replicated systems abound and include both research prototypes and commercial systems. A problem arises when one or more files or items of these database replicas is updated, thereby rendering the rest of the database replicas obsolete. In other words, when one replica is changed, the rest of the replicas should change accordingly; otherwise, replicas will diverge. Many distributed databases use an epidemic approach to manage replicated data. In the epidemic approach, user operations are executed on a single replica. Asynchronously, a separate activity performs the periodic pair-wise comparison of data item copies to detect and bring up to date obsolete database copies. This activity is known as anti-entropy and typically involves the comparison of time stamps of items/files and the propagation of updates to older replicas when the time stamps do not agree. The overhead due to comparison of data copies grows linearly with the number of data items in the database, which limits the scalability of the system.
On the other hand, epidemic protocols exhibit several desirable properties. User requests are serviced by a single (and often a nearby) server. Update propagation (the propagation of updates in the distributed system) can be done at a convenient time, for example, during the next dial-up session. Also, multiple updates can often be bundled together and propagated in a single transfer.
It might appear that the anti-entropy overhead problem can be solved simply by having each server accumulate its updates and periodically push them to all other replicas, without any replica comparison. The need for time stamp comparison is then eliminated. However, the following dilemma arises. If recipients of the updates do not forward them further to other nodes, then full responsibility for update propagation lies with the originating server. A failure of this server during update propagation may leave some servers in an obsolete state for a long time, until the originating server is repaired and can complete the propagation. On the other hand, forwarding updates by servers to each other would create redundant traffic on the network.
Update propagation can be done by either copying the entire data item or by obtaining and applying log records for missing updates. Among known commercial systems, "Notes" (tm) software available from Lotus (now IBM) uses whole data item copying. Oracle's Symmetric Replication system copies update records.
The Lotus Notes protocol associates a sequence number with every data item copy, which records the number of updates seen by this copy. As an epidemic protocol, Lotus assumes that whole databases are replicated, so that anti-entropy is normally invoked once for all data items in the database. Each server records the time when it propagated updates to every other server (called the last propagation time below).
Consider two nodes, i and j, that replicate a database. Let i invoke an instance of anti-entropy to compare its replica of the database with that of server j, and catch up if necessary. Anti-entropy executes the following algorithm:
1. When node j receives a request for update propagation from i, it first checks whether any data items in its replica of the database have changed since the last update propagation from j to i. If no data item has changed, no further action is needed. Otherwise, j builds a list of data items that have been modified since the last propagation. The entries in the list include data item names and their sequence numbers. j then sends this list to i.
2. i compares every element from the received list with the sequence number of its copy of the same data item. i then copies from j all data items whose sequence number on j is greater.
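The two steps above can be sketched in Python as follows. This is a minimal illustration only, not the Lotus Notes implementation: the names Item, Node, store, changed_since and anti_entropy, the integer logical clock, and the per-database db_mtime field are all assumptions introduced here to model the "last propagation time" and the constant-time check described in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    value: str
    seq: int    # number of updates seen by this copy
    mtime: int  # local logical time when this copy was written


@dataclass
class Node:
    name: str
    items: dict = field(default_factory=dict)      # item name -> Item
    last_prop: dict = field(default_factory=dict)  # peer name -> last propagation time
    clock: int = 0     # hypothetical local logical clock
    db_mtime: int = 0  # time of the most recent write to any item

    def store(self, key, value, seq):
        """Write a copy of an item and stamp it with the local time."""
        self.clock += 1
        self.items[key] = Item(value, seq, self.clock)
        self.db_mtime = self.clock

    def update(self, key, value):
        """A user update: the sequence number grows by one."""
        old = self.items.get(key)
        self.store(key, value, (old.seq if old else 0) + 1)

    def changed_since(self, peer):
        """Step 1: list the items written since the last propagation to peer."""
        since = self.last_prop.get(peer, 0)
        if self.db_mtime <= since:
            return {}  # constant-time exit: nothing written since then
        # Otherwise every item is examined: linear in the database size.
        return {k: it.seq for k, it in self.items.items() if it.mtime > since}


def anti_entropy(i, j):
    """Node i pulls from node j; returns the names of the items i copied."""
    offered = j.changed_since(i.name)  # step 1, executed on j
    j.clock += 1
    j.last_prop[i.name] = j.clock
    copied = []
    for key, seq in offered.items():   # step 2, executed on i
        mine = i.items.get(key)
        if mine is None or mine.seq < seq:
            i.store(key, j.items[key].value, seq)
            copied.append(key)
    return copied
```

In this sketch, if both i and j have separately copied the same item from a third node, j still scans its database and sends a non-empty list to i, although i ends up copying nothing.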
This algorithm may detect in constant time that update propagation is not required, but only if no data item in the source database has been modified since the last propagation with the recipient. However, in many cases, the source and recipient database replicas will be identical even though the source database has been modified since the last update propagation to the recipient. For instance, after the last propagation between themselves, both nodes may have performed update propagation from other nodes and copied some data modified there. Or, the recipient database may have obtained updates from the source indirectly via intermediate nodes.
In these cases, Lotus incurs high overhead for attempting update propagation between identical database replicas. At a minimum, this overhead includes comparing the modification time of every data item in the source database against the time of the last update propagation. Thus, it grows linearly with the number of data items in the database.
In addition, the first step of the algorithm will result in a list of data items that have been modified or obtained by j since the last propagation. This list will be sent to i, who then will have to perform some work for every entry in this list in step 2. All this work is overhead.
According to Kawell, Jr. et al., "Replicated Document Management in a Group Management System," Second Conference on Computer-Supported Cooperative Work, September 1988, it appears that the Lotus update propagation protocol correctly determines which of two copies of a data item is newer only if the copies do not conflict. When a conflict exists, one copy is often declared "newer" incorrectly. For example, if i made two updates to x while j made one conflicting update without obtaining i's copy first, x_i will be declared newer, since its sequence number is greater. It will override x_j in the next execution of update propagation. Thus, this activity does not satisfy "correctness criteria" as will be further described herein.
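The example in the preceding paragraph can be made concrete with a short Python sketch. The function names and the "history" field, which lists the updates each copy has seen, are assumptions added purely for illustration; the protocol as described keeps only the sequence number, which is why it cannot tell that neither copy contains the other's updates.

```python
def pick_by_sequence(copy_a, copy_b):
    """Return the copy with the greater sequence number (ties keep copy_a)."""
    return copy_a if copy_a["seq"] >= copy_b["seq"] else copy_b


def includes(a, b):
    """True if copy a has seen every update that copy b has seen."""
    return set(b["history"]) <= set(a["history"])


# i made two updates to x; j made one update without obtaining i's copy first.
x_i = {"value": "from i", "seq": 2, "history": ["i1", "i2"]}
x_j = {"value": "from j", "seq": 1, "history": ["j1"]}

winner = pick_by_sequence(x_i, x_j)

# Neither copy includes the other's updates, so the two copies conflict.
conflict = not includes(x_i, x_j) and not includes(x_j, x_i)
```

Here winner is i's copy even though conflict is true, so j's update would be overwritten without the conflict being noticed.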
Unlike Lotus Notes, Oracle's Symmetric Replication protocol is not an epidemic protocol in a strict sense. It does not perform comparison of replica control state to determine obsolete replicas. Instead, it simply updates records by applying log records for missing updates. Every server keeps track of the updates it performs in a log and periodically ships them to all other servers. No forwarding of updates is performed.
In the absence of failures, this protocol exhibits good performance. However, a failure of the node that originated updates may leave the system in a state where some nodes have received the updates while others have not. Since no forwarding is performed, this situation may last for a long time, until the server that originated the update is repaired. This situation is dangerous, not only because users can observe different versions of the data at the same time, but also because it increases the opportunity for user updates to be applied to obsolete replicas, thus creating update conflicts. Thus, Oracle is susceptible to failures during update propagation.
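The failure scenario just described can be illustrated with a minimal Python sketch. It is not Oracle's implementation: the function ship_log, the fail_after parameter, and the representation of replicas as dictionaries are hypothetical and serve only to show what happens when the originating server stops partway through propagation and no forwarding takes place.

```python
def ship_log(log, replicas, fail_after=None):
    """Apply the origin's log records to each replica in turn.

    fail_after simulates a crash of the originating server after it has
    reached that many replicas. Returns True if every replica was reached.
    """
    for n, replica in enumerate(replicas):
        if fail_after is not None and n >= fail_after:
            return False  # origin failed; remaining replicas stay obsolete
        for key, value in log:
            replica[key] = value
    return True


replicas = [{}, {}, {}]
completed = ship_log([("x", 1)], replicas, fail_after=1)
```

After this run only the first replica holds the update. Because recipients do not forward updates, the other two remain obsolete until the origin is repaired and ships its log again.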
Consequently, neither the Lotus Notes protocol nor the Oracle protocol is perfectly successful in maintaining consistency in replicated databases, and there is room for improvement in the art.