1. Field of the Invention
The present invention is directed to the field of data replication, and, more specifically, to maintaining coherency between multiple replicas of a data set.
2. Description of the Prior Art
The need to replicate data sets is becoming increasingly important. Server to server data replication provides for greater data redundancy in the case of faults. Server to server replication further provides for increased data availability, increased load balancing, and increased geographic proximity between users and data. Server to client data replication enables access to replicated data on a client device. For example, replicated data may be accessed on a desktop computer at an office or on a portable laptop computer at a remote location. Client to client replication, which may also be referred to as peer to peer replication, enables access to replicated data on multiple client devices. For example, data may be changed on a desktop and replicated on a portable laptop computer at a remote location.
In existing data replication methods, local changes are time stamped at the replica on which they occur. Because physical clocks may not always be accurately synchronized, a logical timestamp rather than a physical timestamp is required. A set of changes that occur in close temporal proximity is assigned a logical timestamp which may be referred to as a “generation.” Generations enable ordering of changes at a local replica using a monotonically increasing “local counter.” The local counter has no relevance across replicas. The unique identifier associated with a generation enables multiple replicas to compare sets of generations to compute the incremental changes that will be propagated between replicas. For example, rather than being assigned a physical timestamp such as “1:00 A.M.”, a change may be assigned a logical timestamp such as “Generation G1 with local counter value 40.” Each local change is therefore assigned a logical timestamp. The assignment of logical timestamps enables a replica to request a particular generation or set of generations from another replica.
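By way of illustration only, the generation scheme described above may be sketched as follows. The sketch is a simplified model and not a definitive implementation of any existing method: the class name Replica, the methods commit_generation and missing_from, and the use of a random identifier are assumptions introduced here for clarity.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica that groups local changes into generations."""
    name: str
    local_counter: int = 0
    # generation id -> (local counter value, list of changes)
    generations: dict = field(default_factory=dict)

    def commit_generation(self, changes):
        """Assign one logical timestamp to a set of closely timed changes."""
        self.local_counter += 1  # monotonically increasing, local meaning only
        generation_id = f"{self.name}-{uuid.uuid4().hex}"  # unique across replicas
        self.generations[generation_id] = (self.local_counter, list(changes))
        return generation_id

    def missing_from(self, other):
        """Generation ids that `other` holds and this replica does not."""
        return set(other.generations) - set(self.generations)


desktop = Replica("desktop")
laptop = Replica("laptop")
g1 = desktop.commit_generation(["edit row 1", "edit row 2"])
g2 = desktop.commit_generation(["delete row 3"])

# The laptop compares sets of generation ids to find what it must request.
to_request = laptop.missing_from(desktop)
```

In this model the replicas compare only the generation identifiers. The local counter values order changes within a single replica and are never compared across replicas.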
One limitation of existing data replication methods is that each individual replica must maintain a list of generations received from other replicas. Specifically, a record of each generation, with its unique identifier and a locally mapped counter value, may be stored in a table. Because data is often replicated over a long period of time, such tables typically require a large quantity of memory to store and are inefficient to propagate across a network. Although such tables may be periodically “pruned” to decrease their size, the pruning operation may delete valuable records, which prevents efficient incremental synchronization with replicas that have not recently synchronized. Such replicas may be referred to as latent replicas. Thus, there is a need in the art for data replication methods that do not require records of generations and their corresponding local timestamps.
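The table and the pruning problem described above may be modeled, purely as an illustrative sketch, as follows. The class GenerationTable, its method names, and the notion of a “watermark” (the last local counter value a peer is known to have synchronized through) are hypothetical and are introduced here only to make the limitation concrete.

```python
class GenerationTable:
    """Hypothetical table mapping received generation ids to local counters."""

    def __init__(self):
        self.local_counter = 0
        self.records = {}        # generation id -> locally mapped counter value
        self.pruned_through = 0  # highest counter value whose record was deleted

    def record(self, generation_id):
        """Store a received generation under the next local counter value."""
        if generation_id not in self.records:
            self.local_counter += 1
            self.records[generation_id] = self.local_counter
        return self.records[generation_id]

    def prune(self, through):
        """Delete every record whose counter value is <= `through`."""
        self.records = {g: c for g, c in self.records.items() if c > through}
        self.pruned_through = max(self.pruned_through, through)

    def generations_since(self, watermark):
        """Generations a peer needs, or None if the records were pruned."""
        if watermark < self.pruned_through:
            return None  # incremental synchronization is no longer possible
        needed = [g for g, c in self.records.items() if c > watermark]
        return sorted(needed, key=self.records.get)


table = GenerationTable()
for generation_id in ("G1", "G2", "G3"):
    table.record(generation_id)

before_prune = table.generations_since(1)  # peer last synchronized at counter 1
table.prune(2)
after_prune = table.generations_since(1)   # same latent peer, after pruning
recent_peer = table.generations_since(2)   # peer that synchronized recently
```

Under these assumptions, the latent peer can no longer be served incrementally once the records it depends on are pruned, whereas a recently synchronized peer is unaffected. The table itself grows by one record for every generation received.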
Another limitation of existing data replication methods is redundant transfer of generations between replicas in a dynamic synchronization topology. Such redundant transfer occurs because a replica must inspect a given generation of changes to determine whether or not the changes have already been replicated. Thus, there is a need in the art for data replication methods in which such redundant transfers are unnecessary.
Furthermore, in addition to the desired characteristics set forth above, there is a need in the art for data replication methods that perform effectively in a variety of different environments. Specifically, data replication may be employed in either a single-master or a multi-master environment. In single-master data replication, data elements may be changed only at a single “master” replica, while, in multi-master data replication, data elements may be changed at multiple and possibly all replicas. Additionally, data may be replicated in either a fixed synchronization topology or a dynamic synchronization topology. In a fixed synchronization topology, a particular replica will only synchronize with a fixed set of other replicas, while in a dynamic synchronization topology, any replica may potentially synchronize with any other replica. Furthermore, updating of data at multiple replicas may be either strictly serialized or may occur both serially and in parallel. Thus, systems and methods for data replication that perform effectively in the environments set forth above are desired.