In today's world of digital information handling, individuals may store information or data using a variety of different devices and in a variety of different locations. Often a user stores the same information in more than one device or location. In many cases, such a user would like all of their various data stores to have the same information without having to manually input the same changes into each data store. Replication, or synchronization, of data is one process used to ensure that each data store has the same information.
For example, a user may maintain an electronic address book or a set of email messages on many different devices or in many different locations. The user may maintain the address book or email messages, for example, on a desktop computer in a data store accessible using personal information manager software, on their laptop computer, on a personal digital assistant (PDA) or mobile phone, using an on-line contacts manager or email management web site, and the like. The user may modify the contact information or send and receive email messages using applications associated with each location. Regardless of where or how a change is made, one goal of replication is to ensure that a change made on a particular device or in a particular location is ultimately reflected in the data stores of the other devices and in the other locations.
One common replication method involves tracking changes that have occurred subsequent to a previous replication. For example, a device that seeks to replicate with another device may submit a request for changes to the other device. Ideally, the changes that the other device sends are those that have occurred since the last replication. The device, or “replica,” that responds to a request for updated information may check for any changes that are time stamped subsequent to a previous replication. Any changes with such a time stamp may then be sent to the device requesting replication. Typically such replication requires that each replica be aware of the other replicas or the replication topology in which it is operating. Each replica may also need to maintain a record of what changes have been replicated on other replicas. In effect, each replica may need to maintain information about what it believes is stored on the other replicas within the topology.
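The time-stamp approach described above can be illustrated with a minimal sketch. It is not taken from any particular product; the `Replica` class, the `pull` function, and the integer clock values are all hypothetical names and simplifications introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica that time stamps every change it holds."""
    name: str
    # item id -> (value, time stamp of the change)
    items: dict = field(default_factory=dict)
    # peer name -> time of the previous replication with that peer
    last_sync: dict = field(default_factory=dict)

    def update(self, item_id, value, now):
        self.items[item_id] = (value, now)

    def changes_since(self, since):
        # Only changes time stamped after the previous replication are sent.
        return {k: v for k, v in self.items.items() if v[1] > since}


def pull(requester, responder, now):
    """The requester asks the responder for changes made since they last replicated."""
    since = requester.last_sync.get(responder.name, 0)
    changes = responder.changes_since(since)
    requester.items.update(changes)
    # The requester must remember when it last replicated with each peer.
    requester.last_sync[responder.name] = now
    return changes


desktop = Replica("desktop")
phone = Replica("phone")
desktop.update("alice", "555-0100", now=1)
first = pull(phone, desktop, now=2)    # sends the "alice" change
desktop.update("bob", "555-0199", now=3)
second = pull(phone, desktop, now=4)   # sends only the "bob" change
```

Note that each replica in this sketch keeps a `last_sync` entry per peer, which is the kind of per-replica bookkeeping about the topology that the paragraph above describes.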
Replication becomes more complicated when more than two replicas are included in the same sync community or topology. The resulting problems include replacing more current data with outdated data because of the order in which devices are replicated, replicating data that is already in sync, and reporting data that is in sync as being in conflict.
As one example, consider a sync community that includes three replicas. A user updates replica 1 at time 1. At time 2, the same data is updated in replica 2. Replica 2 then replicates with replica 3 and the changes made in replica 2 are incorporated into replica 3. If replica 3 subsequently receives changes from replica 1, the data originally updated on replica 2 may be replaced with the older data from replica 1, even though the change from replica 1 is not the most recent change.
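The three-replica scenario above can be reproduced with a short sketch. It assumes a deliberately naive replication step, here called `naive_push` (a hypothetical name), that copies items without comparing versions; real systems differ, and the sketch only shows how replication order alone can let an older change win.

```python
def naive_push(source, target):
    """Copy every item from source to target without comparing versions."""
    target.update(source)


# Three replicas that start with the same data.
replica1 = {"phone": "original"}
replica2 = {"phone": "original"}
replica3 = {"phone": "original"}

replica1["phone"] = "edit at time 1"   # older change
replica2["phone"] = "edit at time 2"   # most recent change

naive_push(replica2, replica3)  # replica 3 now holds the most recent change
naive_push(replica1, replica3)  # the older change from replica 1 replaces it

print(replica3["phone"])  # prints "edit at time 1"
```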
In some cases, communication resources may be wasted when replicas incorrectly believe that their information is out of sync, and so perform unnecessary sync operations. For example, suppose that in the three-replica sync community introduced above a user updates replica 1. The changes in replica 1 are then replicated to replica 2. Replica 2 then replicates its changes to replica 3, so that the information from replica 2, which is currently also the information from replica 1, is applied to replica 3. Replica 3 then replicates with replica 1. In some cases, replica 3 may know that replica 1 has been updated, but not know the version of information on replica 1. Because of this, replica 3 may replicate its information to replica 1, even though the same information is already on replica 1. Further needless replications may follow as replica 1 replicates with replica 2 or performs other pair-wise replications at subsequent times.
In some cases, replicated data may appear to be in conflict even when it is not. For example, consider again a three-replica sync community. The information on replica 1 is updated and replicated to replica 2. The information on replica 1 is then replicated to replica 3. Replicas 2 and 3 then attempt a replication, only to discover that they each have changes (from the replication with replica 1) that have occurred since their last replication. Even though the changes are the same, replicas 2 and 3 may think they are in conflict.
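A minimal sketch of how such a false conflict can arise follows. It assumes a simple pair-wise rule, "both sides changed since we last replicated with each other," and uses hypothetical names (`Replica`, `receive`, `looks_conflicting`) and integer clock values that do not come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    value: str = "v0"
    # Local time the value last changed, including changes received by replication.
    changed_at: int = 0
    # peer name -> time of the previous replication with that peer
    last_sync: dict = field(default_factory=dict)


def receive(target, source, now):
    """Replicate the source's value to the target and record the sync time."""
    target.value = source.value
    target.changed_at = now
    target.last_sync[source.name] = now
    source.last_sync[target.name] = now


def looks_conflicting(a, b):
    """Pair-wise rule: both replicas changed since they last replicated together."""
    since = a.last_sync.get(b.name, 0)
    return a.changed_at > since and b.changed_at > since


r1, r2, r3 = Replica("r1"), Replica("r2"), Replica("r3")
r1.value, r1.changed_at = "v1", 1   # update on replica 1
receive(r2, r1, now=2)              # replica 1 -> replica 2
receive(r3, r1, now=3)              # replica 1 -> replica 3

false_conflict = looks_conflicting(r2, r3) and r2.value == r3.value
```

Here `false_conflict` is true: replicas 2 and 3 hold identical values, yet the pair-wise rule flags them because neither records where its change originally came from.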
Another set of problems may occur when it is desirable to replicate only part of the data in a data store at a particular time. For example, suppose the data store includes email messages in various folders, including an inbox folder and some number of other folders including, perhaps, folders that contain saved email messages. In some cases a user might want to replicate changes to all of the email folders. For example, this might be desirable when the communications bandwidth between replicating devices is large. In other cases, perhaps when the bandwidth is limited, as it might be at some times with a mobile phone or PDA, the user might only want to replicate changes to a particular folder, like their inbox.
It is also conceivable that a user might want to synchronize only part of their entire set of data in all cases. For example, a user might want to maintain all email on a desktop computer or server, but only synchronize their inbox and a selected set of folders to a small device that has limited storage. In this case, some information may never be synchronized with a particular device.
As another example, consider a data store that includes digital music files. In some cases, a user might want to synchronize their entire digital music library, perhaps because they have a portable music player or computer with a large hard drive. They may also have a small portable music player with a limited amount of flash memory, on which they only want to store a selected set of music. In one example, this music to be synchronized might include, say, digital music files they have rated with “four stars” or “five stars,” as well as music downloaded in the last week.
When synchronizing a particular set of data, as in the situations introduced above, various additional problems may occur. For example, data may fit the criteria of a filter and be in a desired set of data at one time or on one device, but not fit the criteria and so not be in the desired set of data at another time or on another device. Additionally, each replica may need to continue to maintain an understanding of the data it has synchronized from different devices, even when that data may, for example, be a subset of the full set of data during some synchronizations, and the full set of data during other synchronizations.
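The way an item can move into and out of a filtered set may be sketched as follows, using the digital music example. The `Track` fields, the `in_filter` rule (four or more stars, or downloaded within the last seven days), and the integer day numbers are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Track:
    title: str
    stars: int
    downloaded_day: int  # hypothetical day number on which the file was downloaded


def in_filter(track, today):
    """Assumed filter: rated four or five stars, or downloaded in the last week."""
    return track.stars >= 4 or today - track.downloaded_day <= 7


def filtered_titles(library, today):
    return {t.title for t in library if in_filter(t, today)}


library = [
    Track("A", stars=5, downloaded_day=0),
    Track("B", stars=2, downloaded_day=95),
    Track("C", stars=3, downloaded_day=10),
]

day_100 = filtered_titles(library, today=100)   # "B" qualifies as a recent download
day_110 = filtered_titles(library, today=110)   # "B" has aged out of the filter
library[2].stars = 4                            # the user re-rates "C"
after_rating = filtered_titles(library, today=110)
```

In this sketch track "B" leaves the filtered set without being edited at all, simply because time passes, and track "C" enters it after a rating change. A replica that synchronizes only the filtered set has to account for both kinds of movement.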