Advances in computer technology (e.g., microprocessor speed, memory capacity, data transfer bandwidth, software functionality, and the like) have generally contributed to increased computer application in various industries. Ever more powerful server systems, which are often configured as an array of servers, are commonly provided to service requests originating from external sources such as the World Wide Web.
As the amount of available electronic data grows, it becomes more important to store such data in a manageable manner that facilitates user-friendly and quick data searches and retrieval. Often a user stores the same information in more than one device or location, and replication, or synchronization, of data is a process typically employed to ensure that each data store has identical information. For example, a user can maintain an electronic address book or a set of email messages in a myriad of different devices or locations. Such a user can further modify the contact information or send/receive email messages using applications associated with each location. Regardless of where or how a change is made, a major goal of replication is to ensure that a change made on a particular device or in a particular location is ultimately reflected in other devices/stored locations.
One common replication method involves tracking changes that have occurred subsequent to a previous replication. For example, a device that seeks to replicate with another device can submit a request for changes to such other device. It is desirable that the changes that the other device sends are those that have occurred since the last replication. The device, or “replica,” that responds to a request for updated information can check for any changes that are time stamped subsequent to a previous replication. Any changes with such a time stamp can subsequently be sent to the device requesting replication. Typically, such replication requires that each replica be aware of the other replicas or the replication topology in which it is operating. Each replica can further maintain a record of what changes have been replicated on other replicas. In effect, each replica can maintain information about what it believes is stored on the other replicas within the topology.
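The time stamp approach described above can be illustrated with a minimal sketch. The class name `Replica`, the function `replicate`, and the use of integer time stamps are assumptions made for illustration only and are not drawn from any particular system.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica that time stamps every change it stores."""
    name: str
    # item id -> (value, time stamp of the last change)
    items: dict = field(default_factory=dict)
    # peer name -> time of the previous replication with that peer
    last_sync: dict = field(default_factory=dict)

    def update(self, item_id, value, timestamp):
        self.items[item_id] = (value, timestamp)

    def changes_since(self, timestamp):
        """Return every item time stamped after the given time."""
        return {k: v for k, v in self.items.items() if v[1] > timestamp}


def replicate(requester, responder, now):
    """Pull the changes made on responder since the previous replication."""
    since = requester.last_sync.get(responder.name, 0)
    changes = responder.changes_since(since)
    requester.items.update(changes)           # incorporate the changes
    requester.last_sync[responder.name] = now  # remember when we last synced
    return changes
```

In this sketch each replica keeps one "last replicated" time per peer, which reflects the requirement that a replica be aware of the other replicas in its topology.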
The challenges of replication become more complicated when more than two replicas are included in the same sync community or topology. Among these challenges are problems involving replacing more current data with outdated data based on the order in which devices are replicated, replicating data that may already be in sync, and having data that is in sync be reported as being in conflict.
As one example, consider a sync community that includes three replicas. A user updates replica 1 at time 1. At time 2, the same data is updated in replica 2. Replica 2 then replicates with replica 3 and the changes made in replica 2 are incorporated into replica 3. If replica 3 subsequently receives changes from replica 1, the data originally updated on replica 2 may be replaced with the original data from replica 1, even though the change from replica 1 is not the most recent change.
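The three replica scenario above can be reproduced with a short sketch, assuming a naive merge in which incoming changes always overwrite local data. The function name `apply_incoming`, the item key, and the sample values are hypothetical.

```python
def apply_incoming(store, changes):
    """Naive merge (an assumption): incoming changes always overwrite local data."""
    store.update(changes)


# Each item maps to (value, time of the change). All values are illustrative.
replica1 = {"phone": ("555-0101", 1)}  # updated at time 1
replica2 = {"phone": ("555-0202", 2)}  # same item updated at time 2
replica3 = {}

apply_incoming(replica3, replica2)  # replica 3 receives the newer change first
apply_incoming(replica3, replica1)  # then receives the older change

stale_result = replica3["phone"]    # the time 1 value has replaced the time 2 value
```

Because the merge considers only arrival order, replica 3 ends up holding the older change from replica 1.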
Moreover, communication resources can be inefficiently allocated if replicas incorrectly believe that their information is out of sync, and hence perform unnecessary sync operations. In the three replica sync community example above, if a user updates replica 1, such changes can then be replicated to replica 2. Replica 2 can then replicate its changes to replica 3, wherein information from replica 2 (which is currently also the information from replica 1) is changed on replica 3. Likewise, replica 3 can then replicate with replica 1. In some cases, replica 3 may know that replica 1 has been updated, yet not know the version of information on replica 1. As such, replica 3 may replicate its information to replica 1, even though the same information is already on replica 1. Further, additional needless replications may continue as replica 1 replicates with replica 2 or performs other pair-wise replications at subsequent times.
Other replication challenges involve replicated data that appears to be in conflict, even when no actual conflict exists. In the example given above, information on replica 1 can initially be updated and replicated to replica 2. Subsequently, the information on replica 1 can be replicated to replica 3. Replicas 2 and 3 then attempt a replication only to discover that they each have changes (from the replication with replica 1) that have occurred since their last replication. Even though the changes are the same, replicas 2 and 3 may nonetheless conclude that a conflict exists.
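The false conflict described above can be sketched as follows, assuming a naive detector that flags a conflict whenever both replicas changed an item after their previous replication with each other. The function name `naive_conflict`, the sample value, and the time stamps (taken here as the time each replica applied the change locally) are illustrative assumptions.

```python
def naive_conflict(local, remote, last_sync_time):
    """Flag a conflict if both sides changed after the previous replication.

    This hypothetical detector compares only time stamps, never the values.
    """
    _, local_time = local
    _, remote_time = remote
    return local_time > last_sync_time and remote_time > last_sync_time


# Replicas 2 and 3 previously replicated with each other at time 1.
# Both later received the same update from replica 1 (at times 2 and 3).
item_on_replica2 = ("new address", 2)
item_on_replica3 = ("new address", 3)

conflict_reported = naive_conflict(item_on_replica2, item_on_replica3, last_sync_time=1)
values_identical = item_on_replica2[0] == item_on_replica3[0]
```

Here `conflict_reported` and `values_identical` are both true, which is the situation in which data that is in sync is reported as being in conflict.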
Another set of problems relates to partial replication of data in a data store at a particular time. For example, the data store can include email messages in various folders such as an inbox folder or other folders including folders that contain saved email messages. In some cases, a user desires to replicate changes to all of the email folders (e.g., when the communications bandwidth between replicating devices is large), while in cases of more limited bandwidth, replication is only required for particular folders, such as an inbox.
In another example, a user may, in all cases, synchronize only part of their entire set of data. For instance, a user may desire to maintain all email on a desktop computer or server, but only synchronize their inbox and a selected set of folders to a small device that has limited storage. In such a case, some information may never be synchronized with a particular device.
In another example, a data store can include digital music files wherein users can synchronize entire digital music libraries with a portable music player or computer with a large hard drive. Such users may also desire to employ a small portable music player with a limited amount of flash memory, on which they only want to store a selected set of music. In one example, such music to be synchronized can further include digital music files with predetermined qualities (e.g., rated with “four stars” or “five stars,” or downloaded in a particular time frame).
In addition, when synchronizing a particular set of data, various additional problems can arise. For example, data may fit the criteria of a filter and be in a desired set of data at one time or on one device, yet not fit such criteria (and hence not be in the desired set of data) at another time or on another device. Additionally, each replica may need to continue to maintain an understanding of the data it has synchronized from different devices, even when that data may, for example, be a subset of the full set of data during some synchronizations, and the full set of data during other synchronizations.
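The changing filter membership described above can be sketched using the music rating example. The names `in_filter` and `plan_sync`, the dictionary layout, and the policy of removing items that leave the filter are assumptions made for illustration.

```python
def in_filter(track, min_rating=4):
    """Hypothetical filter: keep tracks rated four stars or higher."""
    return track["rating"] >= min_rating


def plan_sync(library, on_device, min_rating=4):
    """Compare the filtered set with what a small device already holds.

    library:   track id -> {"rating": int} on the full data store
    on_device: set of track ids previously synchronized to the device
    """
    wanted = {tid for tid, t in library.items() if in_filter(t, min_rating)}
    return {"add": wanted - on_device, "remove": on_device - wanted}


library = {"a": {"rating": 5}, "b": {"rating": 3}}
first_plan = plan_sync(library, on_device=set())

library["a"]["rating"] = 2  # the same track later falls out of the filter
second_plan = plan_sync(library, on_device={"a"})
```

Track "a" is in the desired set during the first synchronization and outside it during the second, so a replica has to track not only what changed but also what it previously received under the filter.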