There are a variety of distributed data systems that have devices and objects that share data with one another. For instance, music sharing systems may synchronize music between a PC, a cell phone, a gaming console and an MP3 player. As another example, email data may be synchronized among a work server, a client PC, and a portable email device. Today, to the extent such devices synchronize to maintain common information wherever changes take place, the synchronization takes place according to a static setup among the devices. However, when these devices are loosely coupled such that they may become disconnected from communications with each other, e.g., when a cell phone is in a tunnel, or when the number of devices to be synchronized is dynamic, it is desirable to have a way for the devices to determine what changes each other device needs when they re-connect to one another, or as they join the network. Moreover, there is a need to determine what conflicts or ambiguities may exist with respect to what data to propagate or replicate to other devices, such as when two different devices independently make changes to respective copies of the same data.
Today, as shown in FIG. 1, there are various examples where a master node 100 synchronizes in a dedicated manner with a client node 110, such as when an email server synchronizes with a dedicated email client. Due to the dedicated synchronization between the two devices, the state of the necessary knowledge 102 to synchronize between the two devices can be tracked by the master node 100. Such knowledge 102 can optionally be tracked by client node 110 as well. However, when the number of synchronizing devices increases and when the connection between master node 100 and a client node 110 may become disconnected at times, not only does tracking the necessary knowledge across all of those devices become a difficult problem, but the number of conflicts from a synchronization standpoint proliferates as well. This is because the opportunity for different devices to independently evolve a set of data being synchronized increases when the devices increase in number and when they can become easily disconnected.
A problem with current solutions is that they often base their synchronization semantics solely on clocks or logical watermarks for a specific node (e.g., the email server), as opposed to any node. These systems can work well in cases of a single connecting node or master. However, these systems are problematic when the topology or pattern in which the nodes connect changes unpredictably. Moreover, as the situations and circumstances under which a complex set of devices may wish to synchronize data in a loosely coupled network increase, there is an even greater need for flexibility and control over the way that those devices handle conflicts.
With respect to the proliferation of conflicts in a multi-master synchronization scenario, a need for node-independent synchronization knowledge and conflict handling according to a variety of resolution measures arises when computers in a topology can change the way they connect to each other or as the number of computers grows. For instance, with a media player, it might be desirable to synchronize among multiple computers and multiple websites. In most instances, applications can synchronize data only between a few well-known endpoints (e.g., home PC and media player), in which case a static conflict resolution measure is enforced, e.g., “home PC always wins conflicts.” As the device community evolves over time for a user of the media player application, however, the need for data synchronization flexibility for the music library utilized by the devices increases, as does the need for flexibility in how the devices handle conflicts when synchronizing with one another in various orders.
Thus, any distributed data system that wishes to share common information across multiple loosely coupled devices needs an efficient way to represent which changes to the common information each device is aware of and which changes it is unaware of, and needs a way to resolve such changes when they conflict with one another. For a conceptual illustration of the problem, imagine four friends who each go see a sneak preview of an upcoming movie. Unfortunately, the movie studio has decided to limit distribution of the movie and each friend is limited to seeing only a thirty-minute segment of the movie. When the friends get back together, they have a meeting where each describes the beginning through the end of the segment they watched to attempt to collectively piece together as much of the movie as possible.
If, by chance, however, the fourth friend cannot attend the meeting, then whichever of the first three friends next talks to the fourth friend, e.g., the second friend, will attempt to combine the collective knowledge of the movie held by the first three friends with the knowledge of the movie held by the fourth friend. At that time, however, the complete set of knowledge of the movie as between the four friends is understood only by the second and fourth friends. Then, when either of the first friend or third friend encounters either of the second or fourth friend, the first or the third friend will gain the collective knowledge of the movie as well. Synchronization is finally complete when each of the four friends understands the collective knowledge of the movie held by the four friends.
However, to show the opportunity for conflict, suppose that the first friend, prior to encountering either the second or fourth friend, talked to a fifth friend, who gave an account of some missing pieces from the movie that differed from what the second or fourth friend later tells the first friend. The first friend will not know which account of the movie to take as the true version of what happened. Often, the first friend will make some sort of heuristic guess as to which is the best account. The first friend might take into account length of relationship, history of trust with one friend or another, or other like factors when considering which story to adopt, and which to discard. In other cases, the first friend might remember both accounts for a short while and wait for additional information prior to resolving the conflict. In a similar fashion, it would thus also be desirable to allow devices to synchronize with one another and resolve conflicts as they arise among distributed devices synchronizing data in a loosely coupled system.
In the above example, the movie is analogous to common information to be shared across devices and the friends are analogous to the loosely coupled devices. In this regard, when the friends/devices come back together, what is needed is a mechanism for representing what each of the connected individuals/devices knows and does not know, and for resolving conflicts among such knowledge, i.e., for determining “true” knowledge, so that the common information can be pieced together to the maximum extent permitted by the collective knowledge of the individuals/devices. Loosely connected systems of device nodes thus need an efficient way to describe the data they have, where they received it, what data they need from another node involved in the conversation, and how to resolve conflicts among the devices.
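By way of non-limiting illustration only, the movie analogy can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that each node's knowledge is a plain set of change identifiers; the names Node, missing_from and sync_with are hypothetical and are not taken from any particular synchronization system.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A loosely coupled device (a 'friend' in the movie analogy)."""
    name: str
    # Assumption for illustration: knowledge is a set of change identifiers.
    knowledge: set = field(default_factory=set)

    def missing_from(self, other: "Node") -> set:
        """Changes that `other` knows about and this node does not."""
        return other.knowledge - self.knowledge

    def sync_with(self, other: "Node") -> None:
        """Two-way exchange in which each side receives only what it lacks."""
        # Compute both differences before mutating either side.
        to_self = self.missing_from(other)
        to_other = other.missing_from(self)
        self.knowledge |= to_self
        other.knowledge |= to_other


# Each of the four friends starts out knowing only one segment of the movie.
friends = [Node(f"friend{i}", {f"segment{i}"}) for i in range(1, 5)]
f1, f2, f3, f4 = friends

# The first three friends meet; the fourth friend is absent.
f1.sync_with(f2)
f2.sync_with(f3)
f3.sync_with(f1)

# The second friend later talks to the fourth friend.
f2.sync_with(f4)

# At this point the first friend still lacks the fourth friend's segment.
needed_by_f1 = f1.missing_from(f2)

# The first and third friends each catch up with someone who knows everything.
f1.sync_with(f2)
f3.sync_with(f4)
```

In this sketch, synchronization is complete once every node's knowledge equals the union of all segments. A set of identifiers is the simplest representation that makes the "what do you need from me" question a set difference; it does not address conflicting accounts of the same segment.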
In short, conflicts are an inevitable problem that arises with 2-way multi-master sync topologies. Users or applications are free to make concurrent modifications to the same item on different endpoints, leaving a synchronization solution no way to determine which change(s) to accept. As discussed, existing conflict resolution policies allow for the automatic resolution of conflicts through the application of some pre-determined policy, such as “last writer wins.” However, given the proliferation of different end-point types, a single pre-determined policy is not sufficient to address the myriad of conflict resolution policies that have been identified for a corresponding number of evolving device synchronization scenarios among loosely coupled devices. In this regard, the challenge that is not addressed adequately today is the balancing act of making synchronization applications flexible enough to implement different conflict policies while at the same time making them robust enough to store, apply and roll back these conflicts in a deferred or automated fashion.
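As a hedged illustration of what interchangeable conflict policies might look like, the following Python sketch treats a policy as a function that either picks a winning change or declines to decide, in which case the conflict is stored for later resolution. The Change and ConflictLog types, their fields, and the policy function names are all assumptions introduced for this example and do not describe any existing system.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Change:
    item_id: str
    value: str
    node: str        # hypothetical: name of the node that made the change
    timestamp: int   # hypothetical: logical time at which the change was made


# A policy returns the winning change, or None to defer the decision.
Policy = Callable[[Change, Change], Optional[Change]]


def last_writer_wins(local: Change, remote: Change) -> Optional[Change]:
    """Pre-determined policy: the most recent change is accepted."""
    return local if local.timestamp >= remote.timestamp else remote


def node_always_wins(preferred: str) -> Policy:
    """Build a static policy such as 'home PC always wins conflicts'."""
    def policy(local: Change, remote: Change) -> Optional[Change]:
        if local.node == preferred:
            return local
        if remote.node == preferred:
            return remote
        return None  # neither side is the preferred node, so defer
    return policy


def defer(local: Change, remote: Change) -> Optional[Change]:
    """Keep both accounts and wait for more information."""
    return None


class ConflictLog:
    """Stores unresolved conflicts so they can be resolved later."""

    def __init__(self) -> None:
        self.pending: List[Tuple[Change, Change]] = []

    def resolve(self, local: Change, remote: Change,
                policy: Policy) -> Optional[Change]:
        winner = policy(local, remote)
        if winner is None:
            # Retain both changes so a later policy or a user can decide.
            self.pending.append((local, remote))
        return winner


local = Change("song-42", "rating=5", node="home-pc", timestamp=10)
remote = Change("song-42", "rating=3", node="phone", timestamp=12)
log = ConflictLog()

by_time = log.resolve(local, remote, last_writer_wins)             # remote
by_node = log.resolve(local, remote, node_always_wins("home-pc"))  # local
deferred = log.resolve(local, remote, defer)                       # None, logged
```

Passing the policy as a parameter keeps the detection of a conflict separate from the decision about it, so the same pair of changes can be resolved differently per device or per data type. Rolling back an already applied resolution would require additional metadata that this sketch does not model.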
Current solutions fall short with regard to flexibility and/or robustness. First, many solutions offer only a handful of popular conflict resolution policies such as last writer wins. This lack of flexibility will quickly become unacceptable as different end-points become popular and users need to synchronize non-traditional types of data. In addition, many applications do not support the ability to apply conflict resolution policies or do not maintain enough conflict meta-data to roll back/forward the changes associated with conflicts in an automated or UI-driven fashion.
In this regard, complications arise when attempting to synchronize among loosely coupled devices when there is no mechanism for understanding the collective knowledge of the set of devices, determining the conflicts in such knowledge, and resolving those conflicts according to flexible policies for devices that become connected. Additional detail about these and other deficiencies in the current state of synchronization among loosely coupled devices may become apparent from the description of the various embodiments of the invention that follows.