Various distributed data systems include devices and objects that share data with one another. For instance, music sharing systems may synchronize music between a PC, a cell phone, a gaming console and an MP3 player. Email data may be synchronized among a work server, a client PC, and a portable email device. Today, to the extent such devices synchronize according to common information, the synchronization takes place according to a static setup among the devices. However, when these devices are loosely coupled such that they may become disconnected from communications with each other, e.g., when a cell phone is in a tunnel, or when the number of devices to be synchronized is dynamic, it is desirable to have a way for the devices to determine what changes each other device needs when they re-connect to one another, or as they join the network.
Today, as shown in FIG. 1, there are various examples where a master node 100 synchronizes in a dedicated manner with a client node 110, such as when an email server synchronizes with an email client. Due to the dedicated synchronization between the two devices, the state of the necessary knowledge 102 to synchronize between the two devices can be tracked by the master node 100. Such knowledge 102 can optionally be tracked by client node 110 as well. However, when the connection between master node 100 and client node 110 becomes disconnected at times, and when the number of synchronizing devices increases, tracking the necessary knowledge across all of those devices and representing it efficiently at the same time becomes a difficult problem.
In addition to being inefficient and inflexible, another problem with current solutions is that they often base their synchronization semantics solely on clocks or logical watermarks for a specific node (e.g., the email server), as opposed to any node. These systems can work well in cases of a single connecting node or master. However, they run into problems when the topology or pattern in which the nodes connect changes unpredictably.
Thus, a need for node-independent synchronization knowledge arises when computers in a topology change the way they connect to each other or as the number of computers grows. For instance, with a media player, it might be desirable to synchronize among multiple computers and multiple websites. Typically, however, applications can only synchronize data between a few well-known endpoints (e.g., a home PC and a media player). As the device community evolves over time for a user of the media player application, the need for data synchronization flexibility for the music library utilized by the devices increases, thereby creating the need for a more robust system.
Thus, loosely connected systems of device nodes need an efficient way to describe the data they have, where they received the data and what data they need from another node involved in the conversation. Any distributed data system that wishes to share common information across multiple loosely coupled devices could thus benefit from a way to represent which changes to the common information each device is aware of and which changes it is unaware of.
In this regard, complications arise when attempting to synchronize among loosely coupled devices when there is no mechanism for understanding the collective knowledge of the set of devices that are connected. Compounding the problem of how to represent knowledge efficiently in a synchronization framework is the problem of how to synchronize and represent only a subset of information known by other device(s). For instance, this might happen where a device or application is not capable of storing or using the same types, formats, or amounts of data that a second device stores or uses, i.e., different endpoints can have different capabilities. For instance, a first device might be a personal computer (PC) with ample storage, whereas a handheld device that synchronizes with the PC may have limited storage. In such a case, the handheld device may only receive a subset of the files from the PC, e.g., only those files on the PC that are 50 Kb or less. How to represent on the handheld device, in a loosely coupled multi-master synchronization environment, that the handheld device received only a subset of knowledge from the PC is a challenge.
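By way of illustration only, the size-based selection in the handheld example can be sketched as follows. This is a minimal sketch, not a description of any particular system: the `FileEntry` type, the `select_subset` function and the sample file names are hypothetical, and sizes are simply assumed to be expressed in the same unit as the 50 Kb limit. The sketch shows only the filtering step; it does not address how the handheld device would record that it received a subset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical record for a file held on the PC."""
    name: str
    size: int  # assumed to use the same unit as the limit below


def select_subset(files, max_size=50):
    """Return only the files small enough for the limited-storage device."""
    return [f for f in files if f.size <= max_size]


# Hypothetical contents of the PC's data store.
pc_files = [
    FileEntry("notes.txt", 12),
    FileEntry("album.flac", 4000),
    FileEntry("icon.png", 50),
]

# The handheld device receives only the entries at or under the limit.
handheld_files = select_subset(pc_files)
print([f.name for f in handheld_files])
```

After this transfer, nothing in the handheld device's data store distinguishes "the PC has no other files" from "the PC has other files that were filtered out", which is the representation problem described above.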
This subset synchronization scenario can also manifest for different devices that have identical or similar capabilities, but where the different devices nonetheless maintain different schema for representing data elements to be synchronized. For instance, a first device might store music files with a rich set of metadata, such as title, artist, album, size of file, length in time, rating, format, digital rights management, etc., whereas a second device, though having the same rendering and memory capabilities, may include a different application that only stores title, artist and album. In this case, the second device indeed would only carry over a subset of information to its data store when synchronizing with the first device.
The same subset synchronization complication also applies when the data to be synchronized from the first device to the second device is not strictly a subset of the data on the first device. For instance, in the case of overlapping, but different sets of schema elements maintained by each device, even though the first device has schema elements not represented by the second device, and the second device has schema elements not represented by the first device, there is a common overlapping set of schema elements represented by both devices. Thus, the two devices can still benefit by sharing what they know about each other's common or overlapping set of data, in which case each device is really sharing a subset of its own data with the other device. Today, however, for loosely coupled devices in a multi-master system, there is no efficient and flexible way to represent this partial knowledge share as such.
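For illustration, the overlapping-schema case can be sketched with set operations. This is a sketch under stated assumptions: the field names beyond title, artist and album (here `rating`, `format` and `lyrics`), the sample record, and the helper names `shared_fields` and `project` are all hypothetical and chosen only to show two schemas that overlap without either containing the other.

```python
# Hypothetical schemas: each device has elements the other lacks.
first_schema = {"title", "artist", "album", "rating", "format"}
second_schema = {"title", "artist", "album", "lyrics"}


def shared_fields(schema_a, schema_b):
    """Return the schema elements represented by both devices."""
    return schema_a & schema_b


def project(record, fields):
    """Keep only the parts of a record that fall within the given fields."""
    return {key: value for key, value in record.items() if key in fields}


common = shared_fields(first_schema, second_schema)

# A hypothetical record on the first device.
record = {
    "title": "Song",
    "artist": "Band",
    "album": "LP",
    "rating": 5,
    "format": "mp3",
}

# Only the overlapping elements would be shared with the second device.
print(project(record, common))
```

In this sketch each device shares a subset of its own data (the three common fields), while `rating` and `format` stay on the first device and `lyrics` stays on the second.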
Instead of representing partial knowledge synchronization as such, today, conventional systems select the lowest common denominator among devices in terms of their synchronization capabilities. Thus, if a first device can represent data of types A, B, C, D, E and F, a second device can represent data of types A, G, H, I and J and a third device can represent data of types A, K, L, M and N, then the lowest common denominator of data types supported among the three devices is type A only. In such a case, only type A would be synchronized in conventional synchronization systems.
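The three-device example above amounts to taking the intersection of the devices' supported types. The following minimal sketch shows that computation; the dictionary layout and the function name `lowest_common_denominator` are hypothetical and used only for illustration.

```python
# Supported data types for the three devices in the example above.
device_types = {
    "first": {"A", "B", "C", "D", "E", "F"},
    "second": {"A", "G", "H", "I", "J"},
    "third": {"A", "K", "L", "M", "N"},
}


def lowest_common_denominator(capabilities):
    """Return the data types that every device can represent."""
    type_sets = list(capabilities.values())
    if not type_sets:
        return set()
    return set.intersection(*type_sets)


print(sorted(lowest_common_denominator(device_types)))  # ['A']
```

Types B through N never leave their originating devices under this approach, even though pairs of devices in a different scenario might share more than the global intersection.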
Moreover, while the data of type A can be synchronized among the devices as the lowest common denominator, today, there is also no dialog among the devices beyond the synchronized data itself that indicates it was a partial knowledge share. In essence, tracking how knowledge evolves in a multi-master synchronization system where devices come and go, connect and disconnect, and tracking how subsets of data are exchanged among the devices in such a system is a difficult and challenging problem thus far unaddressed by those in the synchronization field.
Still further, other conventional systems in essence ignore the problem by allowing the full set of knowledge on each device to synchronize to each of the other devices. Where a device does not recognize the data that was synchronized to its data store, the device marks the data as unrecognizable. While this allows a third device to synchronize with the unrecognizable data on the second device, potentially giving the third device an opportunity to recognize some or all of the unrecognizable data, the proliferation of unrecognizable data on devices with limited storage is unworkable as a practical matter. More generally, storing all of the data in this fashion achieves nothing more than a backup system where each device backs up its data to all other devices of a network, an incredibly inefficient scheme to say the least.
Accordingly, flexible and efficient ways to represent knowledge transfers from one device to another device are desired for a variety of loosely coupled devices, where the device transfers only a subset of its knowledge to the other device. Additional detail about these and other deficiencies in the current state of synchronization among loosely coupled devices, and with respect to synchronizing only a subset of data among the devices, may become apparent from the description of the various embodiments of the invention that follows.