A variety of distributed data systems have nodes (e.g., computers or devices) that share data with one another. For instance, music sharing systems may synchronize music among a PC, a cell phone, a gaming console and an MP3 player. For another example, e-mail data may be synchronized between an e-mail server and e-mail clients on PCs or other devices. Conventionally, to the extent such devices synchronize according to common information, the synchronization takes place according to a static setup among the devices.
Because of limited storage availability on different devices or computers, as well as limited network bandwidth availability between them, synchronizing a subset of the data is essential in many scenarios. Correct functionality in such scenarios requires that every node maintain the most recent copy of all the desired data. At the same time, synchronization of such subsets needs to be performed efficiently. A common problem with existing solutions is that synchronization metadata grows in proportion to the number of items in the subset, rather than in proportion to the number of nodes involved in synchronization.
Another problem is how to synchronize and represent only a subset of information known by other device(s). For instance, this might happen where a device or application is not storing the same data that a second device stores or uses, e.g., different endpoints can store different subsets. For instance, a first device might be a personal computer (PC) with ample storage that stores all music items on behalf of a user, whereas a handheld device that synchronizes with the PC may have limited storage, in which case only music items rated with 5 stars by the user are stored. In such a case, the handheld device may only receive a subset of the files from the PC, e.g., only those files on the PC that are rated with 5 stars. In a loosely coupled multi-master synchronization environment, it is a challenge to represent on the handheld device the fact that it received only a subset of data from the PC. The challenge is compounded when multiple devices are synchronizing. For instance, in addition to the handheld device and the PC, a user's laptop may store all music items with 3 stars or greater.
Conventional synchronization systems suffer from either or both of two problems: 1) non-convergence, due to not handling items that move out of the filter, which leads to unacceptable results; or 2) metadata growth proportional to the number of items in the subset, due to sending and storing metadata for all items, including those items that have never been in the subset. The latter leads to unscalable systems and/or difficulties for devices with limited storage or processing capabilities, as well as to incorrect data, incorrect behavior, or higher storage requirements. Thus, conventionally, for loosely coupled devices in a multi-master system, there is no efficient and flexible way to represent synchronization metadata for the subset of the data that is of interest. In essence, tracking how items move in and out of the subset, and how such subsets of data are exchanged among the devices in such a system, is a difficult and challenging problem thus far unaddressed by those in the synchronization field.
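The non-convergence problem can be illustrated with a minimal Python sketch. The names used here (`Track`, `five_star`, `naive_filtered_sync`) and the per-item version counter are hypothetical and are assumed purely for illustration; they do not describe any particular conventional system. The sketch assumes a naive one-way synchronization that copies only items currently inside the filter and never reports items that have left it.

```python
from dataclasses import dataclass


@dataclass
class Track:
    title: str
    stars: int
    version: int = 1  # hypothetical counter, bumped on every local change


def five_star(track: Track) -> bool:
    """Filter of the handheld device: keep only 5-star items."""
    return track.stars == 5


def naive_filtered_sync(source: dict, target: dict, in_filter) -> None:
    """Copy newer items that are currently in the filter.

    Items that have moved OUT of the filter are silently skipped,
    so the target never learns that its copy is out of date.
    """
    for key, track in source.items():
        if not in_filter(track):
            continue
        known = target.get(key)
        if known is None or known.version < track.version:
            target[key] = Track(track.title, track.stars, track.version)


pc = {"a": Track("Song A", 5), "b": Track("Song B", 3)}
handheld = {}

naive_filtered_sync(pc, handheld, five_star)  # handheld receives Song A only

# The user re-rates Song A on the PC, moving it out of the 5-star filter.
pc["a"].stars = 4
pc["a"].version += 1

naive_filtered_sync(pc, handheld, five_star)

# The handheld still holds the old 5-star copy: the replicas did not converge.
stale = handheld["a"]
```

After the second synchronization, `stale.stars` is still 5 while the PC records 4 stars for the same item, which is the kind of unacceptable result described above.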
Still further, other conventional systems in essence ignore the problem by allowing the full set of synchronization metadata (e.g., knowledge) on each device to synchronize to each of the other devices. Where a device does not recognize the data that was synchronized to its data store, the device marks the data as unrecognizable. While this allows a third device to synchronize with the unrecognizable data on the second device, potentially giving the third device an opportunity to recognize some or all of the unrecognizable data, the proliferation of unrecognizable data on devices with limited storage is unworkable as a practical matter. More generally, storing all of the data in this fashion achieves nothing more than a backup system where each device backs up its data to all other devices of a network, an inefficient scheme to say the least.
In general, synchronization or replication refers to the act of keeping multiple copies of data at different replicas the same, as well as detecting and reporting conflicts for concurrent changes to the same data on different replicas. Replicas can reside in different locations, such as a computer, a device, or the cloud. Synchronization systems have to deal with changes happening on different replicas, efficiently replicating them while detecting conflicts to ensure there is no data loss.
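As one possible illustration of detecting concurrent changes, the following Python sketch compares per-replica change counters (commonly called version vectors). This technique is swapped in here as an assumption for illustration only; the text above does not state which mechanism any given system uses, and the function name `compare` and the replica names are hypothetical.

```python
from typing import Dict


def compare(a: Dict[str, int], b: Dict[str, int]) -> str:
    """Compare two version vectors mapping replica id -> change counter.

    Returns "conflict" when each side has seen a change the other has not,
    which indicates concurrent changes to the same item.
    """
    replicas = set(a) | set(b)
    a_ahead = any(a.get(r, 0) > b.get(r, 0) for r in replicas)
    b_ahead = any(b.get(r, 0) > a.get(r, 0) for r in replicas)
    if a_ahead and b_ahead:
        return "conflict"
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


# Both replicas start from the same state of one item ...
base = {"pc": 1, "laptop": 1}
# ... and then each changes the item independently.
on_pc = {"pc": 2, "laptop": 1}
on_laptop = {"pc": 1, "laptop": 2}

result_concurrent = compare(on_pc, on_laptop)
result_sequential = compare(on_pc, base)
```

Here `result_concurrent` is `"conflict"` because neither change has seen the other, while `result_sequential` is `"a_newer"` because the PC copy strictly supersedes the base state and can be replicated without data loss.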
Filtering refers to synchronizing a subset of the data. There are different types of filtering related to synchronization, and arbitrary filters, where data can move in and out of the filters, currently have no known efficient solutions. Examples of such difficult synchronization scenarios follow.
As mentioned, a first example is synchronizing music albums and tracks between a PC and a device, where the device only keeps music that the user rates as “5 stars” and where another device keeps music that the user rates as 3 or more stars. Since the user can change ratings of music, this causes data to move in and out of the filter. This relatively straightforward example involving just three devices is illustrated in FIG. 1. PC 100 stores all music 102. Handheld device 110 stores music rated with 5 stars 112 and laptop 120 stores music rated with 3 or more stars 122. Since the set membership of music 112 and 122 can change frequently as a user adds or deletes music, or re-rates existing music, the synchronization problem is not straightforward to handle.
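The way re-rating moves items in and out of each device's filter can be sketched in Python as follows. The song titles, the `filters` table, and the helper names `membership` and `moves` are hypothetical and assumed only for illustration; the three predicates mirror the PC, handheld device, and laptop of FIG. 1.

```python
# Hypothetical library: title -> star rating.
ratings = {"Song A": 5, "Song B": 3, "Song C": 1}

# One filter predicate per device, mirroring FIG. 1.
filters = {
    "pc": lambda stars: True,             # stores all music
    "handheld": lambda stars: stars == 5,  # 5 stars only
    "laptop": lambda stars: stars >= 3,    # 3 or more stars
}


def membership(library, predicate):
    """Return the set of titles currently inside a filter."""
    return {title for title, stars in library.items() if predicate(stars)}


def moves(before, after):
    """Return which titles entered and which left a filter."""
    return {"moved_in": after - before, "moved_out": before - after}


before = {name: membership(ratings, f) for name, f in filters.items()}

# The user re-rates two songs.
ratings["Song A"] = 2
ratings["Song C"] = 5

after = {name: membership(ratings, f) for name, f in filters.items()}
changes = {name: moves(before[name], after[name]) for name in filters}
```

In this sketch, "Song A" leaves both the handheld and laptop filters while "Song C" enters both, and the PC's membership is unchanged. A synchronization scheme has to convey both kinds of movement to each device for the replicas to stay correct.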
Another example is synchronizing customer data between a relational database that keeps all customer records, including those applicable to a given sales person, and that sales person's laptop client. On the laptop client, the sales person only wants to keep data for customers with which that sales person works, e.g., those customers with state=‘WA’. Since customer addresses can change to states other than ‘WA’, data can move in and out of the filter applicable to the laptop or other client. This example illustrates the sheer number of devices that can be involved, since a sales force may include hundreds of people and thousands of devices possessed by such people. Thus, a general and scalable mechanism for keeping track of what information each device knows, tracks, and can therefore receive as part of synchronization processes is desirable.
In this regard, existing filtering solutions that handle items moving in and out of filters have the drawback of either exchanging filter membership (e.g., listing all items that are in the filter) or sending updates to all items (e.g., including those not in the filter). For instance, based on the example above with a database filter of customers in state=‘WA’, synchronizing changes to all data (e.g., customers in all other states) is extremely inefficient and undesirable.
Accordingly, flexible and efficient ways to represent synchronization metadata for transfers of data from one device to another device are desired for a variety of loosely coupled devices, where the device transfers a subset of its knowledge to the other device and where set membership can vary. Additional detail about these and other deficiencies in the current state of synchronization among loosely coupled devices, and with respect to synchronizing subsets of data among the devices, may become apparent from the description of the various embodiments in the detailed description that follows.