A system may include a collection of computing devices, where a data item may be multiply replicated to create a number of copies of the item on the different computing devices and/or possibly within a single device. An item may be any stored data object, such as contact or calendar information, stored pictures or music files, software application programs, files or routines, etc. The collection of computing devices may include, for example, a desktop computer, a remote central server, a personal digital assistant (PDA), a cellular telephone, etc. The group of all such items, together with the replicas where the items are stored, may be referred to as a distributed collection.
Replication, or synchronization, of data is one process used to ensure that each data store has the same information. Synchronization protocols are used by devices that exchange created and updated versions of items in order to bring themselves into a mutually consistent state. The frequency of synchronization may vary greatly. Networked devices may synchronize with each other frequently, such as once every minute, hour, day, etc. Alternatively, devices may synchronize infrequently, for example where a portable computing device is remote and disconnected from a network for a longer period of time. Whether the synchronization is frequent or infrequent, the distributed collection is said to be weakly consistent in that, at any given instant, devices may have differing views of the collection of items because items updated at one device may not yet be known to other devices.
Synchronization between replicas may be described as a sharing of knowledge between replicas. A common synchronization scheme involves tracking, within each replica, changes that have occurred to one or more items subsequent to a previous synchronization. One such tracking scheme makes use of version vectors. A version vector is associated with an item and consists of a list of version numbers, one per replica, where each version number is an increasing count of updates made to the item by that replica. Another synchronization scheme, implemented for example in the Sync Framework from Microsoft Corp., makes use of knowledge vectors. Unlike version vectors, knowledge vectors are associated with the replicas rather than the items. Each replica keeps a count of the updates it generates, and the knowledge vector of a replica consists of the version number of the latest update it has learned from each other replica. In addition, each item at a replica has a single version number indicating the latest update applied to it.
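The knowledge-vector bookkeeping described above may be illustrated with a minimal Python sketch. The names `Replica`, `update`, and `knows` are hypothetical and are not taken from the Sync Framework. The sketch also assumes that a version is a (replica identifier, counter) pair and that a replica records its own counter in its knowledge vector alongside the counters learned from other replicas.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Assumption: a version is a (replica id, update counter) pair.
Version = Tuple[str, int]


@dataclass
class Replica:
    """Hypothetical replica that keeps one knowledge vector for all items."""
    replica_id: str
    counter: int = 0  # count of updates generated by this replica
    # Knowledge vector: replica id -> latest update counter known.
    knowledge: Dict[str, int] = field(default_factory=dict)
    # Each item carries a single version: item id -> (value, version).
    items: Dict[str, Tuple[object, Version]] = field(default_factory=dict)

    def update(self, item_id: str, value: object) -> Version:
        """Apply a local update and stamp the item with a new version."""
        self.counter += 1
        version = (self.replica_id, self.counter)
        self.items[item_id] = (value, version)
        # Assumption: the replica also records its own updates as known.
        self.knowledge[self.replica_id] = self.counter
        return version

    def knows(self, version: Version) -> bool:
        """A version is known if the vector entry for its replica covers it."""
        origin, count = version
        return self.knowledge.get(origin, 0) >= count


replica_a = Replica("A")
first = replica_a.update("contact-1", "Ada")
second = replica_a.update("photo-1", "beach.jpg")
```

Under these assumptions, the replica stores one vector regardless of how many items it holds, whereas a version-vector scheme would store one vector per item.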
While knowledge vectors work well for total replication between replicas, it may happen that one or more replicas are only interested in receiving a certain subset of information. This situation is referred to as partial replication. In order to allow for partial replication, a replica may contain a filter. A “filter” may be broadly defined as any construct that serves to identify a particular set of items in a data collection. These items are said to fall within the partial replica's “interest set.”
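As an illustration only, a filter may be modeled as a predicate over items, with the interest set being the items for which the predicate holds. The item layout, the `interest_set` helper, and the `contacts_only` filter below are hypothetical and are chosen purely for the example.

```python
from typing import Callable, Dict

# Assumption: an item is a flat mapping of attribute names to values.
Item = Dict[str, str]
Filter = Callable[[Item], bool]


def interest_set(items: Dict[str, Item], matches: Filter) -> Dict[str, Item]:
    """Return the items of a collection that fall within a filter."""
    return {item_id: item for item_id, item in items.items() if matches(item)}


collection = {
    "c1": {"kind": "contact", "name": "Ada"},
    "p1": {"kind": "photo", "name": "beach.jpg"},
    "c2": {"kind": "contact", "name": "Grace"},
}


def contacts_only(item: Item) -> bool:
    """Hypothetical filter for a partial replica that stores only contacts."""
    return item["kind"] == "contact"


selected = interest_set(collection, contacts_only)
```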
During synchronization, a target replica sends its knowledge, in the form of one or more knowledge vectors, to a source replica. The source replica then returns any versions of items stored in its local database that are not known to the target replica, i.e., items with versions that are not already included in the target's knowledge. The source also returns its own knowledge as learned knowledge in most cases. Synchronization protocols may rely on learned knowledge to help replicas maintain concise, defragmented knowledge.
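The exchange described above may be sketched as follows for the simplest case of two full replicas, each holding a single knowledge vector. The function names `source_respond` and `target_apply` are hypothetical, versions are assumed to be (replica identifier, counter) pairs, and conflict detection is deliberately omitted.

```python
from typing import Dict, Tuple

Version = Tuple[str, int]                # assumed (replica id, counter) pair
Knowledge = Dict[str, int]               # replica id -> latest counter known
Store = Dict[str, Tuple[str, Version]]   # item id -> (value, version)


def knows(knowledge: Knowledge, version: Version) -> bool:
    """Check whether a version is already covered by a knowledge vector."""
    origin, counter = version
    return knowledge.get(origin, 0) >= counter


def source_respond(source_store: Store, source_knowledge: Knowledge,
                   target_knowledge: Knowledge) -> Tuple[Store, Knowledge]:
    """Source side: return versions unknown to the target, plus learned knowledge."""
    unknown = {
        item_id: (value, version)
        for item_id, (value, version) in source_store.items()
        if not knows(target_knowledge, version)
    }
    # Full-replica case: the source's whole knowledge is sent as learned knowledge.
    return unknown, dict(source_knowledge)


def target_apply(target_store: Store, target_knowledge: Knowledge,
                 received: Store, learned: Knowledge) -> None:
    """Target side: store received versions and merge the learned knowledge."""
    target_store.update(received)  # no conflict handling in this sketch
    for origin, counter in learned.items():
        target_knowledge[origin] = max(target_knowledge.get(origin, 0), counter)


source_store: Store = {"x": ("v1", ("A", 1)), "y": ("v2", ("A", 2))}
source_knowledge: Knowledge = {"A": 2}
target_store: Store = {"x": ("v1", ("A", 1))}
target_knowledge: Knowledge = {"A": 1}

received, learned = source_respond(source_store, source_knowledge, target_knowledge)
target_apply(target_store, target_knowledge, received, learned)
```

In this sketch only item `y` is transferred, because the target's knowledge already covers version `("A", 1)` of item `x`.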
During synchronization, the learned knowledge sent from the source replica to the target replica may include versions of items that: (a) were known to the target prior to synchronization, (b) were sent during this synchronization session, (c) are not of interest to the target, i.e., do not match the target's filter, or (d) are obsolete, such as versions that causally precede any versions in category (a), (b), or (c). If the source is a partial replica whose filter does not dominate the target's filter, then the source's knowledge may include versions that do not match any of these four criteria, and such versions may not be included in the learned knowledge. Thus, the source replica may send learned knowledge about only a subset of the items that it holds, which may cause the target replica to end up with multiple knowledge fragments, i.e., with knowledge vectors associated with different subsets of items, even if the target replica contained a single knowledge vector before synchronization occurred.
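The fragmentation effect may be illustrated with a small sketch. As a simplifying assumption, each knowledge fragment is scoped by an explicit set of item identifiers rather than by a filter, and the `add_learned` helper is hypothetical. Learned knowledge that covers only part of a fragment's scope splits that fragment in two.

```python
from typing import Dict, FrozenSet, List, Tuple

Vector = Dict[str, int]                   # replica id -> latest counter known
Fragment = Tuple[FrozenSet[str], Vector]  # (item ids covered, knowledge vector)


def vector_max(a: Vector, b: Vector) -> Vector:
    """Entry-wise maximum of two knowledge vectors."""
    return {rid: max(a.get(rid, 0), b.get(rid, 0)) for rid in a.keys() | b.keys()}


def add_learned(knowledge: List[Fragment], learned_scope: FrozenSet[str],
                learned_vector: Vector) -> List[Fragment]:
    """Merge learned knowledge that applies only to the items in learned_scope."""
    result: List[Fragment] = []
    for scope, vector in knowledge:
        covered = scope & learned_scope
        rest = scope - learned_scope
        if covered:
            # Items covered by the learned knowledge advance to the merged vector.
            result.append((covered, vector_max(vector, learned_vector)))
        if rest:
            # Items outside the learned scope keep the old vector: a new fragment.
            result.append((rest, vector))
    return result


# The target starts with a single knowledge vector over items a, b, and c.
target_knowledge: List[Fragment] = [(frozenset({"a", "b", "c"}), {"A": 3})]

# A partial source that holds only a and b: its filter does not dominate.
fragmented = add_learned(target_knowledge, frozenset({"a", "b"}), {"A": 5, "B": 1})

# A source whose scope covers everything the target stores: no fragmentation.
unfragmented = add_learned(target_knowledge, frozenset({"a", "b", "c", "d"}), {"A": 5})
```

Here `fragmented` holds two fragments, one for items `a` and `b` and one for item `c`, while `unfragmented` still holds a single fragment.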