Nodes in a networked system often include data stores. Data from a source node may be copied to a target node using a process referred to as synchronization. Data stores in the source node (i.e., source data stores) typically are provisioned before those data stores are synchronized with data stores in the target node (i.e., target data stores). Provisioning is a process in which information (e.g., metadata) regarding a data store is gathered, so that the information may be used during synchronization of the data store.
A variety of techniques have been proposed for provisioning and synchronizing data stores. However, each such technique has its limitations. In one example, state based techniques have been developed that provide a fixed state for each item in a source data store. Accordingly, such techniques typically maintain a one-to-one mapping between items in the source data store and instances of metadata that correspond to those respective items, at least until an item is deleted from the source data store. For instance, if item(s) are deleted from the source data store, the corresponding instances of metadata remain in the database. Thus, the number of instances of the metadata may exceed the number of items in the source data store. Metadata that corresponds to items that have been deleted from a source data store is referred to as “tombstones”. Although state based techniques may support deletion of tombstones, such techniques often utilize a relatively large amount of data storage space, which increases as data is added to the source data store. Moreover, some functionalities such as “fast-init” employ metadata “fixup” for items in a source data store that are synchronized with a target data store. Metadata fixup is a process in which instances of metadata that correspond to respective items in a source data store are serially updated during provisioning of the source data store. Metadata fixup may consume a substantial amount of time and/or processing resources.
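The state based bookkeeping described above can be illustrated with a minimal sketch. It is not drawn from any particular synchronization framework: the names `ItemMetadata`, `StateBasedStore`, `provision`, and `purge_tombstones` are hypothetical, and the per-item version counter is an assumed stand-in for whatever state a real technique records.

```python
from dataclasses import dataclass


@dataclass
class ItemMetadata:
    """Hypothetical per-item state record (an assumption for illustration)."""
    item_id: str
    version: int = 1
    is_tombstone: bool = False


class StateBasedStore:
    """Toy source data store that keeps one metadata instance per item."""

    def __init__(self):
        self.items = {}      # item_id -> data
        self.metadata = {}   # item_id -> ItemMetadata

    def provision(self):
        # Serially create metadata for every item that lacks it.
        # The cost grows with the number of items in the store.
        for item_id in self.items:
            if item_id not in self.metadata:
                self.metadata[item_id] = ItemMetadata(item_id)

    def put(self, item_id, data):
        self.items[item_id] = data
        meta = self.metadata.get(item_id)
        if meta is None:
            self.metadata[item_id] = ItemMetadata(item_id)
        else:
            meta.version += 1
            meta.is_tombstone = False

    def delete(self, item_id):
        # The item is removed, but its metadata is kept as a tombstone.
        del self.items[item_id]
        meta = self.metadata[item_id]
        meta.version += 1
        meta.is_tombstone = True

    def purge_tombstones(self):
        # Optional cleanup step that discards metadata for deleted items.
        self.metadata = {
            key: meta for key, meta in self.metadata.items()
            if not meta.is_tombstone
        }
```

In this sketch, adding three items and deleting one leaves two items but three metadata instances, which is the situation in which metadata outnumbers items. Calling `purge_tombstones` restores the one-to-one mapping.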
In another example, event based techniques have been developed that track events that occur with respect to the networked system to determine which items in a source data store are changed as a result of the events. For instance, a file based event technique may track calls to a CreateFile application programming interface (API). Each event leads to creation of metadata that tracks changes that occur with respect to the networked system. In a subsequent synchronization operation, a data structure (e.g., a queue) is examined for metadata created due to such events, and the data corresponding to the metadata is synchronized. In event based techniques, the amount of metadata is proportional to the number of events that occur with respect to the networked system. However, in a net-changes synchronization scenario, in which the items in a source data store may be sent once, additional processing may be performed to optimize and compress events that correspond to the same set of data.
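The event queue and the net-changes compression step can be sketched as follows. This is an illustrative assumption rather than a description of any existing system: the `EventLog` class, its method names, and the three event kinds (`"create"`, `"update"`, `"delete"`) are hypothetical, and the collapsing rules are one plausible choice among several.

```python
from collections import deque


class EventLog:
    """Toy event queue; one metadata entry is appended per event."""

    def __init__(self):
        self.queue = deque()

    def record(self, item_id, kind):
        # kind is assumed to be "create", "update", or "delete".
        self.queue.append((item_id, kind))

    def drain_net_changes(self):
        """Collapse queued events so that each item is sent at most once."""
        net = {}
        while self.queue:
            item_id, kind = self.queue.popleft()
            previous = net.get(item_id)
            if previous == "create" and kind == "delete":
                # Created and deleted between synchronizations: send nothing.
                del net[item_id]
            elif previous == "create" and kind == "update":
                # The target has not seen the item yet, so it is still a create.
                net[item_id] = "create"
            elif previous == "delete" and kind == "create":
                # The target still holds the old item, so treat it as an update.
                net[item_id] = "update"
            else:
                net[item_id] = kind
        return net
```

The queue grows with the number of events, while the dictionary returned by `drain_net_changes` holds at most one entry per item. The extra pass over the queue is the additional processing that a net-changes scenario requires.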