In certain computing environments, a first machine may make changes to a first instance of a data store utilized by the machine, after which a second instance of the data store is brought into synchronization with the first instance. For example, the second instance of the data store may comprise a backup copy that is kept in synchronization with the first instance so that it can be accessed as part of a recovery operation if the first instance is lost. As another example, the data store may comprise user settings that a user wishes to apply to different instances of the same program running on different virtual and/or physical machines, such that a change to the user settings associated with one instance of the program is propagated to the user settings associated with the other instances of the program running on other virtual and/or physical machines.
Synchronizing multiple instances of a data store typically requires choosing between a “full dataset” approach and a “differential dataset” approach to perform the data store updates. As used in this context, the term “full dataset” refers to a dataset that provides a complete snapshot of the current state of each and every entity stored in a data store. The full dataset approach can quickly update a new instance of a data store because all required state information is present in the dataset. However, it can be very inefficient when performing incremental updates in which only a few entities in an instance of a data store need to be modified. This is because, for incremental updates, the state of each entity in the instance of the data store must be compared to the state of the corresponding entity recorded in the snapshot to determine which entities actually require updating. As a result, systems that utilize the full dataset approach to perform frequent data store updates may suffer from performance problems.
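By way of illustration only, the cost of the full dataset approach can be sketched as follows. The sketch assumes a data store modeled as an in-memory mapping from entity identifiers to states. The function name `apply_full_dataset` and the entity names are hypothetical and do not correspond to any particular implementation.

```python
from typing import Any, Dict, Tuple


def apply_full_dataset(instance: Dict[str, Any],
                       snapshot: Dict[str, Any]) -> Tuple[int, int]:
    """Bring `instance` in line with `snapshot` (a full dataset).

    Returns (entities examined, entities modified) so the cost is visible.
    """
    examined = 0
    modified = 0
    # Every entity in the snapshot must be compared, even if unchanged.
    for entity_id, state in snapshot.items():
        examined += 1
        if entity_id not in instance or instance[entity_id] != state:
            instance[entity_id] = state
            modified += 1
    # Entities absent from the snapshot no longer exist and are removed.
    for entity_id in list(instance):
        if entity_id not in snapshot:
            del instance[entity_id]
            modified += 1
    return examined, modified


# Hypothetical data: 1000 entities, only one of which changes.
first_instance = {f"entity{i}": i for i in range(1000)}
second_instance = dict(first_instance)
first_instance["entity7"] = -1
snapshot = dict(first_instance)

# Incremental update: 1000 comparisons to apply a single change.
incremental = apply_full_dataset(second_instance, snapshot)

# New instance: the snapshot alone is sufficient to populate it.
new_instance: Dict[str, Any] = {}
initial = apply_full_dataset(new_instance, snapshot)
```

In this sketch the incremental update examines all 1000 entities in order to modify one, whereas populating the new instance makes use of every entity examined.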
As also used in this context, the term “differential dataset” refers to a dataset that includes only those state changes that have occurred since a previous differential dataset was generated for a data store. Using the differential dataset approach to update an instance of a data store involves applying, in a defined order, only those differential datasets that have been generated since the instance of the data store was last updated. Thus, for incremental updates, the differential dataset approach can update an instance of a data store more efficiently than the full dataset approach. However, the differential dataset approach entails more complexity and overhead than the full dataset approach because multiple differential datasets must be managed and applied in order. For example, to ensure synchronization, updating a new instance of a data store using the differential dataset approach requires applying all the differential datasets that have been created for the data store in the exact order in which they were created.
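By way of illustration only, the ordered application of differential datasets can be sketched as follows. The sketch assumes that the defined order is expressed as a consecutive sequence number carried by each differential dataset, which is one possible ordering scheme among others. The names `Diff`, `Instance`, `apply_diffs` and `DELETED` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable

# Sentinel marking an entity as removed (an assumption of this sketch).
DELETED = object()


@dataclass
class Diff:
    seq: int                 # assumed position in the defined order
    changes: Dict[str, Any]  # entity id -> new state, or DELETED


@dataclass
class Instance:
    entities: Dict[str, Any] = field(default_factory=dict)
    last_seq: int = 0        # sequence number of the last diff applied


def apply_diffs(instance: Instance, diffs: Iterable[Diff]) -> int:
    """Apply, in order, only the diffs generated since the last update.

    Returns the number of diffs applied; raises ValueError on a gap.
    """
    pending = sorted((d for d in diffs if d.seq > instance.last_seq),
                     key=lambda d: d.seq)
    for diff in pending:
        if diff.seq != instance.last_seq + 1:
            raise ValueError(f"missing differential dataset before {diff.seq}")
        for entity_id, state in diff.changes.items():
            if state is DELETED:
                instance.entities.pop(entity_id, None)
            else:
                instance.entities[entity_id] = state
        instance.last_seq = diff.seq
    return len(pending)


# Hypothetical user-settings history.
settings_diffs = [
    Diff(1, {"theme": "light"}),
    Diff(2, {"theme": "dark", "font": "mono"}),
    Diff(3, {"font": DELETED}),
]

# A new instance must replay the entire history.
new_replica = Instance()
applied_new = apply_diffs(new_replica, settings_diffs)

# An instance last updated at sequence 2 needs only the final diff.
stale_replica = Instance({"theme": "dark", "font": "mono"}, last_seq=2)
applied_stale = apply_diffs(stale_replica, settings_diffs)
```

In this sketch the instance that is already at sequence 2 touches a single entity, while the new instance replays all three differential datasets. The gap check reflects the management overhead noted above: if any differential dataset is missing, the instance cannot be safely brought up to date.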
Certain implementations that use either the full dataset approach or differential dataset approach as discussed above have required the use of a central server to host a version control system or database to manage the versioning of the data. Networked servers, Web services, cloud-based services and other centralized services have also been used.