Today, software systems are used to manage large collections of data to make that data more easily and quickly available. To this end, software systems may replicate some or all of the data set being managed. Replication of the stored data set can improve availability of the data set, as well as fault tolerance. For example, a database management system may replicate a large data set across multiple locations, where each location provides storage for the local copy of the data set and support for processes that access and use the local data set copy. Such a location is typically referred to as a node. A user at a node accesses the node's local copy of the data set, which avoids the bottlenecks that appear when all users access a single master copy and thereby achieves high availability. Thus, reading data from the database can be done much more quickly when each node has a local copy. Moreover, in the event of a node failure, the data set of the failed node can be replaced or repaired by accessing a data set stored on another node.
The advantages of using replicated data sets come at the expense of increased system complexity. Although read operations make no changes to the data set, edits and deletions will change the stored data. Replication of a data set requires the system to synchronize duplicated copies of the data set so data integrity is maintained. Maintaining data integrity typically means that each user perceives a single logical data set instead of perceiving a system of multiple independent copies that contain different data.
To maintain data integrity across multiple nodes, the software system typically designates one data set as the master copy and treats the data sets on the other nodes as copies of the master. As the distributed nodes operate on their local data sets, the node storing the master copy monitors those operations. In one system, the master node logs the changes that the other nodes propose to make to their respective local data sets. Because the nodes make changes independently, a mechanism is needed to synchronize the local data sets with the master data set by coordinating the actions of the separate nodes. This synchronization mechanism may, for example, log the proposed changes, make the changes first to the master copy, and then publish all the changes made to the master copy to the other nodes. The other nodes apply the published updates to their local copies and then confirm the updates by sending an acknowledgement to the master node. Typically, the master node publishes updates as they are made, and the other nodes apply the changes as the updates are published.
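The log, apply, publish, and acknowledge sequence described above can be illustrated with a minimal in-memory sketch. It is not taken from any particular system: the names `Change`, `ReplicaNode`, `MasterNode`, and `propose` are hypothetical, the network is replaced by direct method calls, and a value of `None` is assumed to mean "delete this key".

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set


@dataclass
class Change:
    """A proposed write (hypothetical type). value=None means 'delete key'."""
    key: str
    value: Optional[Any] = None


@dataclass
class ReplicaNode:
    """A node holding a local copy of the data set (hypothetical type)."""
    name: str
    data: Dict[str, Any] = field(default_factory=dict)

    def apply(self, seq: int, change: Change) -> int:
        # Apply a published update to the local copy, then acknowledge it
        # by returning its sequence number to the caller (the master).
        if change.value is None:
            self.data.pop(change.key, None)
        else:
            self.data[change.key] = change.value
        return seq


class MasterNode:
    """Node storing the master copy; all changes pass through it."""

    def __init__(self, replicas: List[ReplicaNode]) -> None:
        self.data: Dict[str, Any] = {}
        self.log: List[Change] = []          # proposed changes, in order
        self.replicas = list(replicas)
        self.acks: Dict[int, Set[str]] = {}  # seq -> names that acknowledged

    def propose(self, change: Change) -> int:
        # 1. Log the proposed change.
        self.log.append(change)
        seq = len(self.log)
        # 2. Make the change to the master copy first.
        if change.value is None:
            self.data.pop(change.key, None)
        else:
            self.data[change.key] = change.value
        # 3. Publish to every other node and record the acknowledgements.
        self.acks[seq] = {
            r.name for r in self.replicas if r.apply(seq, change) == seq
        }
        return seq
```

In this sketch, calling `propose` three times on a master with two replicas (set `a`, set `b`, delete `a`) leaves the master and both replicas holding only `b`. Every write is serialized through `MasterNode.propose`, which keeps the copies identical but also makes the master a single point through which all updates must pass.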
Although these systems can work well, relying on a master copy to control updates can create a bottleneck that slows overall system performance. To address this, some software systems allow multiple nodes to publish the changes made to their respective local data sets. Each node responds to the changes published by the other nodes and coordinates those changes in a way that seeks to maintain the integrity of its local data set relative to the other data sets in the system.
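As a rough illustration of this multi-publisher arrangement, the following sketch lets every node apply a write locally and publish it directly to its peers, with no master in the path. The class `PeerNode` and its methods `connect`, `write`, and `drain` are hypothetical names chosen for this example, and message delivery is simulated with an in-memory inbox.

```python
from typing import Any, Dict, List, Tuple


class PeerNode:
    """A node that both publishes and receives changes (hypothetical type)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, Any] = {}
        self.peers: List["PeerNode"] = []
        # Published changes received from peers but not yet applied.
        self.inbox: List[Tuple[str, str, Any]] = []

    def connect(self, *others: "PeerNode") -> None:
        self.peers.extend(others)

    def write(self, key: str, value: Any) -> None:
        # Apply the change locally, then publish it to every peer.
        self.data[key] = value
        for peer in self.peers:
            peer.inbox.append((self.name, key, value))

    def drain(self) -> None:
        # Apply received changes in arrival order.
        for _sender, key, value in self.inbox:
            self.data[key] = value
        self.inbox.clear()
```

When two connected nodes write different keys and then drain their inboxes, both copies end up identical. When they write the same key before either has drained, each node applies the other's value last, so the two copies end up holding different values for that key; the sketch does nothing to resolve such a conflict.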
Although such systems can provide improved performance, the asynchronous character of the data set updates published by multiple nodes can cause data integrity to suffer between data set copies. To address this, some systems employ a file locking process that locks local data set copies during update processes. This ensures that updates are consistent across copies. Although this locking process can work well, it can reduce data set availability, because a locked copy cannot be accessed until its update completes.
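The availability cost of locking can be seen in a small sketch. It assumes a single in-process lock standing in for the file lock, and the names `LockedReplica`, `updating`, and `try_read` are hypothetical. A read attempted while an update holds the lock is refused rather than served.

```python
import threading
from contextlib import contextmanager
from typing import Any, Dict, Iterator, Tuple


class LockedReplica:
    """A local data set copy that is locked while it is updated."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # stand-in for a file lock
        self._data: Dict[str, Any] = {}

    @contextmanager
    def updating(self) -> Iterator[Dict[str, Any]]:
        # Hold the lock for the whole update so no reader can observe
        # a partially applied set of changes.
        self._lock.acquire()
        try:
            yield self._data
        finally:
            self._lock.release()

    def try_read(self, key: str) -> Tuple[bool, Any]:
        # Non-blocking read: returns (False, None) when the copy is locked,
        # otherwise (True, value), where value is None for a missing key.
        if not self._lock.acquire(blocking=False):
            return False, None
        try:
            return True, self._data.get(key)
        finally:
            self._lock.release()
```

For example, inside a `with replica.updating() as data:` block, `replica.try_read("k")` reports the copy as unavailable, and the same call succeeds once the block exits. A blocking read would wait instead of failing, but either way the copy cannot serve requests for the duration of the update.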
As such, there is a need for systems that allow multiple data sets to maintain synchronization through processes that provide data integrity and high availability.