The present invention relates to a method of replicating data managed by a data fabric communication network that interconnects the nodes of a distributed computer system and, more particularly, to a method for preventing pauses by algorithms affected by data modifications during the data replication process.
A data fabric is a communication network that interconnects a plurality of distributed computation nodes of a computer system. The distributed computing nodes may be performing a plurality of processes and the data fabric enables the nodes to exchange data and use the data in the performance of the process(es) executing on the local node. The data fabric provides a data infrastructure that distributes and replicates data, enabling the data to be stored in a distributed memory so that the data may be utilized at high rates with low latency and frequently updated by a plurality of processes being executed by one or more of the distributed computing nodes of the system.
Distributed data caching is a central feature of a data fabric network, such as the GemFire Enterprise® data fabric from Gemstone Systems Inc. A cache provides temporary storage for data obtained from a data source, enabling subsequent local use of the data without the necessity of repeatedly downloading the data from the data source. For example, a data cache may be used to temporarily store, at a local computer, data that is downloaded from an Internet web site. Latency in the use of the data is substantially reduced by using the data in the local cache rather than downloading the data from a remote source for each use. The replication of data also provides redundant data storage for the system. If a process holding a replica of data fails, the data can be made available from other replicas held by other processes of the system. The GemFire Enterprise data fabric provides data management enabling creation of a plurality of local data caches consistent with the other data sources of the system and the updating of a plurality of replicas of the data to reflect the changes resulting from the use of the data by the nodes of a distributed system.
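By way of illustration only, the latency benefit of a local cache described above may be sketched as follows. The class name `ReadThroughCache`, the loader function, and the fetch counter are hypothetical names introduced for this example; they are not part of the GemFire Enterprise data fabric.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Local cache that contacts the remote data source only on a miss."""

    def __init__(self, load: Callable[[str], str]) -> None:
        self._load = load                      # hypothetical stand-in for the remote source
        self._entries: Dict[str, str] = {}     # locally held copies
        self.remote_fetches = 0                # number of trips to the remote source

    def get(self, key: str) -> str:
        if key not in self._entries:
            # Miss: pay the remote latency once and keep a local copy.
            self.remote_fetches += 1
            self._entries[key] = self._load(key)
        return self._entries[key]


cache = ReadThroughCache(lambda key: f"content of {key}")
first = cache.get("page.html")    # fetched from the (simulated) remote source
second = cache.get("page.html")   # served from the local cache
```

In this sketch the second request is satisfied locally, so the simulated remote source is contacted only once.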
The GemFire Enterprise data fabric comprises processes enabling data consistency among the various replicas of the data held by the system when a new replica of a data region, a portion of the system's data, is created. Messages communicating changes in the data of a data region are addressed to the various processes of the system holding replicas of the affected data region. When a new replica of the data is to be created, the GemFire Enterprise data fabric notifies the various processes utilizing the data to be replicated of the intention to create a new replica of the data region by copying one of the replicas of the data region held by one of the system's processes, and directs the processes to forward any new changes to the data to a new group of processes that includes the process in which the new replica is to be created. The process in which the new replica is to be created stores any changes to the data that are received and, following creation of the new replica, the data of the new replica is updated. All of the processes utilizing the data of the replicated data region capture the changes to the data that were made after the intention to create the new replica is announced to the processes executing on the computing system.
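The sequence described above (announce the new replica, forward subsequent changes to the enlarged group, store the changes at the new replica, then update the copied data) may be sketched as follows. This is a simplified, single-threaded illustration under assumed names (`Replica`, `Region`, `announce_new_replica`, `install_copy`); it is not the GemFire Enterprise implementation.

```python
from typing import Dict, List, Tuple

Change = Tuple[str, int]  # (key, new value)


class Replica:
    def __init__(self, initialized: bool = False) -> None:
        self.data: Dict[str, int] = {}
        self.initialized = initialized
        self.pending: List[Change] = []   # changes stored while the copy is made

    def receive(self, change: Change) -> None:
        if self.initialized:
            key, value = change
            self.data[key] = value
        else:
            self.pending.append(change)   # hold until the copy is installed

    def install_copy(self, snapshot: Dict[str, int]) -> None:
        self.data = dict(snapshot)
        for key, value in self.pending:   # update the copy with stored changes
            self.data[key] = value
        self.pending.clear()
        self.initialized = True


class Region:
    """Group of processes to which changes for one data region are addressed."""

    def __init__(self, members: List[Replica]) -> None:
        self.members = list(members)

    def announce_new_replica(self, replica: Replica) -> None:
        self.members.append(replica)      # later changes also go to the new member

    def publish(self, change: Change) -> None:
        for member in self.members:
            member.receive(change)


source = Replica(initialized=True)
region = Region([source])
region.publish(("x", 1))

new_replica = Replica()
region.announce_new_replica(new_replica)
snapshot = dict(source.data)              # copy of the existing replica
region.publish(("x", 2))                  # changes made while the copy is in transit
region.publish(("y", 7))
new_replica.install_copy(snapshot)
```

After the copy is installed and the stored changes are applied, the new replica in this sketch holds the same data as the replica from which it was copied.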
Co-pending U.S. patent application, Ser. No. 11/982,563, incorporated herein by reference, discloses an innovative method of replicating system data which addresses the problem of “in-flight” changes to the data, that is, capturing a change to the data that was made by a process prior to receipt of the notice of intention to create a new replica but was not received by the data replica being copied before the data was replicated. In the innovative data replication method, data produced by operations occurring after the intention to replicate is announced are transmitted to all users of the data. The system monitors each of the communication channels connected to the data to be replicated and when each channel has stabilized, the data is replicated and then updated with the changes to the data resulting from operations occurring after the replication was announced. Capturing “in-flight” changes to the data promotes data consistency in a distributed system.
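The "in-flight" problem and the channel-stabilization remedy may be illustrated with the following simplified sketch. The names `Channel`, `is_stable`, and `replicate_when_stable` are hypothetical, and a channel is assumed here to be "stabilized" when it holds no undelivered changes; the co-pending application may define stabilization differently. Because the sketch is single-threaded, it drives delivery itself, whereas a real system would wait for delivery to occur.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Change = Tuple[str, int]  # (key, new value)


class Channel:
    """One-way link from a writing process to the replica being copied."""

    def __init__(self) -> None:
        self.in_flight: Deque[Change] = deque()   # sent but not yet received

    def send(self, change: Change) -> None:
        self.in_flight.append(change)

    def deliver_one(self, target: Dict[str, int]) -> None:
        key, value = self.in_flight.popleft()
        target[key] = value

    def is_stable(self) -> bool:
        # Assumption: a channel is stable when nothing remains in flight.
        return not self.in_flight


def replicate_when_stable(source: Dict[str, int],
                          channels: List[Channel]) -> Dict[str, int]:
    """Copy the source replica only after every channel has stabilized."""
    while not all(channel.is_stable() for channel in channels):
        for channel in channels:
            if not channel.is_stable():
                channel.deliver_one(source)       # simulated delivery
    return dict(source)


source = {"x": 1}
channel_a, channel_b = Channel(), Channel()
channel_a.send(("x", 2))   # changes made before the announcement,
channel_b.send(("y", 5))   # still in flight toward the source replica

naive_copy = dict(source)                                   # misses both changes
safe_copy = replicate_when_stable(source, [channel_a, channel_b])
```

In this sketch the copy taken immediately omits the two in-flight changes, while the copy taken after both channels have stabilized includes them.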
Since data may be modified by operations undertaken by one or more processes before the intent to replicate the data is announced, algorithms requiring knowledge of the values of data before and after modification typically must block or pause and wait for completion of the replication process and the updating of the replicated data. These algorithms may produce events that are of significance to an application and blocking may significantly interrupt or slow the execution of the application. What is desired, therefore, is a method of replicating data that enables algorithms relying on earlier and later versions of data to initiate operation before data replication is completed.