Today, applications increasingly store data or files on multiple computers. To guarantee data or file consistency, an update made on one computer must be synchronized to the other computers. Typically, the computer or computing node on which the data update originally occurs is called the source node, and the other computers or computing nodes to be synchronized are called target nodes. A consistency guarantee is especially important where data may be redundantly stored on multiple nodes (e.g., in a cloud environment).
There are several solutions in the prior art to the data synchronization problem. According to one prior-art solution, when a file on a source node is updated, the updated file is transmitted as a whole to the remote or local target nodes that need to synchronize the file. An obvious disadvantage of this solution is that even a small update (as little as one byte) requires the whole file to be transmitted, causing large and unnecessary consumption of time and network resources.
According to another prior-art solution, an old copy of the data or file is divided into data blocks of a fixed size, and when the data or file is updated on the source node, a hash algorithm is used to identify which of these data blocks have changed in the new copy. The position information of the changed data blocks in the updated file and the contents of those blocks are then sent to a target node to perform data synchronization. This solution is not optimal because CPU resources are consumed by the hash computation, and excessive network resources are still used in transmitting the block contents and position information.
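The fixed-size-block approach described above can be illustrated with a minimal sketch. The helper names (`split_blocks`, `compute_delta`, `apply_delta`), the choice of SHA-256 as the hash, and the tiny block size are illustrative assumptions, not the actual prior-art implementation; a real system would also handle streaming I/O and stronger block matching.

```python
import hashlib

BLOCK_SIZE = 4  # bytes per block; tiny value chosen only for illustration


def split_blocks(data: bytes, size: int = BLOCK_SIZE):
    """Divide data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def compute_delta(old: bytes, new: bytes):
    """On the source node: return (position, content) pairs for blocks
    whose hash in the new copy differs from the old copy."""
    old_hashes = [hashlib.sha256(b).digest() for b in split_blocks(old)]
    delta = []
    for i, block in enumerate(split_blocks(new)):
        if i >= len(old_hashes) or hashlib.sha256(block).digest() != old_hashes[i]:
            delta.append((i, block))
    return delta


def apply_delta(old: bytes, new_len: int, delta):
    """On the target node: rebuild the new copy from the old copy plus
    the transmitted block positions and contents."""
    blocks = split_blocks(old)
    for i, content in delta:
        if i < len(blocks):
            blocks[i] = content
        else:
            blocks.append(content)  # file grew: append trailing blocks
    return b"".join(blocks)[:new_len]  # truncate if the file shrank
```

Only the changed blocks and their positions cross the network, but as the passage notes, every block of both copies must still be hashed, and the position metadata itself consumes bandwidth.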
Therefore, a technique is needed for data synchronization between a source node and a target node that computes the difference between the new and old copies of the data or file, so that transmission of the whole contents of the file is avoided.