Various embodiments of the present invention relate to data replication, and more specifically, to a method and apparatus for generating an initial copy in replication initialization.
With the development of data storage technology and network communication technology, the concept of distributed data storage has been proposed. In distributed data storage, data is no longer located in one single data node but may be distributed across a plurality of data nodes at the same or different physical locations. Further, in order to provide more reliable data storage, a plurality of copies of a data object may be stored in a plurality of data nodes in a distributed data storage system, so that when some of the data nodes fail, data in the faulty nodes may be recovered on the basis of the copies in other, non-faulty data nodes.
As user demands increase, the amount of data held in a database keeps growing. It may take several hours or even several days to replicate data among data nodes in distributed data storage (for example, replicating data from a data node in Beijing to a data node in Shanghai). Although the efficiency of data replication may be improved by increasing the bandwidth between data nodes, increasing the bandwidth incurs substantial costs in manpower and material resources. In addition, since the amount of data transmitted between data nodes is not stable, bandwidth resources will be largely wasted if the network transmission bandwidth is provisioned on the basis of peak demand.
Replication initialization refers to the step of synchronizing data between different data nodes for the first time, during which all data in a source node needs to be copied to a target node. In the field of data replication, a large amount of data usually has to be transmitted during replication initialization, whereas once an initial copy exists in the target node, only a small amount of data transmission is needed to keep the target node synchronized with the source node. Therefore, how to increase the efficiency of initialization and rapidly generate an initial copy during replication initialization has become a research focus in the data replication field.
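The contrast between replication initialization and later synchronization can be illustrated with a minimal sketch. It is not taken from any embodiment described herein: the in-memory key-value `Store` type and the functions `full_copy_cost` and `delta_sync` are hypothetical names introduced only for illustration, and values are compared directly, whereas a practical system would more likely compare checksums or replay a change log.

```python
from typing import Dict

# Hypothetical stand-in for a data node: record key -> record bytes.
Store = Dict[str, bytes]


def full_copy_cost(source: Store) -> int:
    """Bytes that must be sent when the target holds no initial copy."""
    return sum(len(value) for value in source.values())


def delta_sync(source: Store, target: Store) -> int:
    """Make target equal to source in place; return the bytes sent.

    Assumes the target already holds an initial copy, so only records
    that are new or changed in the source need to be transmitted.
    Deletions are treated as costing no payload bytes.
    """
    sent = 0
    for key, value in source.items():
        if target.get(key) != value:
            target[key] = value
            sent += len(value)
    # Remove records that no longer exist in the source.
    for key in list(target.keys() - source.keys()):
        del target[key]
    return sent
```

Under these assumptions, a target node that starts from an initial copy receives only the records that changed after the copy was taken, while a target node without one must receive every record in the source node.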