Some data replication methods replicate data from a read-write primary data storage server computing device (“server”) to read-only secondary servers. The primary server and the secondary servers can be located in different geographical regions. One characteristic of such a replication method is that a write operation can be slow if the client computing device (“client”) writing data to the primary server is located in a geographical region different from that of the primary server. Moreover, if the write operation is synchronous, the client can experience an additional delay, which is incurred in writing the data to the secondary servers. Typically, the client is unaware of why the write operations are slow, which can lead the client to conclude that the application writing the data is faulty or slow. Increasing the number of secondary servers can increase data availability and/or reliability because the number of replicas of the data increases, but it can further increase the delay. Accordingly, the above data replication method is not scalable.
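The latency behavior described above can be illustrated with a minimal sketch. The function name, the parameters, and the assumption that the primary server forwards a synchronous write to all secondary servers in parallel and waits for the slowest acknowledgement are hypothetical and are chosen only for illustration; they are not taken from any particular replication system.

```python
def write_latency_ms(client_to_primary_ms, secondary_rtts_ms, synchronous=True):
    """Estimate the write latency observed by a client, in milliseconds.

    Assumptions (illustrative only):
      * client_to_primary_ms is the round-trip time between the client
        and the primary server.
      * secondary_rtts_ms lists the round-trip time from the primary
        server to each secondary server.
      * For a synchronous write, the primary forwards the data to all
        secondaries in parallel and waits for the slowest one.
    """
    latency = client_to_primary_ms
    if synchronous and secondary_rtts_ms:
        # The slowest secondary determines the additional delay.
        latency += max(secondary_rtts_ms)
    return latency


# Hypothetical round-trip times, in milliseconds.
same_region = write_latency_ms(2, [80, 150])
cross_region = write_latency_ms(120, [80, 150])
more_secondaries = write_latency_ms(120, [80, 150, 200])
print(same_region, cross_region, more_secondaries)
```

Under these assumptions, a client in a different region from the primary server pays the longer client-to-primary round trip on every write, and adding a more distant secondary server raises the synchronous write latency further.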
Some data replication methods, e.g., peer-to-peer file sharing networks, are scalable, but the delay involved in replicating data to a particular destination can be significant. In a typical peer-to-peer file sharing network, a data file is made available to multiple clients via one or more other peer clients (“peers”). One or more of the peers can act as a seed. Clients can download the file by connecting to seeds and/or other peers. The peer-to-peer file sharing network typically implements a segmented file transfer protocol in which the file being distributed is divided into one or more segments. As each peer receives a new segment of the file, it becomes a source of that segment for other peers, relieving the original seed from having to send that segment to every other client wishing to obtain a copy. In such file sharing systems, a segment may have to transit many nodes (“computing devices”) before a client can receive it. The number of nodes the segment has to transit before the client can receive it is not fixed; therefore, the time taken to receive the segment can vary significantly and cannot be guaranteed. Also, peer-to-peer file sharing networks are not network topology aware. That is, a first peer may connect to a second peer that is less proximate than a third peer, resulting in higher replication latency. Further, in the peer-to-peer technology, each of the peers downloads the entire file from a seeding peer. The peer-to-peer technology may not make a peer available as a seed if the peer does not have, or has not downloaded, a copy of the complete file. For example, if a particular computer is connected to five different peers for downloading a file, each of the five peers downloads the entire file and stores it on its disk. This consumes a significant amount of the storage resources of the peers and can also increase the replication latency, because the read and/or write latency of a storage disk is significantly high.
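The segmented transfer described above, and the reason the number of nodes a segment transits is not fixed, can be sketched as follows. The names `split_into_segments`, `Peer`, and `fetch` are hypothetical, and the sketch is a toy in-memory model rather than an implementation of any actual peer-to-peer protocol.

```python
def split_into_segments(data, segment_size):
    """Divide a file's bytes into fixed-size segments (the last may be shorter)."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


class Peer:
    """Toy peer that records, per segment, how many hops the segment took."""

    def __init__(self, name):
        self.name = name
        # Maps segment index -> (payload, number of hops from the seed).
        self.segments = {}

    def fetch(self, index, source):
        """Copy one segment from another peer, which may or may not be the seed."""
        payload, hops = source.segments[index]
        # After receiving the segment, this peer can serve it to others.
        self.segments[index] = (payload, hops + 1)


# The seed holds every segment at zero hops.
seed = Peer("seed")
for i, segment in enumerate(split_into_segments(b"abcdefgh", 3)):
    seed.segments[i] = (segment, 0)

a, b, c = Peer("a"), Peer("b"), Peer("c")
a.fetch(0, seed)   # segment 0 reaches peer a directly from the seed
b.fetch(0, a)      # peer a now acts as a source of segment 0
c.fetch(0, b)      # segment 0 reaches peer c through two intermediate peers
c.fetch(1, seed)   # segment 1 reaches peer c directly from the seed
print({index: hops for index, (_, hops) in c.segments.items()})
```

In this example, peer `c` receives segment 0 after three hops but segment 1 after one hop, which illustrates why the time to receive a given segment depends on the path it happens to take.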
Some replication methods that replicate data to a significant number of servers limit the size of the data that can be replicated at a time, and that limit can be smaller than the amount of data that needs to be replicated. As a result, the data has to be replicated in multiple portions, which increases the time required to replicate the data.
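The effect of such a size limit can be expressed as simple arithmetic. The sketch below assumes, for illustration only, that each portion is replicated in a separate round and that every round takes the same amount of time; the function names and parameters are hypothetical.

```python
def replication_rounds(total_bytes, max_bytes_per_round):
    """Number of portions needed when each round can carry a limited size."""
    if max_bytes_per_round <= 0:
        raise ValueError("max_bytes_per_round must be positive")
    # Integer ceiling division: a partial final portion still needs a round.
    return (total_bytes + max_bytes_per_round - 1) // max_bytes_per_round


def total_replication_time_s(total_bytes, max_bytes_per_round, seconds_per_round):
    """Total time, assuming rounds run one after another at equal cost."""
    return replication_rounds(total_bytes, max_bytes_per_round) * seconds_per_round


# Hypothetical figures: 10 units of data, at most 3 units per round,
# 30 seconds per round.
print(replication_rounds(10, 3), total_replication_time_s(10, 3, 30))
```

Under these assumptions, the replication time grows in proportion to the number of portions, so a smaller per-round limit directly lengthens the overall replication.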