In computer networks, databases are often replicated on multiple computers to provide better access to them. A user may have a local replica of a database (e.g., a mail file). The local replica of a mail file, for example, regularly replicates with a server computer to pull in new messages and to send updates to the server as the user processes messages (e.g., deletes messages, files messages in various folders, sends new messages, etc.). Following replication, a replication history is typically updated so that a replication application can determine where to pick up on the next replication. The replication history may be stored at the client computer, at the server, or on both. Similarly, a replication application that performs the replication actions may be stored at the client computer, at the server, or on both.
Typically, the replication history will contain the other computer's identity, as well as a timestamp representing the time of the last successful replication between the client computer and the other (server) computer. When a new replication is triggered between the client computer and the other computer, the replication application replicates changes from the time of the timestamp in the replication history. However, if a new server with a new server replica is added, or if replication fails over to a server replica that the client (or local) replica has not previously replicated with, then the client (or local) replica and the new server replica must perform a full (from time 0) replication. This full replication can be very time-consuming and can be a CPU-, network-, and I/O-intensive operation. Similarly, if replication fails over to a server that the local replica has not replicated with recently, a potentially long replication may result. Note that, even if the new server is completely up to date (through replication with the other server), it must still engage in a long replication to determine whether the local and server replicas are in sync.
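The behavior described above can be illustrated with a minimal sketch. It assumes a replication history that maps each peer's identity to the timestamp of the last successful replication with that peer. All names here (ReplicationHistory, Change, changes_to_replicate, and the server identifiers in the usage note) are hypothetical and chosen only for illustration; they are not taken from any particular replication product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Change:
    doc_id: str
    modified_at: float  # time of the modification, in arbitrary time units


@dataclass
class ReplicationHistory:
    # Hypothetical structure: peer identity -> time of last successful replication.
    last_success: Dict[str, float] = field(default_factory=dict)

    def start_time(self, peer_id: str) -> float:
        # An unknown peer has no entry, so replication starts from time 0 (full).
        return self.last_success.get(peer_id, 0.0)

    def record_success(self, peer_id: str, completed_at: float) -> None:
        self.last_success[peer_id] = completed_at


def changes_to_replicate(
    changes: List[Change], history: ReplicationHistory, peer_id: str
) -> List[Change]:
    # Only changes made at or after the recorded timestamp need to be examined.
    cutoff = history.start_time(peer_id)
    return [c for c in changes if c.modified_at >= cutoff]
```

For instance, with a history that records "server-a" at time 1000 and changes modified at times 500, 900, and 1200, replicating with "server-a" examines only the last change, while replicating with a previously unseen "server-b" examines all three, even if "server-b" already holds identical content.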
This problem is especially pronounced in cloud computing when a disaster recovery site is involved. In this scenario, the local replica replicates with the primary (active) site. Months may elapse with the local replica and the primary site replicating regularly. Then, one day, because of an actual disaster or because of a planned site flip, the user's computer may be connected to the former disaster recovery site. Since the local replica has never replicated with the disaster recovery site or, in the case of a planned site flip, may not have replicated with it in several months, a lengthy replication will result. In a cloud system, thousands of local replicas may be involved in a site flip, placing a huge load on the local and cloud systems.