For weakly mutable data, changes or mutations at one instance (or replica) of the data must ultimately replicate to all other instances of the database, but there is no strict time limit on when the updates must occur. This is an appropriate model for certain data that does not change often, particularly when there are many instances of the database at locations distributed around the globe.
Replication of large quantities of data on a planetary scale can be both slow and inefficient. In particular, the long-haul network paths have limited bandwidth. In general, a single change to a large piece of data entails transmitting that large piece of data through the limited bandwidth of the network. Furthermore, the same large piece of data is transmitted to each of the database instances, which multiplies the bandwidth usage by the number of database instances.
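As a rough illustration of the multiplication described above, the following sketch assumes a naive scheme in which the complete data item is retransmitted to every other instance whenever it changes. The function name and the example figures are hypothetical and are not taken from any particular system.

```python
def naive_replication_bytes(item_size_bytes: int, num_other_instances: int) -> int:
    """Total bytes sent when the whole item is retransmitted to every other instance.

    Assumption (illustrative only): no deltas, no compression, and no relaying
    between instances, so each instance receives its own full copy.
    """
    if item_size_bytes < 0 or num_other_instances < 0:
        raise ValueError("sizes and counts must be non-negative")
    return item_size_bytes * num_other_instances


if __name__ == "__main__":
    one_gib = 1024 ** 3
    # Hypothetical example: a 1 GiB item replicated to 10 other instances.
    total = naive_replication_bytes(one_gib, 10)
    print(total // one_gib, "GiB transmitted in total")
```

Under these assumptions, the cost grows linearly with the number of instances, regardless of how small the underlying change was.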
In addition, network paths and data centers sometimes fail or become unavailable for periods of time (both unexpected outages and planned outages for upgrades, etc.). Generally, replicated systems do not handle such outages gracefully, often requiring manual intervention. When replication is based on a static network topology and certain links become unavailable or more limited, replication strategies based on the original static network may be inefficient or ineffective.
By definition, data stored within a distributed storage system are not at a single location but distributed across a geographical region or even the whole world. Therefore, it is a challenge to design an optimized real-time data replication scheme within a large distributed storage system such that the scheme not only consumes as few resources as possible but also improves the services offered by the distributed storage system.