Mass-scale data storage systems are desirable for their sheer size and their ability to process and store vast amounts of data. Generally, these mass-scale systems include a plurality of data centers located remotely from one another. Each data center usually comprises a plurality of independent clusters, wherein each cluster has a plurality of nodes. These nodes are coupled to each other by way of a data network infrastructure and are ideally independent from each other insofar as the data is stored on storage media that are separately maintained.
The purpose of keeping the data centers at remote sites is to reduce the risk that a single catastrophic event (e.g., a flood, earthquake, or tornado) damages multiple data centers. In the event that a data center experiences the loss of a data object or a replica, the recovery process requires locating a surviving replica from which to replicate or restore the lost data. Accordingly, data centers are typically built with multiple local replicas per cluster, which allows a cluster to self-recover from a fault using the surviving local replicas.
If all local replicas are lost, the data may be recovered from one or more remotely stored replicas. Such remote recovery is currently a manual process that often requires administrative assistance. Due to the size and distribution of resources in a mass-scale data storage system, tracking the location and availability of replicas across the plurality of data centers and clusters does not scale readily and, as a result, this information is not available to all nodes across the plurality of data centers.
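The recovery order described above (surviving local replicas first, remotely stored replicas only when none remain) can be illustrated with a minimal Python sketch. All names here (Node, Cluster, DataCenter, find_replica, recover) are hypothetical and introduced only for illustration. The sketch also assumes the caller can enumerate every data center, which is precisely the global view that, as noted above, is not readily available in practice; the exhaustive scan merely stands in for the manual remote-recovery step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A storage node holding replicas keyed by object identifier."""
    name: str
    objects: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class Cluster:
    """An independent cluster made up of a plurality of nodes."""
    name: str
    nodes: List[Node] = field(default_factory=list)


@dataclass
class DataCenter:
    """A data center made up of a plurality of clusters."""
    name: str
    clusters: List[Cluster] = field(default_factory=list)


def find_replica(cluster: Cluster, object_id: str) -> Optional[Node]:
    """Return the first node in the cluster holding the object, if any."""
    for node in cluster.nodes:
        if object_id in node.objects:
            return node
    return None


def recover(object_id: str, target: Node, local: Cluster,
            data_centers: List[DataCenter]) -> Optional[str]:
    """Restore object_id onto target.

    Returns "local" if a surviving local replica was used, "remote" if a
    replica from another cluster was used, or None if no replica was found.
    """
    # Step 1: self-recovery from a surviving replica in the same cluster.
    source = find_replica(local, object_id)
    if source is not None:
        target.objects[object_id] = source.objects[object_id]
        return "local"

    # Step 2: fall back to remotely stored replicas. Assumption: the full
    # list of data centers is known to the caller.
    for data_center in data_centers:
        for cluster in data_center.clusters:
            if cluster is local:
                continue  # already searched above
            source = find_replica(cluster, object_id)
            if source is not None:
                target.objects[object_id] = source.objects[object_id]
                return "remote"
    return None
```

In this sketch, the cost of the remote fallback grows with the total number of clusters and nodes, which gives a rough sense of why replica location information is difficult to maintain at mass scale.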