The present invention, in some embodiments thereof, relates to data management and recovery and, more specifically, but not exclusively, to management and recovery of distributed storage of replicas.
Storage systems often employ data replication to defend against disk failures. The data is partitioned into small contiguous units, and each unit is stored on more than one disk. If one replica happens to reside on a failing disk, the system may utilize another replica to recover.
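By way of a non-limiting illustration, the partitioning and replica placement described above may be sketched as follows. The function names, the fixed unit size, and the round-robin placement rule are assumptions introduced for this example only; an actual system may place replicas according to any other policy.

```python
from typing import Dict, List, Set


def partition(data: bytes, unit_size: int) -> List[bytes]:
    """Split data into contiguous units of at most unit_size bytes."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


def place_replicas(num_units: int, num_disks: int, replicas: int) -> Dict[int, List[int]]:
    """Map each unit index to `replicas` distinct disks.

    Round-robin placement is a hypothetical policy chosen for brevity.
    """
    if replicas > num_disks:
        raise ValueError("need at least as many disks as replicas")
    return {
        unit: [(unit + j) % num_disks for j in range(replicas)]
        for unit in range(num_units)
    }


def surviving_replicas(
    placement: Dict[int, List[int]], failed_disks: Set[int]
) -> Dict[int, List[int]]:
    """For each unit, list the disks that still hold a usable replica."""
    return {
        unit: [disk for disk in disks if disk not in failed_disks]
        for unit, disks in placement.items()
    }


units = partition(b"abcdefghij", unit_size=4)
placement = place_replicas(len(units), num_disks=4, replicas=2)
# A unit remains recoverable as long as its list of surviving disks is non-empty.
after_failure = surviving_replicas(placement, failed_disks={1})
```

In this sketch, a unit is lost only when every disk holding one of its replicas has failed.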
Replication maintains consistency between redundant resources, such as software or hardware components, in order to improve reliability, fault tolerance, or accessibility. The replication is data replication when the same data is stored on multiple storage drives. The replication process should be transparent to an external user. In addition to keeping replicas consistent, a distributed system should load its server nodes evenly with replicas so that application performance and network traffic are optimized.
A key parameter of replicating systems is the number of replicas to maintain for each data unit. This parameter reflects a tradeoff between safety and efficiency. Storing fewer replicas increases the risk of simultaneously losing all the replicas of a data unit due to several temporally adjacent disk failures. Conversely, storing additional replicas reduces the effective storage size, as k replicas translate to 1/k usable space; induces higher network traffic, as more disks have to be synchronized upon data changes; and translates to greater energy consumption, because write operations induce additional disk and network activity and because systems must utilize a higher number of disks to attain the same effective storage space.
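The tradeoff may be illustrated numerically with the following minimal sketch. The 1/k usable fraction follows from the text above. The loss estimate, by contrast, assumes that disks fail independently with a common probability p within the relevant time window; this is a simplifying assumption made for illustration and is not a model stated herein. All function and parameter names are hypothetical.

```python
import math


def usable_fraction(k: int) -> float:
    """Fraction of raw capacity that is usable when every unit has k replicas."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 / k


def disks_needed(effective_tb: float, disk_tb: float, k: int) -> int:
    """Number of disks required to provide effective_tb of usable space."""
    return math.ceil(effective_tb * k / disk_tb)


def loss_probability(p: float, k: int) -> float:
    """Probability that all k replicas of one unit are lost.

    Assumes independent disk failures, each with probability p
    (an illustrative assumption only).
    """
    return p ** k


# Compare replication factors for 100 TB of usable space on 4 TB disks.
for k in (1, 2, 3):
    print(k, usable_fraction(k), disks_needed(100, 4, k), loss_probability(0.01, k))
```

Under these assumptions, each additional replica lowers the per-unit loss probability by a factor of p while the number of disks required grows linearly with k.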