Storage solutions using clustered data storage nodes, or “bricks”, connected by a Local Area Network (LAN) are becoming an increasingly attractive alternative to generally more expensive Storage Area Network (SAN) solutions. A brick is essentially a stripped-down computing device, such as a personal computer (PC), with a processor, memory, a network card, and a large disk for data storage. Providing strong data reliability in these systems poses new challenges. One reason is that inexpensive commodity disks are typically more prone to permanent failure. Additionally, disk failures occur far more frequently in large systems, simply because such systems contain more disks. To guard against permanent data loss, replication is often employed. The premise is that if one or more replicas are lost to disk failures, the remaining replicas will still be available to regenerate new replicas and maintain the same level of reliability.
Replica placement refers to the strategy used to distribute replicas among the participating bricks. Two widely used replica placement schemes are staggered sequential placement, as in chained declustering, and totally random placement. Mirroring can be viewed as a degenerate special case of sequential placement. Replica placement can significantly affect the reliability of a system because of two factors. The first factor is repair speed: the more bricks 110 that participate in a data repair process (subject to the available network bandwidth), the sooner the system returns to its original reliability level. The second factor is sensitivity to multiple concurrent failures: the greater the number of permutation choices (distinct replica sets) that a placement generates, the more likely it is that a random failure of several bricks 110 will permanently wipe out one or more portions of the data. These two factors conflict. For instance, random placement repairs very quickly but is vulnerable to concurrent failures, whereas sequential placement exhibits precisely the opposite behavior.
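The trade-off between the two factors can be illustrated with a small model. The following is a minimal sketch under simplifying assumptions that are not part of the original description: data is divided into equal segments, each segment has `r` replicas on distinct bricks, network bandwidth is ignored, and a concurrent failure takes out exactly `r` bricks chosen uniformly at random. The function names `sequential_placement`, `random_placement`, `repair_peers`, and `loss_probability` are hypothetical and used only for illustration.

```python
import math
import random


def sequential_placement(n_bricks, n_segments, r):
    """Staggered sequential placement (as in chained declustering):
    segment i is stored on bricks i, i+1, ..., i+r-1 (mod n_bricks)."""
    return [frozenset((i + k) % n_bricks for k in range(r)) for i in range(n_segments)]


def random_placement(n_bricks, n_segments, r, seed=0):
    """Totally random placement: each segment is stored on r distinct
    bricks chosen uniformly at random (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(n_bricks), r)) for _ in range(n_segments)]


def repair_peers(placement, failed):
    """Count the surviving bricks that hold replicas of data lost on the
    failed brick; these bricks can serve the repair in parallel."""
    peers = set()
    for bricks in placement:
        if failed in bricks:
            peers |= bricks - {failed}
    return len(peers)


def loss_probability(placement, n_bricks, r):
    """Probability that r simultaneous, uniformly random brick failures
    destroy every replica of at least one segment. A segment is lost only
    if its replica set equals the failed set, so the probability is the
    number of distinct replica sets divided by C(n_bricks, r)."""
    return len(set(placement)) / math.comb(n_bricks, r)


if __name__ == "__main__":
    n, m, r = 20, 1000, 3
    seq = sequential_placement(n, m, r)
    rnd = random_placement(n, m, r)
    print("repair peers:", repair_peers(seq, 0), repair_peers(rnd, 0))
    print("loss probability:", loss_probability(seq, n, r), loss_probability(rnd, n, r))
```

With 20 bricks and 3 replicas, sequential placement lets only 4 bricks help repair brick 0, and data is lost in 20 of the 1140 possible 3-brick failure combinations (about 1.75%). Random placement typically engages close to all 19 surviving bricks in the repair, but because it spreads 1000 segments over many more distinct replica sets, a much larger fraction of 3-brick failures destroys some segment.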