Enterprises and consumers today face the problem of storing and managing an ever-increasing amount of data on non-volatile data storage systems such as hard disk drives. One promising direction in computer storage systems is to harness the collective storage capacity of a massive number of commodity computers to form a large distributed data storage system. When designing such a distributed data storage system, an important factor to consider is data reliability. Once data is stored, a user typically does not want to lose, or cannot afford to lose, any of the stored data. Unfortunately, the data management chain is prone to failures at various links that can result in permanent data loss or temporary unavailability of the data. For example, any one of a number of individual components of a massive distributed data storage system may fail for a variety of reasons. Hard drive failures, computer motherboard failures, memory problems, network cable problems, loose connections (such as a loose hard drive cable, memory cable, or network cable), power supply problems, and so forth can occur, leaving the data inaccessible.
For distributed data storage systems to be useful in practice, proper redundancy schemes must be implemented to provide high reliability, availability, and survivability. One type of redundancy scheme is replication, whereby data is replicated two, three, or more times to different computers in the system. As long as any one of the replicas is accessible, the data is available. Most distributed data storage systems use replication because it simplifies system design and has low access overhead.
One problem with the replication technique, however, is that the cost of storing duplicate copies of the data can become prohibitive. High storage cost translates directly into high hardware cost (hard drives and associated machines), as well as a high cost to operate the storage system, which includes power for the machines, cooling, and maintenance. For example, if the data is replicated three times, then the associated costs of storing the data are tripled.
One way to decrease this storage cost is to use another type of redundancy scheme called erasure resilient coding (ERC). Erasure resilient coding enables lossless data recovery notwithstanding loss of information during storage or transmission. The basic idea of the ERC technique is to use certain mathematical transforms to map k original data blocks from an original data piece into n total data blocks, where n>k. The n total data blocks include the k original data blocks and the n−k parity (or ERC) data blocks. When there are no more than n−k failures, all of the original data can be retrieved using the inverse of the mathematical transforms. At retrieval time, the n data blocks are retrieved to recover the original data piece. Currently, the main use of the ERC technique in distributed data storage systems is in large peer-to-peer (P2P) systems.
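As a minimal sketch of the k-to-n mapping described above, the following Python example assumes the simplest possible transform: a single XOR parity block, so that n = k + 1 and at most n − k = 1 failure can be tolerated. This choice is an illustrative assumption and is not necessarily the transform used by any particular ERC system. The function names (xor_blocks, encode, recover, storage_overhead) are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Map k data blocks to n = k + 1 blocks by appending one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stored_blocks):
    """Rebuild the k data blocks when at most one of the n blocks is missing (None)."""
    missing = [i for i, block in enumerate(stored_blocks) if block is None]
    if len(missing) > 1:
        raise ValueError("a single parity block tolerates at most n - k = 1 failure")
    blocks = list(stored_blocks)
    if missing:
        # The XOR of all surviving blocks equals the one missing block.
        survivors = [block for block in blocks if block is not None]
        blocks[missing[0]] = xor_blocks(survivors)
    return blocks[:-1]  # drop the parity block, keep the k data blocks


def storage_overhead(n, k):
    """Raw storage consumed per unit of original data under a (n, k) code."""
    return n / k


# Usage: k = 3 data blocks become n = 4 stored blocks; one block is then lost.
data = [b"abcd", b"efgh", b"ijkl"]
stored = encode(data)
stored[1] = None  # simulate a failed disk or machine
restored = recover(stored)
```

Under these assumptions, the code stores 4/3 of the original data volume, compared with 3 times the original volume for three-way replication, while still surviving the loss of any one block.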
A protection group is often used in ERC to provide an added measure of protection to the data. Typically, each of the n data blocks is placed in a single protection group. One problem with using the ERC technique in distributed data storage systems, however, is that because the data is fragmented and stored in a plurality of blocks, multiple protection groups cannot be created. Another problem is that when a data block is modified, each of the data blocks belonging to the same protection group must also be modified. In other words, whenever a data block is written or read, all the other data blocks in the protection group must also be modified.