A tremendous amount of data has been created in the past few years. Some studies show that 90% of the world's data was created in the last two years. Not only are we generating huge amounts of data, but the pace at which data is created is also increasing rapidly. Along with this growth comes the user expectation that data remain highly available even when disks or disk blocks fail. Replication is a commonly used technique for making stored data reliable. However, replication makes data storage even more expensive because it increases the cost of raw storage by a factor equal to the replication count. For example, many practical storage systems (e.g., Hadoop Distributed File System (HDFS), Ceph, and Swift) maintain three copies of the data, which increases the raw storage cost by a factor of three.
In recent years, erasure codes (EC) have gained favor and increasing adoption as an alternative to data replication because they incur significantly less storage overhead while maintaining equal (or better) reliability. In a (k, m) Reed-Solomon (RS) code, the most widely used EC scheme, a given set of k data blocks, called chunks, is encoded into (k+m) chunks. The total set of chunks comprises a stripe. The coding is done such that any k out of the (k+m) chunks are sufficient to recreate the original data. For example, in an RS (4, 2) code, 4 MB of user data is divided into four 1 MB blocks. Then, two additional 1 MB parity blocks are created to provide redundancy. In a triple replicated system, by contrast, each of the four 1 MB blocks is stored three times. Thus, an RS (4, 2) coded system requires 1.5x bytes of raw storage to store x bytes of data and can tolerate the loss of any 2 of the 6 chunks in a stripe. A triple replication system, on the other hand, needs 3x bytes of raw storage and can tolerate the same number of simultaneous failures.
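The storage arithmetic above can be checked with a short sketch. The `Scheme` class and the helper names `reed_solomon` and `replication` are hypothetical, introduced only for illustration; they are not part of any storage system's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """Illustrative description of a redundancy scheme (hypothetical type)."""
    name: str
    data_chunks: int    # chunks that carry user data
    total_chunks: int   # chunks actually written to raw storage
    tolerated: int      # simultaneous chunk losses that can be survived

    @property
    def overhead(self) -> float:
        """Raw bytes stored per byte of user data."""
        return self.total_chunks / self.data_chunks

    def raw_bytes(self, user_bytes: int) -> float:
        """Raw storage needed to hold `user_bytes` of user data."""
        return user_bytes * self.overhead


def reed_solomon(k: int, m: int) -> Scheme:
    # A (k, m) RS stripe holds k data chunks plus m parity chunks,
    # and any k of the k + m chunks are enough to rebuild the data.
    return Scheme(f"RS({k}, {m})", k, k + m, m)


def replication(copies: int) -> Scheme:
    # Each chunk is stored `copies` times; all but one copy may be lost.
    return Scheme(f"{copies}x replication", 1, copies, copies - 1)


if __name__ == "__main__":
    MB = 1024 * 1024
    for scheme in (reed_solomon(4, 2), replication(3)):
        print(scheme.name,
              f"overhead={scheme.overhead}x",
              f"raw={scheme.raw_bytes(4 * MB) / MB} MB for 4 MB of user data",
              f"tolerates {scheme.tolerated} failures")
```

Under these assumptions, both schemes survive two simultaneous failures, but the RS (4, 2) stripe needs half the raw storage of triple replication.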
Although attractive in terms of reliability and storage overhead, erasure codes have a major drawback: an expensive repair, or reconstruction, process. When an encoded chunk (say, c bytes) is lost because of a disk or server failure in a (k, m) code system, k×c bytes of data need to be retrieved from k servers to recover the lost data. The term “server” refers to the machine that stores the replicated or erasure-encoded data or parity chunks. In the triple replicated system, on the other hand, since each chunk of c bytes is replicated three times, the loss of a chunk can be recovered by copying only c bytes of data from any one of the remaining replicas. This k-factor increase in network traffic causes reconstruction to be very slow, which is a critical concern for production data centers of reasonable size, where disk, server, or network failures happen quite regularly, thereby necessitating frequent data reconstructions. In addition, a long reconstruction time degrades performance for normal read operations that attempt to read the erased data. During the long reconstruction window, the probability of further data loss also increases, raising the susceptibility to permanent data loss. An erasure refers to the loss, corruption, or unavailability of data or parity chunks.
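The repair-traffic comparison can be made concrete with a minimal sketch. The function names are hypothetical, and the model counts only the bytes read over the network to rebuild one lost chunk, as described above.

```python
def rs_repair_bytes(k: int, chunk_bytes: int) -> int:
    """Bytes fetched to rebuild one lost chunk in a (k, m) RS system.

    k surviving chunks of `chunk_bytes` each are read from k servers.
    """
    if k < 1 or chunk_bytes < 0:
        raise ValueError("k must be >= 1 and chunk_bytes must be >= 0")
    return k * chunk_bytes


def replication_repair_bytes(chunk_bytes: int) -> int:
    """Bytes fetched to rebuild one lost chunk under replication.

    A single surviving replica is copied.
    """
    if chunk_bytes < 0:
        raise ValueError("chunk_bytes must be >= 0")
    return chunk_bytes


if __name__ == "__main__":
    MB = 1024 * 1024
    c = 1 * MB  # chunk size from the RS (4, 2) example
    rs = rs_repair_bytes(4, c)
    rep = replication_repair_bytes(c)
    print(f"RS(4, 2) repair reads {rs // MB} MB, replication reads {rep // MB} MB")
    print(f"repair traffic ratio: {rs // rep}x")
```

For the RS (4, 2) example with 1 MB chunks, this model gives 4 MB of repair reads against 1 MB for replication, which is the k-factor increase described above.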
While it is important to reduce repair traffic, practical storage systems also need to maintain a given level of data reliability and storage overhead. Using erasure codes that incur low repair traffic at the expense of increased storage overhead and inferior data reliability is therefore usually a non-starter. However, reducing repair traffic without negatively impacting storage overhead and data reliability is a challenging task. It has been shown theoretically that there exists a fundamental tradeoff among data reliability, storage overhead, volume of repair traffic, and repair degree. Dimakis et al. (Network coding for distributed storage systems. IEEE Transactions on Information Theory, 2010) provide a mathematical formulation of an optimal tradeoff curve that answers the following question: for a given level of data reliability (i.e., a given (k, m) erasure coding scheme), what is the minimum repair traffic that is feasible while maintaining a given level of storage overhead? At one end of this optimal curve lies a family of erasure codes called Minimum Storage Codes that require minimum storage overhead, but incur high repair bandwidth. At the other end of the curve lies a set of erasure codes called Minimum Bandwidth Codes that incur minimum repair traffic, but require high storage overhead and repair degree. Existing works fall at different points of this optimal tradeoff curve. For example, RS codes, popular in many practical storage systems, require minimum storage space, but create large repair traffic. Locally repairable codes require less repair traffic, but add extra parity chunks, thereby increasing the storage overhead.
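The two ends of the tradeoff curve can be evaluated numerically. The sketch below uses the closed-form extreme points from the Dimakis et al. formulation, where a file of size M is stored so that any k servers suffice to recover it and a repair contacts d surviving servers (the repair degree). The function names, and the mapping of a (k, m) stripe to k+m servers with k ≤ d ≤ k+m−1, are assumptions made for illustration.

```python
from fractions import Fraction


def _check(k: int, d: int) -> None:
    if k < 1 or d < k:
        raise ValueError("need k >= 1 and repair degree d >= k")


def min_storage_point(file_size, k: int, d: int):
    """(per-server storage, repair traffic) at the minimum-storage end."""
    _check(k, d)
    M = Fraction(file_size)
    alpha = M / k                        # each server stores M / k
    gamma = M * d / (k * (d - k + 1))    # total bytes read for one repair
    return alpha, gamma


def min_bandwidth_point(file_size, k: int, d: int):
    """(per-server storage, repair traffic) at the minimum-bandwidth end."""
    _check(k, d)
    M = Fraction(file_size)
    gamma = 2 * M * d / (2 * k * d - k * k + k)
    return gamma, gamma                  # storage equals repair traffic here


if __name__ == "__main__":
    k, m = 4, 2          # same shape as the RS (4, 2) example
    n = k + m            # servers holding one stripe (assumed mapping)
    file_mb = 4
    for d in (k, n - 1):
        for label, point in (("min storage", min_storage_point),
                             ("min bandwidth", min_bandwidth_point)):
            alpha, gamma = point(file_mb, k, d)
            overhead = n * alpha / file_mb
            print(f"d={d} {label}: store {float(alpha):.2f} MB/server, "
                  f"repair {float(gamma):.2f} MB, overhead {float(overhead):.2f}x")
```

With d = k, the minimum-storage point reads the full 4 MB to repair one chunk, matching the k×c repair cost of RS codes. Raising the repair degree to d = 5 lowers that to 2.5 MB at the same 1.5x overhead, while the minimum-bandwidth point lowers repair traffic further at the cost of more storage per server.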