Erasure coding is a technique used to greatly reduce the storage space required to safely store a dataset. For example, compared to three-way data replication, which has an overhead of 200% and can survive two failures, a 10:4 Reed-Solomon erasure correction code (which divides the data into ten blocks and adds four parity blocks) has an overhead of 40% and can survive four failures. To maximize survivability, each replica, or each block of the erasure-coded data, is placed in a different failure domain, where a failure domain at scale would be a rack or even an aisle within a data center. Typically, the distribution of replicas or blocks is implemented in a declustered configuration, so that the data on a given storage device is protected by a large number of other storage devices.
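The overhead and survivability figures quoted above can be checked with a short calculation. The following is a minimal sketch in Python; the function names are illustrative assumptions and do not come from any particular storage system.

```python
def replication_overhead_percent(copies: int) -> float:
    """Extra storage, as a percentage of the original data, for n-way replication."""
    # Every copy beyond the first is pure overhead.
    return 100.0 * (copies - 1)


def replication_failures_survived(copies: int) -> int:
    """Data survives as long as at least one copy remains."""
    return copies - 1


def erasure_overhead_percent(k: int, r: int) -> float:
    """Extra storage for a k:r code: r parity blocks per k data blocks."""
    return 100.0 * r / k


def erasure_failures_survived(k: int, r: int) -> int:
    """A k:r Reed-Solomon code tolerates the loss of any r of its k + r blocks."""
    return r


# Three-way replication versus a 10:4 code, as in the text.
print(replication_overhead_percent(3), replication_failures_survived(3))
print(erasure_overhead_percent(10, 4), erasure_failures_survived(10, 4))
```

Under these definitions, three-way replication yields 200% overhead and tolerates two failures, while the 10:4 code yields 40% overhead and tolerates four.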
To recover from a failure with simple replication, data from a surviving replica is read. Consequently, the amount of data that must be read to recover from a storage device failure (the most common non-transient failure) is equal to the amount of data that was on the failed device. At scale, where a failure domain is a rack, the amount of data that must cross the aggregation network switches between the racks is proportional to the amount of data on the failed device. By contrast, with k:r erasure coding, reconstructing each lost block requires reading k of the surviving blocks, so the amount of data that must be read and transferred over the aggregation switches is k times the amount of data on the failed device.
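The difference in recovery traffic can be illustrated with a small calculation. The following is a minimal sketch in Python; the function names and the 4 TB device capacity are hypothetical values chosen for illustration, and the model assumes every block of each affected stripe resides in a different rack, so that all recovery reads cross the aggregation switches.

```python
def replication_recovery_bytes(failed_device_bytes: int) -> int:
    """Bytes read to rebuild a failed device under simple replication."""
    # One surviving replica of each lost piece of data is read once.
    return failed_device_bytes


def erasure_recovery_bytes(failed_device_bytes: int, k: int) -> int:
    """Bytes read to rebuild a failed device under k:r erasure coding."""
    # Each lost block is reconstructed from k surviving blocks of its stripe.
    return k * failed_device_bytes


# Hypothetical example: a failed device that held 4 TB of data.
failed = 4 * 10**12
replicated = replication_recovery_bytes(failed)
erasure_coded = erasure_recovery_bytes(failed, k=10)
print(replicated, erasure_coded, erasure_coded // replicated)
```

In this example, recovery under a code with k = 10 reads and transfers ten times as much data as recovery under replication.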
The description above is presented as a general overview of related art in this field and should not be construed as an admission that any of the information it contains constitutes prior art against the present patent application.