The present embodiments relate to a new erasure-coded storage system that adapts to workload changes by using two different erasure codes: a fast code that optimizes the recovery cost of degraded reads and of reconstructing failed disks or nodes, and a compact code that provides low and bounded storage overhead. More specifically, the embodiments relate to a conversion mechanism that efficiently upcodes and downcodes data blocks between the two codes.
Distributed storage systems storing multiple petabytes of data are becoming common. These systems have to tolerate different failures arising from unreliable components, software glitches, machine reboots, and maintenance operations. To guarantee high reliability and availability despite these failures, data is replicated across multiple machines. For example, it is known in some systems to maintain three copies of each data block. Although disk storage is relatively inexpensive, replication of the entire data footprint is infeasible at massive scales of operation.
Many large-scale distributed storage systems are transitioning to the use of erasure codes, which are known to provide high reliability at lower storage cost. These systems use a single erasure code, which optimizes for either recovery cost or storage overhead. However, in an erasure-coded system, reconstructing an unavailable block requires fetching multiple data and parity blocks within the code stripe, which increases disk and network traffic. This increase in the amount of data read and transferred during recovery results in high degraded read latency and longer reconstruction time.
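By way of illustration only, the trade-off between storage overhead and recovery cost described above may be sketched with simple arithmetic. The sketch assumes a maximum distance separable (MDS) code such as Reed-Solomon, in which a stripe of k data blocks and m parity blocks can rebuild any one lost block from k surviving blocks. The (10, 4) parameters and the function names are hypothetical and are not taken from the embodiments.

```python
from fractions import Fraction


def replication_costs(copies):
    """Return (storage overhead, blocks read to recover one lost block)
    for plain replication with the given number of copies."""
    # Any single surviving copy is enough to restore a lost block.
    return Fraction(copies), 1


def mds_code_costs(k, m):
    """Return (storage overhead, blocks read to recover one lost block)
    for an assumed (k, m) MDS erasure code."""
    # k + m blocks are stored for every k blocks of user data, and
    # rebuilding one block requires reading k surviving blocks.
    return Fraction(k + m, k), k


# Illustrative parameters only (assumptions, not from the embodiments).
schemes = {
    "3-way replication": replication_costs(3),
    "(10, 4) MDS code": mds_code_costs(10, 4),
}

for name, (overhead, reads) in schemes.items():
    print(f"{name}: overhead {float(overhead):.2f}x, "
          f"blocks read per recovery: {reads}")
```

Under these assumptions, the erasure code stores 1.4 times the user data rather than 3 times, but reads ten blocks rather than one to recover a single unavailable block.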