The disclosure generally relates to the field of data storage systems, and more particularly to implementing hierarchical erasure coding in a wide spreading storage layout configuration.
Consumer enterprises collect and store increasingly large amounts of data. In many instances, data is stored and frequently archived even before any decision is made about whether and how to utilize it. Although the per-unit cost of storing data has declined over time, the total cost of storage has increased for many companies because data storage volumes have grown massively. Hence, it is important for companies to find cost-effective ways to store and manage large quantities of data.
Traditional data protection mechanisms, e.g., RAID, are increasingly ineffective in petabyte-scale systems as a result of larger drive capacities (without commensurate increases in throughput), larger deployment sizes (which reduce the mean time between faults), and lower-quality drives. The trend toward less expensive storage hardware is making traditional RAID increasingly difficult to implement reliably, requiring complex techniques such as triple parity and declustering. Therefore, the traditional data protection mechanisms are ill-suited to the needs of the emerging capacity storage market.
In addition to RAID, data storage systems may implement erasure coding techniques to protect stored data. Erasure coding protection generally entails dividing stored data entities (e.g., data objects) into fragments and encoding the fragments to include redundant data. Having been expanded with the redundant data, the fragments may be stored across a set of different storage media locations. Such erasure coding techniques often impose substantial I/O processing on storage devices and consume substantial network bandwidth when data objects are read or reconstructed. The processing and network bandwidth costs incurred for protection purposes, together with those incurred in providing client access, subject the storage devices to excessive wear. In order to maintain the same storage resiliency, the storage devices may have to be replaced with new ones regularly, which may substantially increase storage costs.
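By way of illustration only, the following minimal sketch shows the general idea of fragmenting a data object and adding redundant data so that a lost fragment can be rebuilt. It assumes a single XOR parity fragment, which tolerates the loss of only one fragment; it is not the hierarchical scheme of this disclosure, and production systems typically use stronger codes such as Reed-Solomon. The function names `encode` and `reconstruct` and the fragment count `k` are hypothetical and chosen for this example.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-length fragments and append one XOR parity fragment.

    Illustrative assumption: single parity, so only one lost fragment is recoverable.
    """
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")  # zero-pad so fragments are equal length
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, frags)
    return frags + [parity]


def reconstruct(frags: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the original data from k data fragments plus parity, with at most one None."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity cannot recover more than one lost fragment")
    frags = list(frags)
    if missing:
        # XOR of all surviving fragments (data and parity) equals the lost fragment.
        survivors = [f for f in frags if f is not None]
        frags[missing[0]] = reduce(xor_bytes, survivors)
    # Drop the parity fragment and strip the padding.
    return b"".join(frags[:-1])[:original_len]


if __name__ == "__main__":
    obj = b"hello erasure coding"
    stored = encode(obj, k=4)   # 4 data fragments + 1 parity fragment
    stored[2] = None            # simulate loss of one storage location
    assert reconstruct(stored, len(obj)) == obj
```

Note that rebuilding the lost fragment requires reading every surviving fragment, which illustrates the I/O and network bandwidth cost described above.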