Enterprises increasingly have a need to store large amounts of data in data storage systems that include many storage devices (e.g., nodes and disk shelves) spread across data centers in numerous geographic locations (referred to herein as sites). Such data storage systems generally implement data protection scheme(s) to facilitate recovery or increased availability of data when physical component(s) of the systems fail or are otherwise down or unavailable. Exemplary data protection schemes include replication, redundant array of independent disks (RAID), dynamic disk pools (DDP), and erasure coding.
However, each of these schemes has advantages and disadvantages. For example, replication is the simplest of these schemes to implement but has a high storage overhead because multiple copies of each object are stored. RAID 5, RAID 6, and RAID-DP all protect against failure of one or more storage units (e.g., disks) with low storage overhead and some computation, but these schemes require significant effort to reconstruct failed disks and can leave a storage system vulnerable if additional failures occur while a rebuild is taking place. DDP distributes data, parity information, and spare capacity across a pool of drives. Its algorithm determines which drives are used for segment placement, which ensures full data protection but results in slower retrieval times. Erasure coding refers to the use of a forward error correction (FEC) code to add redundant information to stored data in a way that spreads encoded fragments of data across multiple storage units. Most erasure codes require either high repair bandwidth to recover from component failures or additional storage overhead to allow localized repairs, but erasure coding usually requires less storage overhead than replication. Compared with RAID and DDP, erasure coding is also more resilient to failures, as it can tolerate node, rack, or data center failures in addition to device failures.
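The storage overhead trade-off discussed above can be made concrete with a small calculation. The following is a minimal sketch only: the function names and the specific parameters (three copies, a 4+1 RAID 5 group, an 8+2 RAID 6 group, a 6+3 erasure code) are illustrative assumptions that do not come from any particular system, and the resulting ratios depend entirely on the parameters chosen.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data when `copies` full copies are kept."""
    if copies < 1:
        raise ValueError("at least one copy is required")
    return Fraction(copies)


def parity_overhead(data_units: int, redundancy_units: int) -> Fraction:
    """Raw bytes stored per byte of user data for a stripe made of
    `data_units` data units plus `redundancy_units` parity or coded units."""
    if data_units < 1 or redundancy_units < 0:
        raise ValueError("invalid stripe geometry")
    return Fraction(data_units + redundancy_units, data_units)


# Hypothetical configurations, chosen only for illustration.
schemes = {
    "3-way replication": replication_overhead(3),
    "RAID 5 (4 data + 1 parity)": parity_overhead(4, 1),
    "RAID 6 (8 data + 2 parity)": parity_overhead(8, 2),
    "erasure code (6 data + 3 coded)": parity_overhead(6, 3),
}

for name, ratio in schemes.items():
    print(f"{name}: {float(ratio):.2f}x raw storage per byte of user data")
```

Under these assumed parameters, replication stores 3.00x the user data, whereas the parity-based and erasure-coded layouts store between 1.25x and 1.50x.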
Many current data storage systems use a single data protection scheme that attempts to match data protection needs at different levels in a hierarchy of components that comprise a data storage system. The result of using a single data protection scheme is excess storage overhead, unacceptable levels of repair load on the data storage system, and/or an inability to support multiple failure types. Accordingly, some current data storage systems facilitate hierarchical data protection by implementing replication at the storage node level in combination with a RAID or DDP data protection scheme at the disk level. However, these data storage systems require full object copies, and therefore incur significant storage overhead, in order to protect against storage node and site failures.
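To illustrate why layering node-level replication over disk-level RAID is costly, the overheads of the two levels multiply. The sketch below assumes a hypothetical configuration of two full object copies on separate storage nodes, each protected by an 8+2 RAID 6 group; the function name and the parameters are illustrative assumptions only.

```python
from fractions import Fraction


def layered_overhead(node_copies: int, data_disks: int, parity_disks: int) -> Fraction:
    """Raw bytes stored per byte of user data when `node_copies` full copies
    are each stored on a RAID group of `data_disks` + `parity_disks` disks."""
    disk_level = Fraction(data_disks + parity_disks, data_disks)
    # Each full copy pays the disk-level overhead, so the factors multiply.
    return node_copies * disk_level


# Hypothetical example: 2 node-level copies, each on RAID 6 (8 data + 2 parity).
total = layered_overhead(node_copies=2, data_disks=8, parity_disks=2)
print(f"{float(total):.2f}x raw storage per byte of user data")  # prints 2.50x ...
```

In this assumed configuration, every byte of user data consumes 2.5 bytes of raw capacity, most of which is attributable to the full object copies rather than to the disk-level parity.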