Storage systems must satisfy demanding requirements for high reliability and availability. Meeting these requirements has long been a challenge for conventional hard disk drive (HDD)-based redundant arrays of independent disks (RAIDs), and it is even more challenging for solid-state disk (SSD)-based RAIDs because flash cells endure only a limited number of erase cycles (e.g., TLC flash cells typically endure only about 1,000 erase cycles). In addition to complete drive failures, SSDs also suffer from partial failures, including read disturb errors, write errors, and retention errors. It is therefore desirable to enhance the reliability and availability of HDD-based and SSD-based RAIDs beyond what conventional parity-encoding schemes provide.
Additionally, HDDs and SSDs under read and write workloads can also suffer from high latencies caused by firmware bugs, transient errors, and internal processes such as garbage collection, wear leveling, and internal metadata persistence. Because RAID-based storage systems, especially those built on SSDs, are expected to provide low latency, it is important to overcome this limitation and consistently provide low latency for user I/O.