A Redundant Array of Independent Disks (RAID) combines a plurality of physical disk drives into a logical drive for purposes of reliability, capacity, or performance. An operating system sees the single logical drive rather than the multiple physical disk drives. As is well known to those skilled in the art, there are many standard methods, referred to as RAID levels, for distributing data across the physical disk drives in a RAID system.
For example, a level 5 RAID system provides a high level of redundancy by striping both data and parity information across at least three hard disk drives. Data striping is combined with distributed parity to provide a recovery path in case of failure.
In RAID technology, data is stored on each drive in units called strips. A strip is a range of logical block addresses (LBAs) written to a single disk drive in a parity RAID system. A RAID controller may divide incoming host writes into strips that are written across the member drives. A stripe is the set of corresponding strips on each member drive in the RAID volume. In an N-drive RAID 5 system (where N is three or greater), for example, each stripe contains N−1 data strips and one parity strip. A parity strip may be the exclusive OR (XOR) of the data in the other strips in the stripe, and the drive that stores the parity for the stripe may be rotated per stripe across the member drives. Parity may be used to restore data on a drive of the RAID system should the drive fail, become corrupted, or lose power. Different algorithms may be used that, during a write operation to a stripe, calculate partial parity, which is an intermediate value used in determining parity.
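The stripe layout and XOR recovery described above can be illustrated with a minimal sketch. It assumes byte strings stand in for strips and uses a hypothetical rotation rule (`parity_drive`) chosen only for illustration; actual controllers may place parity differently. The function names are not drawn from any particular RAID implementation.

```python
from functools import reduce


def xor_strips(strips):
    """Byte-wise XOR of equal-length strips."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def parity_drive(stripe_index, n_drives):
    """Hypothetical rotation rule: parity moves by one drive per stripe."""
    return (n_drives - 1 - stripe_index) % n_drives


def build_stripe(data_strips, stripe_index, n_drives):
    """Place N-1 data strips and one parity strip across N drives."""
    assert n_drives >= 3 and len(data_strips) == n_drives - 1
    stripe = list(data_strips)
    # Insert the parity strip at the position of the rotating parity drive.
    stripe.insert(parity_drive(stripe_index, n_drives), xor_strips(data_strips))
    return stripe


def rebuild_strip(stripe, failed_drive):
    """Recover a lost strip as the XOR of all surviving strips."""
    survivors = [s for i, s in enumerate(stripe) if i != failed_drive]
    return xor_strips(survivors)


# Three-drive example: two data strips plus one parity strip.
data = [b"AAAA", b"BBBB"]
stripe = build_stripe(data, stripe_index=0, n_drives=3)
recovered = rebuild_strip(stripe, failed_drive=0)
```

Because XOR is its own inverse, the same `xor_strips` routine serves both to compute parity and to rebuild any single missing strip, whether that strip held data or parity.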
A RAID write hole (RWH) event is a fault scenario, specific to parity-based RAID systems, that occurs when a power failure (or system crash) and a drive failure (e.g., a strip read failure or a drive crash) occur at the same time or close in time to each other. Such system crashes and drive failures are often correlated events. They can lead to silent data corruption or irrecoverable data because write operations are not atomic across the member drives of a parity-based RAID system. Due to this lack of atomicity, the parity of a stripe that is active during a power failure may be incorrect, in that the parity becomes inconsistent with the rest of the strip data across the active stripe. The data on such inconsistent stripes does not have the desired protection and, worse, can lead to incorrect corrections, which may create silent data errors within a parity-based RAID system.
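The failure sequence can be made concrete with a small simulation. This is only a sketch under simplifying assumptions: a three-drive stripe, two-byte strips, and a dictionary (`on_disk`) standing in for the persistent state of the member drives. All names are hypothetical.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# Consistent three-drive stripe: two data strips and their parity.
d0_old, d1 = b"\x11\x11", b"\x22\x22"
parity_old = xor_bytes(d0_old, d1)

# The host overwrites d0. The new data reaches its drive, but power is
# lost before the matching parity update is written (non-atomic write).
d0_new = b"\x55\x55"
on_disk = {"d0": d0_new, "d1": d1, "parity": parity_old}

# After restart, the drive holding d1 fails. The controller rebuilds d1
# from the surviving data strip and the stale parity strip.
d1_rebuilt = xor_bytes(on_disk["d0"], on_disk["parity"])

# The rebuilt strip differs from what was stored, and nothing signals it.
write_hole = d1_rebuilt != d1
```

Note that the corrupted strip is `d1`, which the host never wrote during the interrupted operation. The rebuild completes without reporting an error, which is why the resulting corruption is silent.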