The acronym RAID (originally redundant array of inexpensive disks, now also known as redundant array of independent disks) refers to a data storage scheme using multiple hard disks to share or replicate data among the disks. Depending on the level chosen, the benefit of RAID is that it increases one or more of data integrity, fault-tolerance, throughput or capacity, when compared to single disks.
There are various RAID configurations, which protect data against disk failure in two main ways. The first of these is mirroring, in which a whole disk is set aside to store a copy of the data on another disk. The second is the use of parity information.
RAID 5 is a method of storing data on disk arrays and involves striping data across the disks in the array. A RAID 5 system maintains parity information for the data and stores it in the stripe, which provides data redundancy and allows the array to withstand the failure of one disk. The parity information is calculated from the data stored in the stripe. Every time the data is updated, the parity information must also be updated to keep it synchronised with the data.
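The way parity allows a stripe to survive the loss of one disk can be illustrated with a minimal sketch. It assumes that parity is the bytewise XOR of the data chunks, the same operation used in the RMW steps of this description. The helper names (xor_bytes, compute_parity, rebuild_missing) and the two-byte chunks are hypothetical and chosen only for illustration.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Return the bytewise XOR of two equal-length chunks."""
    if len(a) != len(b):
        raise ValueError("chunks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def compute_parity(data_chunks: list[bytes]) -> bytes:
    """Parity chunk, assumed here to be the XOR of all data chunks."""
    return reduce(xor_bytes, data_chunks)


def rebuild_missing(surviving_chunks: list[bytes]) -> bytes:
    """Rebuild the single lost chunk by XOR-ing every surviving chunk,
    data and parity alike."""
    return reduce(xor_bytes, surviving_chunks)


# One stripe with four tiny (2-byte) data chunks; real chunks are far larger.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00", b"\x0f\xf0"]
parity = compute_parity(data)

# Simulate the failure of the disk holding data[2].
survivors = data[:2] + data[3:] + [parity]
recovered = rebuild_missing(survivors)
```

Because XOR is its own inverse, the same routine recovers any one missing chunk, whether it held data or parity.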
The number of disks over which a stripe extends is referred to as the stripe length or stripe width. The amount of data (including parity) that can be stored in a stripe is referred to as the stripe size. The portion of a disk that belongs to one stripe is referred to as a chunk. Each chunk is further divided into a number of blocks, each identified by a logical block address (LBA).
The number of disks in a stripe varies between disk arrays. The stripe width may also be less than the number of disks in the array. For example, the array may have 10 disks, with a stripe width of 5 disks.
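The relationship between stripe width, chunk and stripe size can be sketched as a small calculation. The chunk size of 64 KiB and the block size of 512 bytes are assumptions made purely for illustration, and the StripeGeometry class is a hypothetical helper, not part of any RAID implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StripeGeometry:
    stripe_width: int       # number of disks (chunks) in one stripe
    chunk_bytes: int        # size of one chunk (assumed value)
    block_bytes: int = 512  # size of one logical block (assumed value)

    @property
    def stripe_size(self) -> int:
        """Bytes held by one stripe, including the parity chunk."""
        return self.stripe_width * self.chunk_bytes

    @property
    def data_bytes(self) -> int:
        """Bytes of user data per stripe (one chunk is used for parity)."""
        return (self.stripe_width - 1) * self.chunk_bytes

    @property
    def blocks_per_chunk(self) -> int:
        """Number of logical blocks that make up one chunk."""
        return self.chunk_bytes // self.block_bytes


# Stripe width of 5, as in the example of a 10-disk array above.
geom = StripeGeometry(stripe_width=5, chunk_bytes=64 * 1024)
print(geom.stripe_size, geom.data_bytes, geom.blocks_per_chunk)
```

Under these assumed sizes, a stripe holds 320 KiB in total, of which 256 KiB is user data, and each chunk spans 128 logical blocks.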
The operation of reading or writing to a disk is referred to as an input/output (I/O) operation.
The Read-Modify-Write (RMW) method is one of the methods used in writing data to a RAID 5 system. When data is to be written to a chunk in a RAID 5 system, the RMW method updates the data to the appropriate chunk and also updates the parity chunk to reflect the change.
For example, for a RAID 5 array with 5 disks, a single stripe comprises four data chunks (D1, D2, D3 and D4) and one parity chunk (P). Writing new data (D1′) onto this stripe involves the following steps:
    Read old data D1 and store in a temporary buffer;
    Read old parity information P and store in a temporary buffer;
    Calculate intermediate parity Pi=P⊕D1 and store it in a temporary buffer;
    Calculate new parity information P′=Pi⊕D1′ and store it in a temporary buffer;
    Write new parity information P′;
    Write new data D1′,
where the symbol ⊕ represents an exclusive OR logical operation, also denoted herein by XOR.
Therefore, the RMW process for a single write requires 4 I/O operations (2 reads and 2 writes) and 2 parity calculations.
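The RMW steps and the resulting I/O count can be sketched as follows. This is an illustrative model only: CountingDisk is a hypothetical in-memory stand-in for a disk that holds a single chunk, and the two-byte chunk contents are arbitrary.

```python
class CountingDisk:
    """In-memory stand-in for one disk holding one chunk; counts I/O."""

    def __init__(self, chunk: bytes):
        self._chunk = chunk
        self.reads = 0
        self.writes = 0

    def read(self) -> bytes:
        self.reads += 1
        return self._chunk

    def write(self, chunk: bytes) -> None:
        self.writes += 1
        self._chunk = chunk


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Return the bytewise XOR of two equal-length chunks."""
    if len(a) != len(b):
        raise ValueError("chunks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def rmw_write(data_disk: CountingDisk, parity_disk: CountingDisk,
              new_data: bytes) -> None:
    """Write new_data to one data chunk and update parity by RMW."""
    old_data = data_disk.read()                      # read old data D1
    old_parity = parity_disk.read()                  # read old parity P
    intermediate = xor_bytes(old_parity, old_data)   # Pi = P xor D1
    new_parity = xor_bytes(intermediate, new_data)   # P' = Pi xor D1'
    parity_disk.write(new_parity)                    # write new parity P'
    data_disk.write(new_data)                        # write new data D1'


# Stripe of four data chunks (D1..D4) and one parity chunk (P).
chunks = [b"\x01\x02", b"\x10\x20", b"\xff\x00", b"\x0f\xf0"]
data_disks = [CountingDisk(c) for c in chunks]
parity_disk = CountingDisk(b"\xe1\xd2")  # XOR of the four chunks above

rmw_write(data_disks[0], parity_disk, b"\xaa\x55")

total_io = sum(d.reads + d.writes for d in data_disks + [parity_disk])
print(total_io)  # 2 reads + 2 writes
```

Only the target data disk and the parity disk are touched; the other three data disks see no I/O, which is why the parity is updated from the old data and old parity instead of being recomputed from the whole stripe.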
In general, for a RAID 5 array having N disks and a stripe width of N, each stripe has N−1 data chunks and 1 parity chunk, so a single stripe can accommodate a maximum of N−1 data write operations. To accomplish these writes, the RMW algorithm requires 4(N−1) I/O operations and 2(N−1) parity calculations.
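The cost of filling a stripe by RMW can be tabulated with a short sketch. The function name rmw_cost is hypothetical, and the sketch assumes that each of the N−1 data chunks is written by its own RMW pass of 4 I/O operations and 2 parity calculations.

```python
def rmw_cost(n_disks: int) -> tuple[int, int]:
    """Return (I/O operations, parity calculations) needed to write every
    data chunk of one stripe by RMW, for a stripe width of n_disks."""
    if n_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    data_chunks = n_disks - 1  # one chunk per stripe holds parity
    return 4 * data_chunks, 2 * data_chunks


io_ops, parity_calcs = rmw_cost(5)
print(io_ops, parity_calcs)
```

For the five-disk example, this gives 16 I/O operations and 8 parity calculations to update the four data chunks of one stripe.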
The present RMW technique is therefore I/O intensive and is one of the main performance bottlenecks in a RAID 5 system.