Disk arrays provide storage for computer applications that need reliability in the face of component failures and high performance in normal use. The disks in a disk array are often arranged as a redundant array of independent disks (RAID) to increase reliability. Compared with using disks individually, the array provides larger capacity, higher performance and, typically, higher availability for stored data. This is done by distributing the data across multiple disks and storing either a redundant copy of the data or enough parity information to regenerate the data if a disk or related component fails.
The existence of multiple replicas of the same data affects performance and reliability. For example, the most convenient, idlest, or closest copy of the data may be accessed for a read operation, but all copies will eventually have to be updated after a write. The two most widely used schemes for mapping client data onto a disk array are RAID 1/0 and RAID 5. Both rely on disk striping, where data is simultaneously read from or written to multiple disks. Disk striping utilizes stripe units (i.e., fixed-size blocks), each of which is stored on a single disk. A stripe unit may hold a data unit or a parity unit, depending on the RAID layout being used. A collection of related stripe units is called a stripe.
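The address arithmetic behind striping can be sketched as follows. This is a minimal illustration that assumes plain striping with no redundancy; the function name `locate_block` and its parameters are hypothetical and are not taken from any particular array controller.

```python
def locate_block(lba, unit_blocks, n_disks):
    """Map a logical block address to (disk, block offset on that disk).

    Assumes plain striping: consecutive stripe units are placed on
    consecutive disks, wrapping around after the last disk.
    """
    # Which stripe unit the block falls in, and where inside that unit.
    unit_index, within = divmod(lba, unit_blocks)
    # Which stripe the unit belongs to, and which disk holds it.
    stripe, disk = divmod(unit_index, n_disks)
    return disk, stripe * unit_blocks + within


if __name__ == "__main__":
    # Four disks, stripe units of four blocks each.
    for lba in (0, 3, 4, 15, 16):
        print(lba, locate_block(lba, unit_blocks=4, n_disks=4))
```

With these assumed parameters, blocks 0 to 3 fall on disk 0, blocks 4 to 7 on disk 1, and block 16 wraps back to disk 0 at the start of the next stripe.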
RAID 1/0 consists of striped mirroring, in which two copies of every data unit are kept on two or more disks. RAID 5 keeps one parity unit per fixed number of data units (a set of data units and their corresponding parity unit is a stripe), with the parity units rotated among all disks. In RAID 1/0, a stripe is the set of stripe units that start at the same offset in all disks in a logical unit (LU), such as a mirrored pair. In RAID 5, a stripe is again the set of units that start at the same offset in each disk in an LU, but there are n−1 data units and a single parity unit for n disks. The composition of an LU in a RAID layout may vary depending on the RAID layout being used. Generally, an LU includes all the disks acting as a single virtual storage device.
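A RAID 5 stripe with n−1 data units and one rotated parity unit can be sketched as below. The rotation rule used here (parity starts on the last disk and moves one disk to the left with each stripe) is an assumption chosen for illustration; real arrays use several different rotation orders, and the name `raid5_stripe` is hypothetical.

```python
def raid5_stripe(stripe, n_disks):
    """Return the labels of the units held by each disk for one stripe.

    Assumed rotation: parity sits on the last disk for stripe 0 and
    moves one disk to the left for each following stripe.
    """
    parity_disk = (n_disks - 1 - stripe) % n_disks
    next_data = stripe * (n_disks - 1)  # n-1 data units per stripe
    layout = []
    for disk in range(n_disks):
        if disk == parity_disk:
            layout.append("P%d" % stripe)
        else:
            layout.append("D%d" % next_data)
            next_data += 1
    return layout


if __name__ == "__main__":
    for s in range(4):
        print(s, raid5_stripe(s, n_disks=4))
    # 0 ['D0', 'D1', 'D2', 'P0']
    # 1 ['D3', 'D4', 'P1', 'D5']
    # 2 ['D6', 'P2', 'D7', 'D8']
    # 3 ['P3', 'D9', 'D10', 'D11']
```

Each row is one stripe, each column one disk. Every stripe holds exactly one parity unit, and over n consecutive stripes each disk holds parity once.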
RAID 1 and RAID 4 are also widely used. RAID 1 (like RAID 1/0) uses mirroring for storing redundant data, but does not use striping. RAID 4 (like RAID 5) uses parity information for storing redundant data, the difference being that a single disk contains all parity stripe units. In RAID 4, a stripe is again the set of units that start at the same offset in each disk in an LU.
In a standard RAID 1/0 implementation, two identical copies of the data are stored on each pair of disks (i.e., a mirrored pair). Every time a read operation is performed, the array controller issues a read access to whichever of the two devices in the mirrored pair is likely to service the request sooner.
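One way a controller might estimate which mirror will respond sooner is sketched below. The heuristic (shortest queue first, then shortest head travel) and the names `DiskState` and `pick_mirror` are assumptions made for illustration; the text above does not specify how the estimate is made.

```python
from dataclasses import dataclass


@dataclass
class DiskState:
    name: str
    queued: int    # requests already waiting on this disk (assumed known)
    head_pos: int  # block address of the last access served (assumed known)


def pick_mirror(disks, target_block):
    """Choose the disk of a mirrored pair expected to answer a read sooner.

    Hypothetical heuristic: prefer the shorter queue; break ties by the
    smaller distance between the head position and the target block.
    """
    return min(disks, key=lambda d: (d.queued, abs(d.head_pos - target_block)))


if __name__ == "__main__":
    pair = [DiskState("disk_a", queued=2, head_pos=100),
            DiskState("disk_b", queued=0, head_pos=9000)]
    print(pick_mirror(pair, target_block=120).name)  # disk_b: idle, despite the longer seek
```

Ranking queue length ahead of seek distance is a design choice of this sketch; a controller with a more detailed disk model could weigh the two differently.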
A disadvantage of RAID 1 and RAID 1/0 is that, in order to complete a write while tolerating the failure of any single disk, the data must be written both to a disk and to the mirror disk that stores the redundant copy. It is therefore necessary to wait for both copies of the data to be updated. Even though writes to the two corresponding disks are typically initiated in parallel by an array controller, the writes rarely complete simultaneously. Each disk is also processing accesses corresponding to other client requests, and, because disks are mechanical devices, the response time of a disk for a particular access depends on which access was serviced last (i.e., the positions of the mechanical components affect response times). Because of this, the average time spent waiting for two parallel disk accesses to complete is typically greater than the average time for a single disk access.
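The effect of waiting for the slower of two parallel accesses can be illustrated with a small simulation. The sketch assumes that the two response times are independent and exponentially distributed with the same mean, which is a simplification of real disk behavior; `simulate_waits` and the 8 ms mean are hypothetical choices.

```python
import random


def simulate_waits(n_trials=100_000, mean_ms=8.0, seed=0):
    """Compare the mean wait for one access with the mean wait for a mirrored write.

    Assumption: each disk's response time is an independent exponential
    random variable with mean `mean_ms`. A mirrored write completes only
    when the slower of the two accesses completes.
    """
    rng = random.Random(seed)
    single_total = 0.0
    mirrored_total = 0.0
    for _ in range(n_trials):
        t1 = rng.expovariate(1.0 / mean_ms)
        t2 = rng.expovariate(1.0 / mean_ms)
        single_total += t1
        mirrored_total += max(t1, t2)  # wait for both copies
    return single_total / n_trials, mirrored_total / n_trials


if __name__ == "__main__":
    single, mirrored = simulate_waits()
    print("single access: %.2f ms, mirrored write: %.2f ms" % (single, mirrored))
```

Under this assumed distribution the expected maximum of two response times is 1.5 times the mean of one, so the mirrored write waits about 50% longer on average. Other distributions give different ratios, but the maximum of two accesses is never smaller on average than a single access.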
RAID 4 and RAID 5 suffer from more acute versions of the same problems. Both the stripe units being written and the corresponding parity units must be updated before the write can be considered complete. For example, if fewer than half of the units in a given stripe are being written (i.e., a “small write”), then the following is performed: the parity unit is read; its contents are exclusive-ORed with the new values being written and with the old values they replace (which must therefore also be read); and the corresponding new values of the data and parity units are written back to disk. Therefore, the redundant data (i.e., the parity unit) is not only written, but also read to complete the operation.
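The small-write sequence described above can be sketched in a few lines. The in-memory list standing in for the disks and the names `xor_bytes` and `small_write` are assumptions made for illustration.

```python
def xor_bytes(a, b):
    """Bytewise exclusive-OR of two equal-length byte strings."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(stripe, parity_index, unit_index, new_data):
    """Update one data unit of a stripe using read-modify-write.

    `stripe` is a list of byte strings, one per disk, standing in for the
    stripe units on disk. Returns the number of (reads, writes) performed.
    """
    old_data = stripe[unit_index]      # read 1: the old data value
    old_parity = stripe[parity_index]  # read 2: the old parity unit
    # new parity = old parity XOR old data XOR new data
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    stripe[unit_index] = new_data      # write 1: the new data value
    stripe[parity_index] = new_parity  # write 2: the new parity unit
    return 2, 2


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
    parity = xor_bytes(xor_bytes(data[0], data[1]), data[2])
    stripe = data + [parity]  # parity kept on the last disk
    print(small_write(stripe, parity_index=3, unit_index=1, new_data=b"\xaa\x55"))
    # The stored parity still equals the XOR of all three data units.
    print(stripe[3] == xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2]))
```

Writing a single data unit thus costs two reads and two writes, compared with the two writes needed for the same update on a mirrored pair.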