1. Field of the Invention
The disclosed invention relates to architectures for arrays of disk drives, and more particularly, to disk array architectures that provide two-drive fault tolerance.
2. Description of the Related Art
A Redundant Array of Independent Disks (RAID) is a storage technology wherein a collection of multiple disk drives is organized into a disk array managed by a common array controller. The array controller presents the array to the user as one or more virtual disks. Disk arrays are the framework to which RAID functionality is added in functional levels to produce cost-effective, highly available, high-performance disk systems.
RAID level 0 is a performance-oriented striped data mapping technique. Uniformly sized blocks of storage are assigned in a regular sequence to all of the disks in the array. RAID 0 provides high I/O performance at low cost. Reliability of a RAID 0 system is less than that of a single disk drive because failure of any one of the drives in the array can result in a loss of data.
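The striped mapping described above can be sketched as follows. This is only an illustration, assuming round-robin assignment of one stripe unit per disk per row and zero-based numbering; the function name `raid0_map` is hypothetical and does not correspond to any particular controller interface.

```python
def raid0_map(logical_unit: int, num_disks: int) -> tuple[int, int]:
    """Map a logical stripe-unit index to (disk index, offset on that disk).

    Assumes simple round-robin striping: consecutive stripe units go to
    consecutive disks, wrapping back to disk 0 after the last disk.
    """
    if num_disks < 1:
        raise ValueError("an array needs at least one disk")
    if logical_unit < 0:
        raise ValueError("logical_unit must be non-negative")
    disk = logical_unit % num_disks      # which member disk holds the unit
    offset = logical_unit // num_disks   # which row (stripe) on that disk
    return disk, offset
```

With four disks, for example, logical stripe units 0 through 3 land on disks 0 through 3 in row 0, and unit 4 wraps around to disk 0 in row 1. Because no unit is stored twice, losing any one disk loses part of every stripe.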
RAID level 1, also called mirroring, provides simplicity and a high level of data availability. A mirrored array includes two or more disks wherein each disk contains an identical image of the data. A RAID level 1 array may use parallel access for high data transfer rates when reading. RAID 1 provides good data reliability and improves performance for read-intensive applications, but at a relatively high cost.
RAID level 2 is a parallel mapping and protection technique that employs error correction codes (ECC) as a correction scheme. It is considered unnecessary because off-the-shelf drives come with ECC data protection, and the same performance can be achieved by RAID 3 at a lower cost. As a result, RAID 2 is rarely used.
RAID level 3 adds redundant information in the form of parity data to a parallel accessed striped array, permitting regeneration and rebuilding of lost data in the event of a single-disk failure. One stripe unit of parity protects corresponding stripe units of data on the remaining disks. RAID 3 provides high data transfer rates and high data availability. Moreover, the cost of RAID 3 is lower than the cost of mirroring since there is less redundancy in the stored data.
RAID level 4 uses parity concentrated on a single disk to allow error correction in the event of a single drive failure (as in RAID 3). Unlike RAID 3, however, member disks in a RAID 4 array are independently accessible. Thus RAID 4 is more suited to transaction processing environments involving short file transfers. RAID 4 and RAID 3 both have a write bottleneck associated with the parity disk, because every write operation modifies the parity disk.
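The reason every write touches the parity disk can be illustrated with a small-write parity update. The sketch below assumes XOR parity and the commonly used read-modify-write update, in which the new parity is the old parity XORed with the old and new data; the text above does not specify the update method, and the function names are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Parity after overwriting one data block (read-modify-write assumption).

    XORing out the old data and XORing in the new data gives the same
    result as recomputing parity over the whole row.
    """
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)
```

Under this assumption, a write to any single data disk also requires a read and a write of the parity block, so with parity concentrated on one disk that disk participates in every write.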
In RAID 5, parity data is distributed across some or all of the member disks in the array. Thus, the RAID 5 architecture achieves performance by striping data blocks among N disks, and achieves fault tolerance by using 1/N of its storage for parity blocks, each parity block being calculated as the exclusive-or (XOR) of all data blocks in that parity block's row. The write bottleneck is reduced because parity write operations are distributed across multiple disks.
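The XOR parity calculation, and the regeneration of a single lost block from it, can be sketched as follows. This is a minimal illustration that treats each block as a short byte string; the helper name `xor_blocks` is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of one or more equal-length blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must have the same length")
    # zip(*blocks) groups the bytes at each position across all blocks.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# One row of three data blocks and its parity block.
row = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(row)

# If the middle block is lost, XOR of the survivors and the parity restores it.
rebuilt = xor_blocks([row[0], row[2], parity])
```

Because XOR is its own inverse, the same routine serves for both computing parity and rebuilding a missing block. It cannot recover two missing blocks in the same row, which is the limitation that motivates a second parity set.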
The RAID 6 architecture is similar to RAID 5, but RAID 6 can overcome the failure of any two disks by using an additional parity block for each row (for a storage loss of 2/N). The first parity block (P) is calculated with XOR of the data blocks. The second parity block (Q) employs Reed-Solomon codes.
RAID 6 provides for recovery from a two-drive failure, but at a penalty in cost and complexity of the array controller because the Reed-Solomon codes are complex and may require significant computational resources. The complexity of Reed-Solomon codes may preclude the use of such codes in software and may necessitate the use of expensive special purpose hardware. Thus, implementation of Reed-Solomon codes in a disk array increases the cost and complexity of the array. Unlike the simpler XOR codes, Reed-Solomon codes cannot easily be distributed among dedicated XOR processors.
The present invention solves these and other problems by providing two-drive fault tolerance using simple XOR codes (rather than Reed-Solomon codes). The XOR parity stripe units are distributed across the member disks in the array by separating parity stripe units from data stripe units. In one embodiment, the number of data stripe units is the same as the square of two less than the number of drives (i.e., (N−2) × (N−2)). Each data stripe unit is a member of two separate parity sets, with no two data stripe units sharing the same two parity sets. Advantageously, the storage loss to parity stripe units is equal to the sum of the dimensions, so this parity arrangement uses less storage than mirroring when the number of total drives is greater than four.
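The storage arithmetic behind this comparison can be checked with a short calculation. The sketch below assumes that "the sum of the dimensions" means one parity stripe unit per row plus one per column of an (N−2) by (N−2) data arrangement, giving 2(N−2) parity stripe units; that reading, and the function name, are assumptions made for illustration.

```python
from fractions import Fraction


def two_dim_parity_layout(num_drives: int) -> tuple[int, int, Fraction]:
    """Return (data units, parity units, parity fraction) for N drives.

    Assumes an (N-2) x (N-2) arrangement of data stripe units with one
    row-parity unit per row and one column-parity unit per column.
    """
    if num_drives < 3:
        raise ValueError("need at least three drives for a non-empty layout")
    side = num_drives - 2
    data_units = side * side       # (N-2) * (N-2)
    parity_units = 2 * side        # sum of the two dimensions
    total = data_units + parity_units
    return data_units, parity_units, Fraction(parity_units, total)
```

Under these assumptions the parity fraction simplifies to 2/N. At N = 4 it equals the 1/2 lost to two-way mirroring, and for any N greater than four it is smaller, which agrees with the statement above.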
One embodiment includes a redundant array of independent disk drives that provides one-drive and two-drive fault tolerance. The array includes two or more disk drives and a disk controller. Data recovery from a one-drive or two-drive failure is accomplished by using a two-dimensional XOR parity arrangement. The controller is configured to calculate row XOR parity sets and column XOR parity sets, and to distribute the parity sets across the disk drives in the array. The parity sets are arranged in the array such that no data block on any of the disk drives exists in two row parity sets or two column parity sets. In one embodiment, the controller is configured to reduce reconstruction interdependencies between disk blocks.
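The two-dimensional XOR idea can be sketched on a logical grid of data blocks. This is an illustration only: it assumes the parity blocks themselves survive, it marks lost data blocks with `None`, and it does not model how stripe units are placed on physical drives. All function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of one or more equal-length byte strings."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))


def build_parity(grid):
    """Return (row parities, column parities) for a rectangular grid."""
    row_parity = [xor_blocks(row) for row in grid]
    col_parity = [xor_blocks(col) for col in zip(*grid)]
    return row_parity, col_parity


def recover(grid, row_parity, col_parity):
    """Fill in blocks marked None, in place; return True if all were rebuilt.

    A block is rebuilt from any row or column parity set in which it is
    the only missing member; passes repeat until no more progress is made.
    """
    rows, cols = len(grid), len(grid[0])
    progress = True
    while progress:
        progress = False
        for r in range(rows):
            missing = [c for c in range(cols) if grid[r][c] is None]
            if len(missing) == 1:
                c = missing[0]
                others = [grid[r][k] for k in range(cols) if k != c]
                grid[r][c] = xor_blocks(others + [row_parity[r]])
                progress = True
        for c in range(cols):
            missing = [r for r in range(rows) if grid[r][c] is None]
            if len(missing) == 1:
                r = missing[0]
                others = [grid[k][c] for k in range(rows) if k != r]
                grid[r][c] = xor_blocks(others + [col_parity[c]])
                progress = True
    return all(block is not None for row in grid for block in row)
```

If two blocks in the same row are lost, the row parity alone cannot rebuild either one, but each block is the only missing member of its column set, so the column parities restore both. Each data block belongs to exactly one row set and one column set, which is what makes this second path available.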