1. Field of the Invention
The invention relates generally to reliability in storage systems and more specifically relates to methods and systems for tolerating a two device failure in a Redundant Array of Independent Disks (RAID) level 5 volume.
2. Discussion of Related Art
RAID storage systems enhance both reliability and performance of computer storage systems. In general, RAID storage systems provide for performance enhancements by “striping” blocks of data for a logical volume over multiple storage devices (e.g., multiple disk drives). Striping is often referred to as RAID level 0. By striping data, a read or write request may be processed by operating multiple disk drives in parallel to thereby improve performance as compared to a single disk drive servicing the entire request.
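As an illustration only, the striping described above can be sketched as a round-robin mapping of logical blocks onto drives. The function name `locate_block` and the simple modulo layout are assumptions made for this sketch; actual RAID implementations may use larger strip sizes and different layouts.

```python
def locate_block(logical_block: int, n_drives: int) -> tuple[int, int]:
    """Map a logical block number to (drive index, stripe index).

    Hypothetical round-robin layout: consecutive logical blocks land on
    consecutive drives, so a multi-block request can be serviced by
    several drives in parallel.
    """
    if n_drives < 1:
        raise ValueError("at least one drive is required")
    return logical_block % n_drives, logical_block // n_drives


# Eight consecutive logical blocks spread over four drives.
layout = [locate_block(block, 4) for block in range(8)]
```

With four drives, logical blocks 0 through 3 fall in stripe 0 on drives 0 through 3, and blocks 4 through 7 fall in stripe 1, so a request spanning blocks 0 through 3 touches each drive once.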
RAID storage systems enhance reliability by providing various forms of redundancy (redundancy information) such that a single disk failure will not disrupt access to the logical volume or risk loss of data. Rather, the RAID logical volume may continue operating and processing requests, though possibly in a degraded performance mode, without loss of data. For example, RAID level 1 (RAID1) provides enhanced reliability by mirroring data of a disk drive on a second disk drive. Thus, if the first drive fails, the second drive assures there will be no loss of data. RAID level 5 (RAID5) enhances performance and reliability by striping user data over multiple storage devices and adding redundancy information to each such stripe (i.e., an XOR parity block added to the data blocks of each stripe). The XOR sum of the data blocks of a stripe provides the desired redundancy in that, if any single disk drive storing a block of a stripe fails, the data in the block on the failed drive may be reconstructed from the XOR sum of the blocks on the remaining drives of the stripe. RAID level 6 (RAID6) builds upon the RAID level 5 structure by adding a second redundancy block to each stripe. This second redundancy block typically comprises a Galois field multiplication (GFM) value computed from the data of the other blocks of the stripe (or other forms of redundancy computations). The XOR parity block and the second redundancy block in a RAID6 configuration assure that no data is lost on the logical volume even if a second drive should fail.
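By way of a minimal sketch, the XOR parity computation and single-block reconstruction described for RAID5 may be expressed as follows. The helper name `xor_blocks` and the sample block contents are hypothetical and chosen only for illustration; the sketch does not model parity rotation across drives or the second (GFM) redundancy block of RAID6.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized blocks."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("blocks must be non-empty and of equal size")
    # zip(*blocks) groups the bytes found at the same offset in every block.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe of three data blocks plus one parity block.
data_blocks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data_blocks)

# Simulate failure of the drive holding data block 1 and rebuild that
# block from the surviving data blocks and the parity block.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, XORing the parity block with the surviving data blocks cancels their contributions and leaves the missing block. If two blocks of the stripe are lost, the single parity block no longer suffices, which is the limitation the second redundancy block of RAID6 addresses.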
RAID level 1 increases cost per unit of storage to achieve its redundancy in that every unit of data is duplicated; thus the physical capacity requirement of a RAID level 1 volume is twice its logical capacity. By contrast, RAID level 5 requires only one disk drive in addition to the two or more disk drives over which the data is striped. A minimal RAID level 5 configuration requires three disk drives, with capacity equivalent to one of the three or more drives being used for the redundancy information. Any number of additional disk drives may be added to a RAID level 5 volume to increase the storage capacity without increasing the overhead allocation for the parity redundancy information. However, a standard RAID level 5 volume can still tolerate no more than a single drive failure without losing data. RAID level 6 similarly adds cost as compared to RAID level 5 storage management due to the additional drive capacity required for the additional redundancy information. As with the parity information of RAID5, the additional redundancy information is typically spread throughout the various drives of the volume, and thus the added drive is a permanent part of the RAID level 6 volume configuration. However, RAID level 6 volumes can tolerate up to two drive failures.
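The capacity trade-off discussed above can be made concrete with a short calculation. The function name `usable_fraction` is hypothetical, and the sketch assumes equally sized drives, a two-way mirror for RAID1, and a four-drive minimum for RAID6 (a common convention that is not stated in the text above).

```python
def usable_fraction(raid_level: int, n_drives: int) -> float:
    """Fraction of raw capacity available for user data.

    Assumes equally sized drives. RAID1 is modeled as a two-way mirror.
    """
    if raid_level == 1:
        if n_drives != 2:
            raise ValueError("this sketch models RAID1 as exactly two drives")
        return 0.5
    if raid_level == 5:
        if n_drives < 3:
            raise ValueError("RAID5 requires at least three drives")
        return (n_drives - 1) / n_drives  # one drive's worth of parity
    if raid_level == 6:
        if n_drives < 4:  # assumed minimum, see lead-in
            raise ValueError("RAID6 is assumed to require at least four drives")
        return (n_drives - 2) / n_drives  # two drives' worth of redundancy
    raise ValueError("unsupported RAID level for this sketch")


# Usable fraction for eight-drive RAID5 and RAID6 volumes.
raid5_eight = usable_fraction(5, 8)
raid6_eight = usable_fraction(6, 8)
```

For an eight-drive volume, RAID5 leaves seven eighths of the raw capacity for user data while RAID6 leaves six eighths; the difference is the one additional drive's worth of capacity that RAID6 permanently dedicates to the second redundancy block.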
Thus, it is an ongoing challenge to provide tolerance for a two-drive failure in a RAID level 5 configuration without permanently allocating additional storage capacity, as is required for RAID1 and RAID6.