Computer systems utilize data redundancy schemes such as parity computation to protect against loss of data on a storage device. A redundancy value is computed by calculating a function of the data of a specific word size, also referenced as a data element, across a quantity of similar storage devices, also referenced as data drives. One example of such redundancy is exclusive OR (XOR) parity, which is computed as the binary sum (bit-by-bit addition modulo 2) of the data.
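As an illustration of XOR parity, the following is a minimal sketch in Python. It assumes each data element is a byte string of equal length held in memory; the function name `xor_parity` and the sample byte values are hypothetical and chosen only for this example.

```python
from functools import reduce


def xor_parity(elements):
    """Return the bytewise XOR of equally sized data elements.

    `elements` is a list of bytes objects, one per (hypothetical) data drive.
    """
    if not elements:
        raise ValueError("at least one data element is required")
    size = len(elements[0])
    if any(len(element) != size for element in elements):
        raise ValueError("all data elements must have the same size")
    # zip(*elements) yields one tuple per byte position across all drives;
    # XOR-ing each tuple gives the parity byte for that position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*elements))


# Three data elements of two bytes each, one per data drive.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_parity(data)
```

Because XOR is its own inverse, XOR-ing the parity element with all of the data elements yields all-zero bytes, which is the property that makes regeneration of a single lost element possible.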
The redundancy values, hereinafter referenced as parity values, are stored on a plurality of storage devices in locations referenced as parity elements. In the case of a storage device failure that causes a loss of parity element values, the parity values can be regenerated from data stored on one or more of the data elements. Similarly, in the case of a storage device failure that causes a loss of data element values, the data values can be regenerated from the values stored on one or more of the parity elements and possibly one or more of the other non-failed data elements.
In Redundant Array of Independent Disks (RAID) systems, data values and related parity values are striped across disk drives. In storage subsystems that manage hard disk drives as a single logical direct (DASD) or network attached (NASD) storage device, the RAID logic is implemented in an array controller of the subsystem. Such RAID logic may also be implemented in a host system in software or in some other device in a network storage subsystem.
Disk arrays, in particular RAID-3 and RAID-5 disk arrays, have become accepted designs for highly available and reliable disk subsystems. In such arrays, the XOR of data from some number of disks is maintained on a redundant disk (the parity drive). When a disk fails, the data on it can be reconstructed by exclusive-ORing the data and parity on the surviving disks and writing this data into a spare disk. Data is lost if a second disk fails before the reconstruction is complete.
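The reconstruction step described above can be sketched as follows. This is an illustrative Python example only, assuming a single stripe held in memory as equally sized byte strings with the parity element stored last; the names `xor_blocks` and `rebuild` are hypothetical and do not correspond to any particular controller implementation.

```python
def xor_blocks(blocks):
    """Bytewise XOR of a list of equally sized byte strings."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def rebuild(stripe, failed):
    """Regenerate the element at index `failed` from the surviving elements.

    `stripe` holds the data elements followed by the parity element.
    The content stored at the failed index is ignored, as it would be
    unreadable on a real failed disk.
    """
    survivors = [block for i, block in enumerate(stripe) if i != failed]
    return xor_blocks(survivors)


# Build a stripe of three data elements plus one parity element.
data = [b"ab", b"cd", b"ef"]
stripe = data + [xor_blocks(data)]

# Simulate losing the second data disk and rebuilding it onto a spare.
recovered = rebuild(stripe, failed=1)
```

The same routine regenerates the parity element if the parity drive is the one that fails. With a single parity element it cannot recover two lost elements, which corresponds to the data loss case noted above when a second disk fails before reconstruction completes.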
RAID-6 is an extension of RAID-5 that protects against two drive failures. There are many other RAID algorithms that have been proposed to tolerate two drive failures: for example, Reed-Solomon [reference is made to I. S. Reed, et al., “Polynomial codes over certain finite fields,” Journal of the Society for Industrial and Applied Mathematics, vol. 8, pp. 300-304, 1960], Blaum-Roth [reference is made to M. Blaum, et al., “On lowest density MDS codes,” IEEE Transactions on Information Theory, vol. 45, pp. 46-59, 1999], EvenOdd [reference is made to M. Blaum, et al., “EVENODD: an efficient scheme for tolerating double disk failures in RAID architectures,” IEEE Transactions on Computers, vol. 44, pp. 192-202, 1995], Row-Diagonal Parity [reference is made to P. Corbett, et al., “Row-diagonal parity technique for enabling recovery from double failures in a storage array,” (U.S. patent application US 20030126523)], XCode [reference is made to L. Xu, et al., “X-code: MDS array codes with optimal encoding,” IEEE Transactions on Information Theory, pp. 272-276, 1999], ZZS [reference is made to G. V. Zaitsev, et al., “Minimum-check-density codes for correcting bytes of errors,” Problems in Information Transmission, vol. 19, pp. 29-37, 1983], BCP [reference is made to S. Baylor, et al., “Efficient method for providing fault tolerance against double device failures in multiple device systems,” (U.S. Pat. No. 5,862,158)], LSI [reference is made to A. Wilner, “Multiple drive failure tolerant raid system,” (U.S. Pat. No. 6,327,672 B1)], Samsung [reference is made to T-D Han, et al., “Method for storing parity and rebuilding data contents of failed disks in an external storage subsystem and apparatus thereof,” (U.S. Pat. No. 6,158,017)] and Nanda [reference is made to S. Nanda, “Method and system for disk fault tolerance in a disk array,” (U.S. patent application US 2004/0078642 A1)].
There have been a few additional extensions that protect against multiple drive failures: for example, Reed-Solomon [referenced above], and EO+ [reference is made to M. Blaum, et al., “MDS array codes with independent parity symbols,” IEEE Transactions on Information Theory, vol. 42, pp. 529-542, 1996].
More recently, storage systems have been designed wherein the storage devices are nodes in a network (not simply disk drives). Such systems may also use RAID techniques for data redundancy and reliability. The present invention is applicable to these systems as well. Though the description herein is exemplified using the disk array, it should be clear to one skilled in the art how to extend the invention to the network node application or other systems built from storage devices other than disks.
Although conventional RAID technology has proven to be useful, it would be desirable to present additional improvements. As can be seen from the various conventional RAID techniques that have been used or proposed, none has been a perfect solution to the variety of requirements that the computer industry places on a storage subsystem. Many conventional systems are complex, requiring extensive computer overhead. Furthermore, many conventional systems have excessive disk I/O requirements for certain operations. Others require a large number of drives in the system, and the use of more drives reduces overall system reliability. Many conventional codes tolerate only two failures. Others have constraints on the parameters of the code that are impractical in real systems or impose performance penalties. In addition, many conventional codes that tolerate T failures (that is, all possible combinations of T drives failing) cannot tolerate any combination of more than T drives failing. Conventional RAID techniques that can tolerate additional combinations of failures beyond T have a higher reliability than those that do not.
What is therefore needed is a system, a computer program product and an associated method for enabling recovery from failures in a storage system that is simple, can handle many failure cases, and has reasonable performance and parametric flexibility. The need for such a solution has heretofore remained unsatisfied.