Computer systems utilize data redundancy schemes such as parity computation to protect against loss of data on a storage device. A redundancy value is computed by calculating a function of the data of a specific word size across a quantity of similar storage devices, also referenced as data drives. One example of such redundancy is exclusive OR (XOR) parity that is computed as the binary sum of the data; another common redundancy uses Reed-Solomon codes based on finite field arithmetic.
The redundancy values, hereinafter referenced as parity values, are stored on a plurality of storage devices, also referenced as parity drives. In the case of a parity drive failure, or loss of data on the parity drive, the data on the parity drive can be regenerated from data stored on the data drives. Similarly, in the case of data drive failure, or loss of data on the data drive, the data on the data drive can be regenerated from the data stored on the parity drives and other non-failing data drives. Data is regenerated from the parity drives by adding the data on the remaining data drives and subtracting the result from data stored on the parity drives.
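For XOR parity, the "adding" and "subtracting" described above are the same operation, because XOR is its own inverse. The following minimal sketch illustrates this for a single parity drive; the function names and the sample byte blocks are hypothetical and chosen only for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks):
    """Parity block for one stripe: XOR of all data blocks."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks, parity):
    """Regenerate the single missing data block from the survivors and parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three hypothetical data drives, each holding a 3-byte block of one stripe.
data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xff\x00\x0f"]
parity = compute_parity(data)

# Simulate the loss of data drive 1 and rebuild its block.
lost = 1
survivors = [block for i, block in enumerate(data) if i != lost]
recovered = reconstruct(survivors, parity)
```

The same routine regenerates a lost parity block, since the parity is simply the XOR of all data blocks.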
In Redundant Array of Independent Disks (RAID) systems, data files and related parity are striped across disk drives. In storage subsystems that manage hard disk drives as a single logical direct or network access storage device (DASD/NASD), the RAID logic is implemented in an array controller of the subsystem. Such RAID logic may also be implemented in a host system in software.
Disk arrays, in particular RAID-3 and RAID-5 disk arrays, have become accepted designs for highly available and reliable disk subsystems. In such arrays, the XOR of data from some number of disks is maintained on a redundant disk (the parity drive). When a disk fails, the data on it can be reconstructed by exclusive-ORing the data on the surviving disks and writing this data into a spare disk. Data is lost if a second disk fails before the reconstruction is complete.
Typical storage system models emphasize three principal metrics: reliability, storage efficiency, and performance. The reliability of an array code is a function of its column distance. A code of column distance d can recover from the erasure of d−1 entire columns without data loss. The storage efficiency of a code is the number of independent data symbols divided by the total number of symbols used by the code. The performance of an array code is measured with respect to the update complexity (UC) of the array code; i.e., the number of parity symbols affected by a change in a data symbol. Update complexity affects the number of IOs required to modify a data symbol, which in turn affects the average throughput of the storage system. Both the average and maximum update complexity over all the data symbols are used as measures of a code's performance.
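The three metrics defined above can be computed directly from a description of which data symbols each parity symbol covers. The sketch below is illustrative only; the function names, the symbol labels, and the small example layout are assumptions and do not correspond to any particular code discussed herein.

```python
def storage_efficiency(num_data_symbols, num_total_symbols):
    """Independent data symbols divided by total symbols used by the code."""
    return num_data_symbols / num_total_symbols


def tolerated_column_erasures(column_distance):
    """A code of column distance d recovers from d - 1 erased columns."""
    return column_distance - 1


def update_complexity(data_symbols, parity_membership):
    """Return (average, maximum) update complexity.

    parity_membership maps each parity symbol to the set of data symbols
    that contribute to it. The update complexity of a data symbol is the
    number of parity symbols that change when that data symbol changes.
    """
    counts = {d: 0 for d in data_symbols}
    for members in parity_membership.values():
        for d in members:
            counts[d] += 1
    values = list(counts.values())
    return sum(values) / len(values), max(values)


# Hypothetical layout: four data symbols and two parity symbols.
data_symbols = ["d0", "d1", "d2", "d3"]
parity_membership = {
    "p": {"d0", "d1", "d2", "d3"},  # p covers every data symbol
    "q": {"d0", "d1"},              # q covers only two of them
}

efficiency = storage_efficiency(len(data_symbols),
                                len(data_symbols) + len(parity_membership))
average_uc, maximum_uc = update_complexity(data_symbols, parity_membership)
```

In this hypothetical layout, d0 and d1 each affect two parity symbols while d2 and d3 each affect one, which is why the average and the maximum update complexity are reported separately.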
A variety of techniques have been implemented to reliably and efficiently recover from a failure in a disk array system. Although these techniques have proven to be useful, it would be desirable to present additional improvements. Reed-Solomon codes [reference is made to I. S. Reed, et al., “Polynomial codes over certain finite fields,” Journal of the Society for Industrial and Applied Mathematics, vol. 8, pp. 300-304, 1960] have been proposed for the storage model [reference is made to J. Plank, “A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems,” Software: Practice and Experience, vol. 27, pp. 995-1012, 1997]. However, Reed-Solomon codes require finite field arithmetic and are therefore impractical without special purpose hardware.
Various other codes have been proposed for recovering from failures in storage systems such as, for example, Turbo codes [reference is made to D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms, http://www.inference.phy.cam.ac.uk/mackay/itprnn/], Tornado codes [reference is made to M. G. Luby, et al., “Efficient erasure correcting codes,” IEEE Transactions on Information Theory, vol. 47, pp. 569-584, 2001], LT codes [reference is made to M. Luby, “LT codes,” in Proceedings of the 43rd Annual IEEE Symposium on the Foundations of Computer Science, 2002, pp. 271-280], and Raptor codes [reference is made to A. Shokrollahi, “Raptor codes,” 2003]. However, the probabilistic nature of these codes does not lend itself well to the storage model. Furthermore, the communication model of these codes puts stress on the computational cost of encoding and decoding as opposed to the cost of IO seeks, which dominate in storage systems.
Conventional RAID algorithms generally tend to be inefficient for all but the distance two case as used by, for example, RAID-5 [reference is made to J. L. Hennessy, et al., Computer Architecture: A Quantitative Approach. San Francisco, Calif.: Morgan Kaufmann, 2003 and P. Massiglia, The RAID Book. St. Peter, Minn.: The RAID Advisory Board, Inc., 1997]. Array codes are perhaps the most applicable codes for the storage model where large amounts of data are stored across many disks and the loss of a data disk corresponds to the loss of an entire column of symbols [reference is made to M. Blaum, et al., “Array codes,” in Handbook of Coding Theory (Vol. 2), V. S. Pless and W. C. Huffman, Eds. North Holland, 1998, pp. 1855-1909]. Array codes are two-dimensional burst error-correcting codes that use XOR parity along lines at various angles.
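The notion of XOR parity along lines at various angles can be illustrated with a generic sketch. The layout below, in which each column is a disk and lines wrap around modulo the number of rows, is an assumption made only for illustration; it is not the construction of any specific array code cited herein, and the function name is hypothetical.

```python
def slope_parity(array, slope):
    """XOR parity along lines of a given slope through a 2-D symbol array.

    array[r][c] is the symbol in row r of column (disk) c. Line j starts at
    row j in column 0 and moves down `slope` rows per column, wrapping
    around modulo the number of rows (an illustrative assumption).
    Slope 0 gives ordinary row parity as used in RAID-5.
    """
    rows, cols = len(array), len(array[0])
    parity = []
    for j in range(rows):
        acc = 0
        for c in range(cols):
            acc ^= array[(j + slope * c) % rows][c]
        parity.append(acc)
    return parity


# Hypothetical 3x3 array of small integer symbols.
array = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

row_parity = slope_parity(array, 0)       # horizontal lines
diagonal_parity = slope_parity(array, 1)  # lines of slope 1
```

Each slope yields one additional column of parity symbols, which is how array codes of this general kind raise the column distance beyond two.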
While Low Density Parity Check (LDPC) codes [reference is made to R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, Mass.: MIT Press, 1962 and M. G. Luby, et al., “Efficient erasure correcting codes,” IEEE Transactions on Information Theory, vol. 47, pp. 569-584, 2001] were originally invented for communication purposes, the concepts have been applied in the storage system framework. Convolution array codes [reference is made to M. Blaum, et al., “Array codes,” in Handbook of Coding Theory (Vol. 2), V. S. Pless and W. C. Huffman, Eds. North Holland, 1998, pp. 1855-1909; and T. Fuja, et al., “Cross parity check convolution codes”, IEEE Transactions on Information Theory, vol. 35, pp. 1264-1276, 1989] are a type of array code, but these codes assume semi-infinite length tapes of data, reconstruction progresses sequentially over these tapes, and their parity elements are not independent. These codes are not directly applicable to the storage model where the efficient reconstruction of randomly located data is required. The present invention has some similarities to convolution array codes, but differs in two respects. The present invention converts the semi-infinite tape into logical short finite loops enabling efficient reconstruction of randomly located data. Furthermore, the present invention has independent parity, allowing for parity computations in parallel.
Maximum Distance Separable (MDS) codes, or codes with optimal storage efficiency, have been proposed. The Blaum-Roth (BR) code [reference is made to M. Blaum, et al., “On lowest density MDS codes,” IEEE Transactions on Information Theory, vol. 45, pp. 46-59, 1999], the EvenOdd (EO) code [reference is made to M. Blaum, et al., “EVENODD: an efficient scheme for tolerating double disk failures in RAID architectures,” IEEE Transactions on Computers, vol. 44, pp. 192-202, 1995] and the Row-diagonal Parity (RDP) code [reference is made to P. Corbett, et al., “Row-diagonal parity technique for enabling recovery from double failures in a storage array,” (U.S. patent application US 20030126523 issued as U.S. Pat. No.: 6,993,701)], are distance three codes and achieve optimal storage efficiency but have non-optimal update complexity. The XCode (XC) [reference is made to L. Xu, et al., “X-code: MDS array codes with optimal encoding,” IEEE Transactions on Information Theory, pp. 272-276, 1999] and ZZS code [reference is made to G. V. Zaitsev, et al., “Minimum-check-density codes for correcting bytes of errors,” Problems in Information Transmission, vol. 19, pp. 29-37, 1983] achieve both optimal storage efficiency and optimal update complexity but do not generalize to distances greater than three.
A variant of the EvenOdd (EO+(p, d−1)) code achieves column distances greater than three for certain array dimensions, but still has non-optimal update complexity [reference is made to M. Blaum, et al., “MDS array codes with independent parity symbols,” IEEE Transactions on Information Theory, vol. 42, pp. 529-542, 1996]. The present invention is similar to the EO+(p, d−1) code in that parity is computed along slopes of various values through the two-dimensional array of data and has the notion of logical data elements preset to zero (or some other fixed value). However, the present invention has a different set of preset data elements and so can remove dimension restrictions such as primality of the parameter p and the relationship of the number of columns and the number of symbols per column to p.
Conventional high-distance RAID codes such as, for example, R51 and R6 are simple and have very good IO, but are impractical when storage efficiency is important.
Although conventional storage system parity techniques have proven to be useful, it would be desirable to present additional improvements. Conventional storage systems require excessive parity computation or complexity. Conventional storage systems further exhibit restrictive dimensionality constraints.
More recently, storage systems have been designed wherein the storage devices are nodes in a network (not just disk drives). Such systems may also use RAID type algorithms for data redundancy and reliability. The present invention is applicable to these systems as well. Though the description herein is exemplified using the disk array, it should be clear to someone skilled in the art how to extend the invention to the network node application or other systems built from storage devices other than disks.
What is therefore needed is a system, a computer program product, and an associated method for enabling efficient recovery from failures in a storage array without dimensionality constraints. Further, a storage system is desired that achieves greater redundancy with greater flexibility without a loss of performance experienced by conventional storage systems. The need for such system and method has heretofore remained unsatisfied.