1. Review of RAID
As the value and use of information continues to increase, individuals and businesses seek additional ways to process and store information. One option available to users is an information handling system. An information handling system generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, information handling systems may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated. The variations in information handling systems allow for information handling systems to be general or configured for a specific user or specific use such as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, information handling systems may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.
Redundant Array of Independent Disks (RAID) is a taxonomy of redundant disk storage schemes which defines a number of ways for configuring and using multiple computer disk drives to achieve varying levels of availability, performance, capacity, and cost while appearing to the software application as a single large capacity drive. Various RAID levels have been defined from RAID 0 to RAID 6, each offering tradeoffs between these attributes.
RAID 0 is nothing more than traditional striping: user data is broken into chunks which are stored onto the stripe set by being spread across multiple disks with no data redundancy. RAID 1 is equivalent to conventional “shadowing” or “mirroring” techniques and is the simplest method of achieving data redundancy: for each disk there is another containing the same data, and both disks are written simultaneously. Combinations of RAID 0 and RAID 1 are referred to as “shadowing striped sets” (RAID 0+1) or “striping shadow sets” (RAID 1+0 or RAID 10), either one resulting in the relative performance advantages of both RAID levels.
RAID 2, which utilizes a Hamming Code written across the members of the RAID set, is no longer considered to be of significant importance.
In RAID 3, data is striped across a set of disks with the addition of a separate dedicated drive to hold parity data. The parity data is calculated dynamically as user data is written to the other disks to allow reconstruction of the original user data if a drive fails without requiring replication of the data bit-for-bit. Error detection and correction codes (“ECC”) such as exclusive OR (“XOR”) or more sophisticated Reed-Solomon techniques may be used to perform the necessary mathematical calculations on the binary data to produce the parity information in RAID 3 and higher level implementations. While parity allows the reconstruction of the user data in the event of a drive failure, the speed of such reconstruction is a function of system workload and the particular algorithm used.
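The parity mechanism can be illustrated with a short sketch (hypothetical byte strings stand in for disk sectors; this is not any particular controller's implementation): the parity sector is the XOR of the corresponding data sectors, and a failed drive's sector is the XOR of the surviving data sectors and the parity.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, as a RAID 3 parity engine would."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

# Hypothetical 4-byte "sectors" on three data drives.
data = [b"\x0f\x00\xff\x11", b"\xf0\x0f\x00\x22", b"\x00\xf0\x0f\x33"]
parity = xor_blocks(data)  # written to the dedicated parity drive

# Drive 1 fails: its sector is the XOR of the surviving data and the parity.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
```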
As with RAID 3, the scheme known as RAID 4 consists of N data disks and one parity disk wherein the parity disk sectors contain the bitwise XOR of the corresponding sectors on each data disk. This allows the contents of the data in the RAID set to survive the failure of any one disk. RAID 5 is a modification of RAID 4 which stripes or “rotates” the parity across all of the disks in the array in order to statistically equalize the load on the disks. In certain RAID 4 and RAID 5 implementations, insufficient data is received to enable parity to be calculated solely from the incoming data, as is the case with RAID 3. As one example, in the case of a RAID 5 “partial stripe” write, the array controller or RAID software must combine new data with old data and existing parity data to produce the new parity data, requiring each RAID write to include a read from two drives (old data, old parity), the calculation of the difference between the new and old data, the application of that difference to the old parity to obtain the new parity, and the writing of the new data and parity back onto the same two drives. The situation is somewhat simplified in the case of a so-called “full-stripe” write, in that only the new data is needed, but the calculation of new parity must still be performed.
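The partial-stripe sequence (two reads, a delta computation, two writes) can be sketched against a toy in-memory array; the drive layout and block values below are invented for illustration and do not represent any particular controller:

```python
def xor(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

# Toy in-memory array: three data drives plus one parity drive,
# one 4-byte block each (values invented for illustration).
drives = {0: b"\x01\x02\x03\x04",
          1: b"\x10\x20\x30\x40",
          2: b"\x0a\x0b\x0c\x0d"}
drives["p"] = xor(xor(drives[0], drives[1]), drives[2])

def partial_stripe_write(drive, new_data):
    """RAID-5 sub-stripe write: read old data and old parity, apply the
    new/old delta to the parity, then write data and parity back."""
    old_data = drives[drive]                # read 1: old data
    old_parity = drives["p"]                # read 2: old parity
    delta = xor(old_data, new_data)         # difference between new and old
    drives[drive] = new_data                # write 1: new data
    drives["p"] = xor(old_parity, delta)    # write 2: new parity

partial_stripe_write(1, b"\xde\xad\xbe\xef")
# The parity invariant still holds: p = d0 + d1 + d2 (+ is XOR).
assert drives["p"] == xor(xor(drives[0], drives[1]), drives[2])
```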
The designation of RAID 6 has been used colloquially to describe RAID schemes that can withstand the failure of two disks without losing data through the use of two parity drives (commonly referred to as the “p” and “q” drives) for redundancy and sophisticated Error Correction Coding (ECC) techniques. Data and ECC information are striped across all members of the RAID set. Write performance is generally worse than with RAID 5 because three separate drives must each be accessed twice during writes.
RAID-6 has come to be used as a generic term for any of a number of schemes that provide double fault tolerance, i.e., that can survive the failure of up to two data volumes. Unlike other RAID types (e.g., RAID-5), RAID-6 therefore does not necessarily refer to a single, standardized algorithm. RAID-6 is also referred to as “dual parity” or “double parity” data protection. Traditionally, RAID-6 p parity blocks are identical to the parity blocks used in RAID-3 or RAID-5. The other (q) parity blocks are generally calculated either using Galois Field arithmetic or by performing an XOR operation across a different set of data blocks from those used to compute the p parity block.
Double parity schemes used with RAID-6 that do not involve Galois Field arithmetic (henceforth called “dual parity” schemes) all involve a matrix of n×n data blocks plus a vector of n parity blocks, “p”, and another vector of n parity blocks, “q”. “p” is the conventional RAID-5 parity (horizontal across n data blocks). Ideally, each data block contributes to a single p parity block and a single q parity block, i.e., two parity blocks in total. This implies a measure of efficiency that can be called the “contribution ratio”, this ratio being 2 for the ideal case. But the ideal case requires a very specific layout of parity and data blocks across the disks, which is unlikely to be appropriate given other goals such as performance. More general layouts are possible, but only by increasing the contribution ratio: the q expressions end up having more than n terms in them, or they incorporate p parity blocks as well as data blocks.
An example of a RAID-6 scheme that does not use Galois Field arithmetic is “RAID-DP”. In this “diagonal parity” method, data is organized as an n×n matrix of data chunks. The columns of the matrix correspond to data volumes, but the rows are arbitrary—they correspond to “chunks” of data, which could be a single bit or an entire stripe, as long as it is uniform.
There are two sets of parity blocks for the matrix, p and q parity, on two parity drives. (Of course in the real world the parity is “rotated” to spread the load, but that complication is ignored here.) Both the p and q parity are computed with simple XOR operations over sets of blocks. The equations for p parity are simple:
∑_{j=0}^{n−1} d_{i,j} = p_i  for 0 ≤ i < n      (Standard RAID-5 parity equations)
(A note on notation. di,j is the ith data chunk on data disk j of the RAID set, pi and qi are the ith chunks of P and Q parity. The “+” symbol is used to represent chunk XOR operations in the context of data, and ordinary addition in the context of indices and other bookkeeping variables. This should not cause confusion since the context is clear. Also, the “*” symbol will be used to represent the “multiplication” of a chunk of data by a binary “selector”—i.e. B*p1 is equal to “if B then p1 else 0”—in addition to the ordinary multiplication operation on indices, etc.)
The equations for q parity are a little more complex:
(∑_{j=0}^{n−i−2} d_{j,i+1+j}) + p_{n−1−i} + ∑_{j=n−i}^{n−1} d_{j,i+j−n} = q_i  for 0 ≤ i < n      (Diagonal parity equations)
For example, if n=5, the qi are calculated thus:
d0,1 + d1,2 + d2,3 + d3,4 + p4 = q0
d0,2 + d1,3 + d2,4 + p3 + d4,0 = q1
d0,3 + d1,4 + p2 + d3,0 + d4,1 = q2
d0,4 + p1 + d2,0 + d3,1 + d4,2 = q3
p0 + d1,0 + d2,1 + d3,2 + d4,3 = q4
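The two sets of equations can be checked numerically. The sketch below is purely illustrative (random integers stand in for data chunks, with ^ as the chunk XOR): it computes p and q for n=5 directly from the summations and compares them against the expanded equations.

```python
import random
from functools import reduce

n = 5
random.seed(1)
# Hypothetical n x n matrix of data chunks; 32-bit integers stand in for blocks.
d = [[random.getrandbits(32) for j in range(n)] for i in range(n)]

# Row (p) parity: p_i = d_{i,0} + ... + d_{i,n-1} (+ is chunk XOR).
p = [reduce(lambda a, b: a ^ b, d[i]) for i in range(n)]

# Diagonal (q) parity, term for term from the equations above.
def q_parity(i):
    acc = 0
    for j in range(0, n - i - 1):   # d_{j, i+1+j} terms
        acc ^= d[j][i + 1 + j]
    acc ^= p[n - 1 - i]             # the p chunk on this diagonal
    for j in range(n - i, n):       # wrap-around terms d_{j, i+j-n}
        acc ^= d[j][i + j - n]
    return acc

q = [q_parity(i) for i in range(n)]

# Spot-check against the expanded n = 5 equations:
assert q[0] == d[0][1] ^ d[1][2] ^ d[2][3] ^ d[3][4] ^ p[4]
assert q[2] == d[0][3] ^ d[1][4] ^ p[2] ^ d[3][0] ^ d[4][1]
assert q[4] == p[0] ^ d[1][0] ^ d[2][1] ^ d[3][2] ^ d[4][3]
```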
This is graphically depicted in FIG. 2. It is easy to see from that diagram that p parity is “row” parity, and q parity is “diagonal” parity for those diagonals that include the p parity column (i.e. all diagonals but the “data only diagonal”). The q parity diagonals thus “wrap around” from the last to the first column as necessary.
2. Limitations of Certain Approaches
When trying to recover from, say, the failure of two data volumes, one must solve a set of 2n simultaneous equations (the p and q parity equations) in 2n unknowns (n missing data chunks from each of the two failed volumes). For instance, if the drives representing the leftmost two columns of these equations failed, the ten equations that would have to be solved for the ten variables {di,0, di,1 : 0 ≤ i ≤ 4} would include:
d1,0 + d1,1 = (p1 + d1,2 + d1,3 + d1,4)      (row parity equation for p1 with known terms removed)
d1,0 + d2,1 = (q4 + p0 + d3,2 + d4,3)      (diagonal parity equation for q4 with known terms removed)
The right-hand sides of these equations are sometimes referred to as the “syndromes” of the particular error encountered, and of course depend on the actual data on the disks as well as on which drives have failed. The left-hand sides depend only on the total number of data drives and on which drives have failed. As with ordinary simultaneous equations, the values of the syndromes do not affect the solvability of the system of equations; only the structure of the left-hand sides determines this.
In the case of RAID-DP, it can be shown that each of the n(n−1)/2 systems of 2n simultaneous equations, resulting from the n(n−1)/2 combinations of two of the n data drives failing, is always solvable if n+1 is a prime number. When one or both of the failed drives is a parity drive, the systems of equations are much more straightforward and can also all be solved.
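The solvability claim can be tested mechanically: for each pair of failed data columns, build the 2n × 2n GF(2) coefficient matrix of the p and q equations (restricted to the unknown chunks) and check that it has full rank. This is an illustrative check of the claim, not the recovery algorithm itself; rows are packed into Python integers.

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank of a set of GF(2) row vectors, each packed into an int."""
    basis = []
    for row in rows:
        for b in basis:
            row = min(row, row ^ b)   # reduce by the existing XOR basis
        if row:
            basis.append(row)
    return len(basis)

def q_data_terms(n, i):
    """(row, column) of each data chunk appearing directly in the q_i equation."""
    return ([(j, i + 1 + j) for j in range(0, n - i - 1)] +
            [(j, i + j - n) for j in range(n - i, n)])

def solvable_all_pairs(n):
    """True if every two-data-drive failure yields a full-rank 2n x 2n system."""
    for a, b in combinations(range(n), 2):
        col = {a: 0, b: 1}   # unknown d_{j,a} -> bit j, d_{j,b} -> bit n+j
        rows = [1 << i | 1 << (n + i) for i in range(n)]    # p equations
        for i in range(n):                                  # q equations
            r = 0
            for j, c in q_data_terms(n, i):
                if c in col:
                    r |= 1 << (col[c] * n + j)
            rows.append(r)
        if rank_gf2(rows) < 2 * n:
            return False
    return True

# n + 1 prime: all double data-drive failures are recoverable.
assert solvable_all_pairs(4)   # n + 1 = 5, prime
assert solvable_all_pairs(6)   # n + 1 = 7, prime
```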
Every chunk of data in these methods contributes to one p parity chunk and at least one q parity chunk. Some contributions to q parity chunks are direct, and some are indirect through the p parity chunk that covers the data chunk and itself contributes to a q parity chunk. For instance, d1,2 contributes to p1 and to q0 and (via p1) to q3. In fact, in the above example all data chunks but the “data-only diagonal” chunks contribute to two q parity chunks, and all data chunks contribute to a p parity chunk, so the “average” data chunk contributes to 2.8 parity chunks—that number, which we will call the Contribution Ratio, asymptotically approaches 3 for large RAID-DP arrays. This means that a change to a single data chunk will usually result in the re-computation of 3 parity chunks. Since writing (or reading-modifying-writing) three parity blocks for every single-block data write (using the standard RAID “sub-stripe write” algorithm) is unacceptably high overhead, these RAID 6 implementations use a chunk size that is no larger than ((sector size)/n), so that an entire column of data chunks fits into a single disk sector and contributes to corresponding single p and q disk sectors.
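The 2.8 figure can be reproduced by counting contributions directly (a sketch over chunk positions only; no data values are needed). Each data chunk contributes to its row's p chunk, to one q chunk indirectly via that p chunk, and to one more q chunk directly unless it lies on the data-only diagonal:

```python
n = 5

def q_data_terms(n, i):
    """(row, column) of each data chunk appearing directly in q_i."""
    return ([(j, i + 1 + j) for j in range(0, n - i - 1)] +
            [(j, i + j - n) for j in range(n - i, n)])

# Every chunk that appears directly in some q equation.
direct = set()
for i in range(n):
    direct.update(q_data_terms(n, i))

contrib = {}
for i in range(n):
    for j in range(n):
        c = 1                       # the row parity chunk p_i
        c += 1                      # one q chunk reached indirectly via p_i
        if (i, j) in direct:
            c += 1                  # one q chunk reached directly
        contrib[(i, j)] = c

assert min(contrib.values()) == 2   # the data-only diagonal chunks
assert max(contrib.values()) == 3   # everything else
assert sum(contrib.values()) / (n * n) == 2.8
```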
In another RAID 6 method from Adaptec, each data chunk contributes to exactly two parity chunks. This is the theoretical minimum Contribution Ratio, because if any data chunk contributed to only a single parity chunk, then the simultaneous failure of that data volume and that parity volume would leave the data unrecoverable. However, Adaptec's method requires the intermixing of parity and data on every data volume in a very specific way, as it depends on the fact that the loss of any volume will result in the loss of some data chunks and some parity chunks. As a result of this intermixing, Adaptec's RAID 6 encourages the use of a large chunk, on the order of 64-1024 disk sectors, to avoid having read requests span multiple data chunks and incur extra disk head motion. On the other hand, the entire data+parity matrix must be present for dual-failure correction, which requires n*(n+2)*(chunk size) bytes of controller memory; for instance, if n=8 and the chunk size is 512 sectors (256 KB), then 20 MB of controller memory is required for a dual-failure recovery. The Adaptec RAID 6 also makes it harder to optimize sequential small writes into an efficient “RAID 3 style” write because of the large chunk size and the large number of chunks in a complete RAID 6 redundancy group.
One might think there are other XOR-only RAID-6 algorithms, as the number of possible sets of parity equations for n data disks in a RAID 6 configuration as described above, even assuming that p parity is identical to RAID 5 parity, is huge: on the order of 2^(n³). Applying some simplifying assumptions about the number of data chunks contributing to each q parity chunk reduces the number of sets of equations to roughly n^(2n+3), still huge. So the problem is finding one that works. The simplest diagonal parity scheme, with the smallest Contribution Ratio (2.0), is:
∑_{j=0}^{n−1} d_{i,j} = p_i  for 0 ≤ i < n
∑_{j=0}^{n−1} d_{j,(i+j) mod n} = q_i  for 0 ≤ i < n
This is graphically depicted in FIG. 3. It is a very straightforward set of equations, but unfortunately it fails as a data protection technique. As stated before, when trying to recover from the failure of two data volumes we must solve a set of 2n simultaneous equations (the p and q parity equations) in 2n unknowns (the n missing data chunks from each of the two failed volumes). For the simple diagonal parity above, none of the n(n−1)/2 combinations of two failed data volumes results in a solvable set of simultaneous equations.
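The failure is easy to confirm with the same kind of GF(2) rank test (an illustrative check, with rows packed into Python integers). In the simple mod-n scheme each q equation contains exactly one unknown from each failed column, so the XOR of all the q rows equals the XOR of all the p rows and the system is always singular:

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank of a set of GF(2) row vectors, each packed into an int."""
    basis = []
    for row in rows:
        for b in basis:
            row = min(row, row ^ b)   # reduce by the existing XOR basis
        if row:
            basis.append(row)
    return len(basis)

def solvable(n, a, b):
    """Full rank <=> failed data columns a and b are recoverable from p and q."""
    rows = [1 << i | 1 << (n + i) for i in range(n)]         # p equations
    for i in range(n):                                       # q equations
        # q_i contains d_{j,c} where (i + j) mod n == c, so j == (c - i) mod n.
        rows.append(1 << ((a - i) % n) | 1 << (n + (b - i) % n))
    return rank_gf2(rows) == 2 * n

# No pair of failed data volumes is recoverable, for any of these array sizes.
for n in (4, 5, 7):
    assert not any(solvable(n, a, b) for a, b in combinations(range(n), 2))
```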