1. Field of the Invention
The present invention generally relates to storage systems. More specifically, the present invention pertains to systems and methods for implementing advanced RAID using a set of unique matrices as coefficients.
2. Description of the Related Art
The speed of most processors is increasing faster than the speed of virtually all input/output (I/O) devices. In response to this widening gap, the so-called RAID (Redundant Array of Independent Disks) system was invented. It aims to increase the speed of disk drive reads and writes by replacing a single disk drive unit with an array of several disks of smaller storage capacity, whose data is accessed in parallel.
However, an inherent drawback of this solution is that, while the amount of parallelism (and hence efficiency) increases with the size of the array (i.e., the number of disks it contains), the probability of a disk failure increases as well. A disk failure may result in the loss of data. Therefore, a method for recovering the information contained in any one of the disks needs to be provided.
One known method of ensuring the recovery of data lost on any one of the disks in an array is to always have two disks containing exactly the same data. This technique (i.e., so-called mirroring) is used in RAID level 1 architectures. The drawback of this solution is that half of the space needs to be allocated for redundancy.
Another known method is to have a single disk reserved for parity. Here the RAID system maintains the parity disk so that it contains the bitwise XOR of all the other disks at all times. If any disk of the array fails, the information contained therein can be recovered from the information of all the other disks (including the parity disk). The parity information is usually "striped" over all the disks of the array to avoid a bottleneck on one singled-out parity disk (RAID level 5). It should be noted that maintaining the system in its "safe" state requires the parity information to be updated as soon as any data is updated on the other disks.
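The single-parity recovery described above can be illustrated with a minimal sketch. The block contents, the number of disks, and the helper name xor_parity are assumptions chosen only for illustration; an actual array would operate on full stripes rather than 4-byte blocks.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the bitwise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, each holding one 4-byte block.
data = [b"\x11\x22\x33\x44", b"\xa0\xb0\xc0\xd0", b"\x0f\x0e\x0d\x0c"]
parity = xor_parity(data)

# Suppose the second disk fails: XOR the surviving data blocks with the parity.
survivors = [data[0], data[2], parity]
recovered = xor_parity(survivors)
assert recovered == data[1]
```

The same function serves for both computing the parity and rebuilding a lost block, because XORing a block twice cancels it.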
Advanced RAID systems deal with the capability to recover data when two or more disks fail within a predetermined period before recovery occurs. One known approach to this problem is to group data disks into different "parity groups", each of which has its corresponding parity disk. In this approach, the system maintains the bitwise XOR of the data of every disk of the same parity group on its corresponding parity disk. Much research has been done in this area. For example, a good tutorial on Reed-Solomon methods in RAID-like systems can be found in "A tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems" by James S. Plank (Technical Report UT-CS-96-332, Jul. 19, 1996). It can be proven that this method requires log2(N) parity disks to take care of N data disks.
It is desirable to have a system that is able to recover from the failure of any two disks while maintaining the minimum possible number of redundancy disks, for example only two redundancy disks. One standard Advanced RAID scheme that uses this bare minimum amount of redundancy information is based on Reed-Solomon codes. A brief description of it follows.
First, a word size is fixed. The disks may be conceptualized as sequences of chunks of memory, each chunk (or word) having a fixed size. The redundancy words are then computed over the data words of the same line. A line is defined as a sequence of words in which each word comes from a distinct disk. Basically, as far as the coding scheme is concerned, one can think of each disk as containing only one word of fixed size. The details of implementation, such as the actual size of reads and writes, the striping method, etc., are irrelevant as far as the coding scheme is concerned.
Let D1, D2, . . . , DN be the N data words of the N respective disks of the array. Further, let P and Q be the two corresponding redundancy words. In a Reed-Solomon based scheme, P and Q are maintained to be
P=D1+D2+ . . . +DN and
Q=c1*D1+c2*D2+ . . . +cN*DN,
where all operations take place in a carefully chosen Galois field, and c1, c2, . . . , cN are carefully chosen distinct words of this field.
In the context of Galois fields, it helps to think of the bits of the data, redundancy, and coefficient words as being the coefficients of a polynomial over GF(2). For example, if the words are of size 8, the words 11001101, 01101001 and 00111100 correspond respectively to the polynomials:
X^7+X^6+X^3+X^2+1,
X^6+X^5+X^3+1, and
X^5+X^4+X^3+X^2.
The degree of a polynomial is the largest exponent of X appearing in it. The degrees of the three polynomials above are respectively 7, 6, and 5.
The sum of two words then corresponds to the sum of the two corresponding polynomials (in GF(2), where, since 2=0, we have X^i+X^i=0); this corresponds to the bitwise XOR of the two words. For example, the sum of the words 11001101 and 01101001 is 10100100. From now on, the sum of two words is to be understood in this sense.
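The correspondence between words and polynomials, the notion of degree, and the sum of two words can be checked with a short sketch. The helper names word_to_poly and degree are hypothetical and are introduced only for this illustration.

```python
def word_to_poly(word, width=8):
    """Render a word as a polynomial over GF(2), highest degree first."""
    terms = []
    for i in range(width - 1, -1, -1):
        if (word >> i) & 1:
            terms.append("1" if i == 0 else "X" if i == 1 else f"X^{i}")
    return "+".join(terms) if terms else "0"

def degree(word):
    """Degree of the polynomial; -1 for the zero word (a convention assumed here)."""
    return word.bit_length() - 1

a, b, c = 0b11001101, 0b01101001, 0b00111100

assert word_to_poly(a) == "X^7+X^6+X^3+X^2+1"
assert word_to_poly(b) == "X^6+X^5+X^3+1"
assert word_to_poly(c) == "X^5+X^4+X^3+X^2"
assert [degree(w) for w in (a, b, c)] == [7, 6, 5]

# The sum of two words is their bitwise XOR.
assert a ^ b == 0b10100100
```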
The * operation corresponds to polynomial multiplication followed by a final reduction modulo the fixed irreducible polynomial which generates the field. An irreducible polynomial is a polynomial which cannot be obtained by multiplying together two polynomials of lower degree (just as a prime number is one which is not the product of two smaller numbers). For example, let
F=X^8+X^5+X^4+X^2+X
be our chosen irreducible polynomial (field generator). To obtain the word 11001101*01101001, we first find the product M of the corresponding polynomials; M is as shown in FIG. 0A.
One then computes the remainder of the Euclidean division of M by F. This corresponds to adding shifts of F (polynomials of the form X^a F for some natural number a) to M until all the terms of degree greater than or equal to the degree of F have been eliminated. The computation is shown in FIG. 0B. Thus 11001101*01101001=01101111 in the field generated by F.
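The two steps just described, polynomial multiplication followed by reduction through the addition of shifts of F, can be sketched as follows. The function names clmul and poly_mod are hypothetical. The sketch takes F exactly as it is written in the text and only performs the multiplication and reduction steps; it makes no check that F is irreducible.

```python
def clmul(a, b):
    """Carry-less (polynomial) product of two words over GF(2)."""
    product = 0
    while b:
        if b & 1:
            product ^= a   # add the current shift of a
        a <<= 1
        b >>= 1
    return product

def poly_mod(m, f):
    """Remainder of m divided by f, obtained by adding shifts of f to m."""
    deg_f = f.bit_length() - 1
    while m.bit_length() - 1 >= deg_f:
        shift = (m.bit_length() - 1) - deg_f
        m ^= f << shift    # cancels the current leading term of m
    return m

F = 0b100110110  # bits 8, 5, 4, 2 and 1 set, as F is written in the text

M = clmul(0b11001101, 0b01101001)
assert M == 0b10110001000101          # X^13+X^11+X^10+X^6+X^2+1
assert poly_mod(M, F) == 0b01101111   # the product word stated in the text
```

Each pass of the loop in poly_mod removes the highest remaining term, so the loop ends once the degree of the remainder is below the degree of F.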
The point of working in a field is that every nonzero element of a field is invertible. That is to say, for any nonzero word w there is a corresponding "inverse word" w^-1 such that w*w^-1=00 . . . 01. This relationship may be used to precisely recover information lost in such events as disk failures.
A further example is provided to depict how one can recover data when the relations
P=D1+D2+ . . . +DN and
Q=c1*D1+c2*D2+ . . . +cN*DN,
are maintained. Assume that disk 1 and disk 2 fail. This means that the information contained in D1 and D2 is lost. By solving the above system of equations for D1 and D2, we get
D1=(c1+c2)^-1*(c2*P+Q+(c2+c3)*D3+ . . . +(c2+cN)*DN)
and
D2=P+D1+D3+ . . . +DN.
As can be seen, D1 and D2 can be computed if P, Q, D3, D4, . . . , DN are not lost (or are known).
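The recovery of D1 and D2 can be sketched for 8-bit words under stated assumptions. The generating polynomial X^8+X^4+X^3+X^2+1 (0x11D) is an assumption made for this sketch and is not the F of the text. The coefficients 1, 2, . . . , N, the sample data words, and all function names are likewise hypothetical. The sketch forms c2*P+Q, in which D2 cancels, and then cancels the surviving data words D3, . . . , DN, each of which appears in c2*P+Q with coefficient (c2+ck).

```python
POLY = 0x11D  # assumed irreducible polynomial X^8+X^4+X^3+X^2+1

def gf_mul(a, b):
    """The * operation on 8-bit words in the field generated by POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:      # degree reached 8: reduce by POLY
            a ^= POLY
        b >>= 1
    return result

def gf_inv(a):
    """Inverse word by exhaustive search (reasonable only for small words)."""
    if a == 0:
        raise ZeroDivisionError("the zero word has no inverse")
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

def encode(data, coeffs):
    """Compute P = sum of Dk and Q = sum of ck*Dk."""
    p = q = 0
    for d, c in zip(data, coeffs):
        p ^= d
        q ^= gf_mul(c, d)
    return p, q

def recover_first_two(p, q, coeffs, rest):
    """Recover D1 and D2 from P, Q and the surviving words D3..DN."""
    c1, c2 = coeffs[0], coeffs[1]
    acc = gf_mul(c2, p) ^ q                 # (c1+c2)*D1 + sum (c2+ck)*Dk
    for c, d in zip(coeffs[2:], rest):
        acc ^= gf_mul(c2 ^ c, d)            # cancel the surviving words
    d1 = gf_mul(gf_inv(c1 ^ c2), acc)
    d2 = p ^ d1
    for d in rest:
        d2 ^= d
    return d1, d2

coeffs = [1, 2, 3, 4, 5]
data = [0xCD, 0x69, 0x3C, 0x0F, 0xA5]
p, q = encode(data, coeffs)
assert recover_first_two(p, q, coeffs, data[2:]) == (0xCD, 0x69)
```

The division by (c1+c2) is possible only because the coefficients are distinct, which is why the scheme requires c1, c2, . . . , cN to be distinct words.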
It can further be seen that, in order to maintain the system in a safe state, the * operation must be performed not only at every update of a data word but also many times during recovery. Therefore, the field generating polynomial and the N coefficients must be chosen carefully so as to minimize the time needed to compute the * operation between two words.
Lookup table methods have been used to compute the * operation, but this approach becomes unreasonable as the word size increases. For example, too many calculations are needed in too limited a period of time. Yet a smaller word size induces a larger number of operations per data block; hence the need to increase the word size or to pipeline the operations.
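One common form of lookup table method, given here only as an illustrative sketch and not as the method of any cited reference, uses logarithm and antilogarithm tables. The polynomial 0x11D, the choice of the word 00000010 as the element whose powers fill the tables, and the names EXP, LOG, table_mul and slow_mul are all assumptions of this sketch.

```python
POLY = 0x11D        # assumed generating polynomial for 8-bit words
W = 8               # word size in bits
SIZE = 1 << W       # number of distinct words

# EXP[i] holds the i-th power of the word 2; LOG is the reverse mapping.
EXP = [0] * (2 * (SIZE - 1))
LOG = [0] * SIZE
x = 1
for i in range(SIZE - 1):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & SIZE:
        x ^= POLY
for i in range(SIZE - 1, 2 * (SIZE - 1)):
    EXP[i] = EXP[i - (SIZE - 1)]   # duplicate so that index sums need no modulo

# Sanity check: the powers of 2 reach every nonzero word exactly once.
assert len(set(EXP[:SIZE - 1])) == SIZE - 1

def table_mul(a, b):
    """The * operation computed with two table lookups and one addition."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def slow_mul(a, b):
    """Bit-by-bit reference implementation used to cross-check the tables."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & SIZE:
            a ^= POLY
        b >>= 1
    return result

assert all(table_mul(a, b) == slow_mul(a, b)
           for a in range(SIZE) for b in range(SIZE))
```

Each table in this sketch has on the order of 2^W entries, so a table of this kind for 32-bit words would need on the order of 2^32 entries.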
U.S. Pat. No. 5,499,253, entitled "System and Method for Calculating RAID 6 Check Codes", teaches a careful matching of the characteristics of the Commutative Ring in which calculations take place to the capabilities of modern high-speed computers.
U.S. Pat. No. 6,148,430, entitled "Encoding Apparatus for RAID-6 System and Tape Drives", teaches a system including an irreducible polynomial. Its discussion of the choice of the polynomial, or of a "good generating polynomial", shows that one can find such good polynomials for arbitrarily large word sizes.
As pointed out in U.S. Pat. No. 5,499,253, one does not need all the properties of Galois Fields for the specific purpose of maintaining the system in a recoverable state. A less restrictive * operation (placing it in what is called a Commutative Ring) is desirable to increase the space of possible generating polynomials, thereby making it possible to find one which further reduces the complexity of the * operation. Therefore, for two large words, one can also compute the * operation with controlled complexity by carefully choosing the polynomial generating the Galois Field. A large word means any suitably sized word in use, without reducing the size of the word to fit a specific system. For example, if it is desired to use 32-bit words in a system, and the 32-bit words must be downsized to 8 bits for processing, then the 32-bit word is the large word and the 8-bit word is not.
As can be appreciated, it is desirable to generalize the known field and ring methods to any suitably large word size. One resultant benefit is an increased range of computational techniques for parity calculations, thereby allowing more flexibility and scope in finding an adaptable and efficient computational scheme. For example, the word size can be any size; it may be as small as 4 bits and as large as any suitable size. With an increase in the word size, the scope (i.e., number) of co-efficients can be correspondingly increased as well. Therefore, it is desirable to have a system and method permitting more flexibility and scope in finding an adaptable and efficient computational scheme, such that a generalized approach using fields and rings can be applied.
The present invention provides a generalized method for standard Galois Field operational schemes used in Advanced RAID parity calculations. This method places the standard field and ring operations in their generalized context of linear operations, which can be described by matrices over the Galois Field with two elements (GF(2)). To ensure recovery of information, one must impose certain conditions on these matrices. A plurality of methods for generating such matrices are provided, thereby increasing the range of computational techniques for parity calculation. Further, the method provides increased flexibility and scope in finding a more efficient computational scheme that is adapted to a particular hardware or software implementation.
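As one familiar instance of a linear operation that can be described by a matrix over GF(2), multiplication by a fixed coefficient word can be written as an 8x8 bit matrix. The sketch below is an illustration under assumptions: the polynomial 0x11D, the coefficient 0x53, and the names gf_mul, matrix_for and apply_matrix are hypothetical, and the sketch is not one of the matrix generation methods of the invention.

```python
POLY = 0x11D  # assumed generating polynomial for 8-bit words

def gf_mul(a, b):
    """The * operation on 8-bit words in the field generated by POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result

def matrix_for(c, width=8):
    """Columns of the GF(2) matrix of w -> c*w; column j is the word c*X^j."""
    return [gf_mul(c, 1 << j) for j in range(width)]

def apply_matrix(matrix, w):
    """Matrix-vector product over GF(2): XOR the columns selected by w's bits."""
    out = 0
    for j, column in enumerate(matrix):
        if (w >> j) & 1:
            out ^= column
    return out

T = matrix_for(0x53)

# The matrix reproduces the * operation by 0x53 on every word.
assert all(apply_matrix(T, w) == gf_mul(0x53, w) for w in range(256))

# Linearity: T(w + w') = T(w) + T(w'), where + is the bitwise XOR.
w1, w2 = 0xCD, 0x69
assert apply_matrix(T, w1 ^ w2) == apply_matrix(T, w1) ^ apply_matrix(T, w2)
```

Applying such a matrix needs only bit tests and XORs, with no field multiplication at run time.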
Accordingly, an advanced RAID system having at least one array of n disks, wherein n is a positive integer greater than or equal to 2, is provided. The RAID system further has a set of m redundancy disks associated with said at least one array, wherein m is a positive integer greater than or equal to 3. The system further includes data and redundancy data in the form of a set of word segments residing in the above disks. The word segments have equal length and predetermined size. A method for updating redundancy information associated with the advanced RAID system includes: providing a first expression of redundancy data in terms of a first summation of a sequence of data, said sequence including a summation of data segments; providing a second expression of redundancy data in terms of a second summation, wherein the elements of said second summation include elements of the first summation having each of the elements contained therein multiplied by a first sequence of co-efficients; providing an mth expression of redundancy data in terms of an (m-1)th summation of a sequence of data, wherein the elements of said (m-1)th summation include elements of the mth summation having each of the elements multiplied by an (m-1)th sequence of co-efficients; computing said first expression having at most m unknown values; computing said second expression having at most m unknown values, wherein said second expression includes a set of translations of said sequence of data, said translations of data being subject to conditions including, for any 2 given words w and w', Tk(w+w')=Tk(w)+Tk(w'); and computing an (m-1)th set of values representing said mth sequence of co-efficients. Thereby a large set of co-efficients for the first to m-1 expressions can be carefully selected for use in redundancy calculations for at most m disk failures.
In addition, an advanced RAID system having at least one array of n disks, wherein n is a positive integer greater than or equal to 2, and a set of 2 redundancy disks associated with said at least one array is accordingly provided. The system further includes data and redundancy data in the form of a set of word segments residing in the above disks. The word segments have equal length and predetermined size. A method for updating redundancy information associated with the advanced RAID system includes: providing a first expression of redundancy data in terms of a first summation of a sequence of data, said sequence including a summation of data segments; providing a second expression of redundancy data in terms of a second summation, wherein the elements of said second summation include elements of the first summation having each of the elements contained therein multiplied by a first sequence of co-efficients; computing said first expression having at most 2 unknown values; computing said second expression having at most 2 unknown values, wherein said second expression includes a set of translations of said sequence of data, said translations of data being subject to conditions including, for any 2 given words w and w', Tk(w+w')=Tk(w)+Tk(w'); and computing an (m-1)th set of values representing said mth sequence of co-efficients. Thereby a large set of co-efficients for 2 expressions can be carefully selected for use in redundancy calculations for at most 2 disk failures.