1. Field of the Invention
The present invention relates to computer fault tolerance systems and methods, and particularly to a fault tolerance system and method for recovering from the failure of one or two disks in a disk array.
2. Related Art of the Invention
As users have demanded greater reliability and capacity from computer storage systems, disk array storage systems have evolved to meet both needs. Disk array storage systems distribute data across multiple disks, a process commonly referred to as “striping.” Striping enhances performance compared with a single disk, because smaller portions of the data are written to or read from multiple disks in parallel. Because the disks perform the operation in parallel, the total time needed to complete a particular read or write operation is reduced.
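To illustrate striping, the following is a minimal sketch assuming simple round-robin placement of fixed-size stripe units; the function names `stripe` and `unstripe` and the unit size are hypothetical and chosen only for illustration, not taken from any particular array.

```python
def stripe(data: bytes, num_disks: int, unit: int) -> list[list[bytes]]:
    """Split data into `unit`-byte chunks and place them round-robin on the disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), unit):
        chunk_index = offset // unit
        # Chunk k lands on disk k mod N, so consecutive chunks go to different disks
        # and can be transferred in parallel.
        disks[chunk_index % num_disks].append(data[offset:offset + unit])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the original byte stream by reading chunks back in round-robin order."""
    n = len(disks)
    total_chunks = sum(len(d) for d in disks)
    # Chunk k is the (k // N)-th chunk stored on disk k mod N.
    return b"".join(disks[k % n][k // n] for k in range(total_chunks))


if __name__ == "__main__":
    layout = stripe(b"ABCDEFGHIJ", num_disks=3, unit=2)
    print(layout)            # [[b'AB', b'GH'], [b'CD', b'IJ'], [b'EF']]
    print(unstripe(layout))  # b'ABCDEFGHIJ'
```

With three disks, a 10-byte request touches all three disks, so each disk transfers only a fraction of the data.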
However, increasing the number of disks used to store data also increases the probability that a disk failure will cause data loss. Storage arrays therefore provide additional storage holding redundancy information, which is used to recover data lost due to the failure of other disks. Redundant Array of Independent Disks (RAID) is a storage technology commonly used in present-day disk array storage systems, and it is defined at several “levels.” RAID level 6 provides recovery from the simultaneous failure of two disks by using an additional parity block. The first parity block (P) is calculated by XOR (exclusive-OR) operations over the data blocks. The second parity block (Q) is based on Reed-Solomon codes, which require finite field computations. Such finite field computations are substantially complex and may require significant computational resources. The complexity of Reed-Solomon codes may preclude their use in certain software, or may necessitate expensive special-purpose hardware. Thus, implementing Reed-Solomon codes in a disk array increases the cost and complexity of the array. Moreover, unlike simple XOR codes, Reed-Solomon codes cannot easily be distributed among dedicated XOR processors.
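The contrast between XOR-based P and finite-field-based Q can be sketched as follows. This is a minimal illustration of one common RAID-6 convention, assuming GF(2^8) with the reducing polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) and generator g = 2, so that P = D_0 ⊕ D_1 ⊕ … and Q = g^0·D_0 ⊕ g^1·D_1 ⊕ …. The helper names are hypothetical, and this is not the method of the present invention. P needs only XOR; Q and two-disk recovery need table-driven field multiplication and division for every byte.

```python
def _build_tables(poly: int = 0x11D) -> tuple[list[int], list[int]]:
    """Build exp/log tables for GF(2^8) with generator 2 (assumed polynomial 0x11D)."""
    exp, log = [0] * 512, [0] * 256
    x = 1
    for i in range(255):
        exp[i], log[x] = x, i
        x <<= 1
        if x & 0x100:        # reduce modulo the field polynomial
            x ^= poly
    for i in range(255, 512):
        exp[i] = exp[i - 255]  # duplicate so log sums need no modulo
    return exp, log


EXP, LOG = _build_tables()


def gf_mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_div(a: int, b: int) -> int:
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]


def compute_pq(data: list[bytes]) -> tuple[bytes, bytes]:
    """P is a plain XOR of the data blocks; Q weights block i by g^i in GF(2^8)."""
    p, q = bytearray(len(data[0])), bytearray(len(data[0]))
    for i, block in enumerate(data):
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(EXP[i], byte)
    return bytes(p), bytes(q)


def recover_one_from_p(data: list, missing: int, p: bytes) -> bytes:
    """One lost data block: XOR P with every surviving block (no field math needed)."""
    out = bytearray(p)
    for i, block in enumerate(data):
        if i != missing:
            for j, byte in enumerate(block):
                out[j] ^= byte
    return bytes(out)


def recover_two(data: list, x: int, y: int, p: bytes, q: bytes) -> tuple[bytes, bytes]:
    """Two lost data blocks x != y: solve Dx ^ Dy = dP and g^x*Dx ^ g^y*Dy = dQ."""
    zero = bytes(len(p))
    survivors = [zero if i in (x, y) else blk for i, blk in enumerate(data)]
    pxy, qxy = compute_pq(survivors)  # parity of the surviving blocks only
    gx, gy = EXP[x], EXP[y]
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for j in range(len(p)):
        dp, dq = p[j] ^ pxy[j], q[j] ^ qxy[j]
        # Substituting Dy = dP ^ Dx gives Dx = (dQ ^ g^y*dP) / (g^x ^ g^y).
        dx[j] = gf_div(dq ^ gf_mul(gy, dp), gx ^ gy)
        dy[j] = dp ^ dx[j]
    return bytes(dx), bytes(dy)
```

For example, with four data blocks where blocks 1 and 3 are lost (passed as `None`), `recover_two(damaged, 1, 3, p, q)` returns the two original blocks, while a single lost block can be rebuilt from P alone with `recover_one_from_p`.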
Accordingly, there is a need for a simple, inexpensive system and method for calculating P and Q parities and for reconstructing one or two failed disks in a disk array.