As large-scale storage systems, such as those built on Fibre Channel and Gigabit Ethernet, come into wider use, these systems become more susceptible to multiple disk failures. The rapid growth of disk capacity also prolongs the disk recovery time in the event of a disk failure. This prolonged recovery time increases the probability that further disks fail during the reconstruction of the user data and parity information stored on a faulty disk. In addition, latent sector failures, which arise in data left unread for a long period of time, may prevent data recovery after a disk failure, resulting in loss of data. The use of less expensive disks, such as ATA (Advanced Technology Attachment) disks, in arrays where high data integrity is required also increases the probability of such disk failures.
RAID (Redundant Array of Independent Disks) architectures have been developed to allow recovery from disk failures. Typically, the XOR (Exclusive-OR) of data from a number of disks is maintained on a redundant disk. In the event of a disk failure, the data on the failed disk is reconstructed by XORing the data on the surviving disks, including the redundant disk. The reconstructed data is written to a spare disk. However, data will be lost if a second disk fails before the reconstruction is complete. Traditional disk arrays that protect against the loss of no more than one disk are therefore inadequate for data recovery, especially in large-scale storage systems.
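The single-parity scheme described above can be illustrated with a minimal sketch. The helper name `xor_blocks`, the four-byte block contents, and the choice of failed disk are assumptions made for illustration only and are not taken from any particular RAID implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    if any(len(b) != len(blocks[0]) for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of bytes per offset; XOR each tuple.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical contents of three data disks (one small block each).
data_disks = [
    b"\x01\x02\x03\x04",
    b"\x10\x20\x30\x40",
    b"\xaa\xbb\xcc\xdd",
]

# The redundant disk holds the XOR of all data disks.
parity = xor_blocks(data_disks)

# Single failure: disk 1 is lost and is rebuilt from the survivors plus parity.
failed = 1
survivors = [d for i, d in enumerate(data_disks) if i != failed] + [parity]
rebuilt = xor_blocks(survivors)

# Double failure: disks 1 and 2 are both lost. The survivors yield only the
# XOR of the two lost blocks, from which neither block can be separated.
ambiguous = xor_blocks([data_disks[0], parity])
```

In this sketch `rebuilt` equals the lost block, whereas `ambiguous` equals only the XOR of the two lost blocks, which is why a single redundant disk cannot tolerate a second failure during reconstruction.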