RAID data storage systems are multi-disk storage systems that maintain data integrity even when a disk in the system fails. They do so by storing parity information for all of the data. Parity information allows the system to rebuild all of the data from a failed disk based on the data stored on all of the other disks. Conventional methods for generating parity information involve dividing each disk into logical segments of equal size, associating one logical segment from each disk into a logical stripe, then performing an exclusive disjunction (commonly known as an XOR operation) on all of the logical segments in the logical stripe to produce one logical segment of parity information. The logical segment of parity information is stored in the same logical stripe, on a disk that holds no data segment of that stripe. If any one disk fails, the logical segment of the logical stripe stored on that disk can be rebuilt by performing an exclusive disjunction on all of the remaining logical segments in that logical stripe, including the parity segment.
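The conventional parity scheme described above can be illustrated with a minimal sketch. The function name `xor_segments`, the three-disk layout, and the four-byte segment contents are assumptions chosen for illustration only; they do not come from any particular RAID implementation.

```python
from functools import reduce

def xor_segments(segments):
    """XOR equal-length byte segments together, byte by byte (hypothetical helper)."""
    length = len(segments[0])
    if any(len(s) != length for s in segments):
        raise ValueError("all segments in a stripe must be the same size")
    # zip(*segments) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*segments))

# One logical stripe: one data segment from each of three disks (made-up contents).
data_segments = [
    b"\x0f\xa0\x33\x01",
    b"\xf0\x0a\x55\x02",
    b"\x3c\xc3\x99\x04",
]

# The parity segment is the XOR of every data segment in the stripe.
parity = xor_segments(data_segments)

# Simulate the failure of disk 1, then rebuild its segment from the
# surviving data segments plus the parity segment.
failed = 1
survivors = [s for i, s in enumerate(data_segments) if i != failed] + [parity]
rebuilt = xor_segments(survivors)

assert rebuilt == data_segments[failed]
```

The rebuild works because XOR is its own inverse: combining the parity segment with every surviving data segment cancels those segments and leaves only the missing one. If two segments of the same stripe are lost, a single parity segment no longer determines either of them.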
This conventional method for generating parity information cannot tolerate more than one disk failure, and rebuilding a failed disk may require significant processing time, depending on the size of the system. Alternative methods for producing parity information can tolerate more than one disk failure, but each method sacrifices some amount of speed or efficiency in favor of fault tolerance. For example, RAID 6 storage systems maintain two independent segments of parity information for each logical stripe; one of the two segments is produced using a complex mathematical algorithm (typically a Reed-Solomon code computed over a Galois field). This method is tolerant of two disk failures but adds significant processing time to produce and update the second parity segment.
What is needed is a method for producing parity information that is tolerant of more than one disk failure, but that does not significantly degrade the efficiency of the data storage system during updates or rebuilds.
RAID systems usually include at least one powered and spinning but unused disk, called a “hot spare,” onto which the system can immediately begin rebuilding a failed disk. Rebuilding the data from a failed disk is a laborious, time-consuming, energy-intensive process. Because existing systems rebuild the failed disk on the hot spare, the speed of the rebuilding process is limited by the bandwidth of the hot spare. Furthermore, users continue to access the RAID system during the rebuilding process, further consuming disk bandwidth and increasing rebuild time.
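The bandwidth limit described above can be made concrete with a rough back-of-the-envelope sketch. The function `rebuild_hours`, the capacity and throughput figures, and the `rebuild_share` parameter (the fraction of bandwidth left for the rebuild once user traffic is served) are all hypothetical values chosen for illustration, not measurements of any real system.

```python
def rebuild_hours(disk_capacity_gb, spare_write_mb_s, rebuild_share=1.0):
    """Estimate hours needed to rewrite a whole disk onto a single hot spare.

    Simplifying assumptions: the hot spare's sustained write bandwidth is the
    only bottleneck, and user traffic leaves `rebuild_share` (0 < share <= 1)
    of that bandwidth available to the rebuild.
    """
    if not 0 < rebuild_share <= 1:
        raise ValueError("rebuild_share must be in (0, 1]")
    effective_mb_s = spare_write_mb_s * rebuild_share
    seconds = disk_capacity_gb * 1000 / effective_mb_s  # 1 GB taken as 1000 MB
    return seconds / 3600

# Hypothetical 4000 GB disk and 200 MB/s hot spare.
idle = rebuild_hours(4000, 200)                     # no competing user traffic
busy = rebuild_hours(4000, 200, rebuild_share=0.5)  # half the bandwidth lost to users

print(f"idle system: {idle:.1f} h, busy system: {busy:.1f} h")
```

Under these assumptions the rebuild time scales linearly with disk capacity and inversely with the bandwidth of the single target disk, which is why writing to one hot spare becomes the limiting factor as disks grow.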
Consequently, it would be advantageous if a method existed that was suitable for rebuilding multiple segments of a failed disk in parallel.