1. Field of the Invention
The present invention relates to the writing of data to redundant arrays of independent storage devices. Specifically, the present invention addresses the problem of device failure during data writes.
2. Background of the Invention
The concept of RAID (“Redundant Arrays of Inexpensive Disks,” also sometimes referred to in the literature as “Redundant Arrays of Independent Disks”) was introduced in David A. Patterson, Garth Gibson, and Randy H. Katz, “A Case for Redundant Arrays of Inexpensive Disks (RAID),” Proc. ACM SIGMOD Conference, June 1988. The basic concept of RAID is to replace a “Single Large Expensive Disk” (SLED), of the kind commonly used in mainframe computers, with an array of smaller, lower-end “inexpensive disks,” of the kind used in personal computers, in order to increase performance while keeping costs down. Whereas a SLED might be capable of accessing only one sector at a time, a RAID utilizes multiple disks operating in parallel to increase overall throughput by accessing multiple sectors at one time on different disks.
A RAID system may employ a technique called “striping” to distribute data across multiple disks. In striping, a sequence of portions of data (e.g., bits, bytes, disk sectors, tracks, etc.) is written in such a way that a portion is written to a first disk in the array, then the next portion is written to a second disk, and so on until each disk is written to. Then, the array of disks is cycled through again, so that the data is distributed across the array. Many different striping arrangements are possible in a RAID array using different sizes of data portions and different sequencing of the writes across the array.
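By way of illustration only, the round-robin distribution described above may be sketched as follows. The function names stripe and unstripe, the use of fixed-size byte portions, and the in-memory lists standing in for disks are assumptions made for this example; they do not represent any particular RAID implementation.

```python
def stripe(data: bytes, num_disks: int, portion_size: int) -> list[list[bytes]]:
    """Distribute data across num_disks in round-robin portions.

    Each inner list is a hypothetical stand-in for one disk in the array.
    """
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), portion_size):
        portion = data[offset:offset + portion_size]
        # Portion k goes to disk k mod num_disks, so the array is cycled through.
        disks[(offset // portion_size) % num_disks].append(portion)
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the original data by reading the disks in the same cycle."""
    out = []
    depth = max(len(disk) for disk in disks)
    for row in range(depth):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)
```

With three disks and one-byte portions, the data "ABCDEFGH" would be laid out as A, D, G on the first disk, B, E, H on the second, and C, F on the third. Other portion sizes and write sequences are equally possible.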
Because RAID was intended to be used with “inexpensive” and presumably less reliable disks, and because employing an array of disks greatly increases the likelihood of a failure (regardless of the quality of the disks), most RAID systems employ some kind of fault-tolerance or redundancy (the “R” in RAID). The original Patterson paper described several different “levels” of RAID, ranging from RAID Level 1 to Level 5, each with a different arrangement of data disks and “check” disks. The lower RAID Levels, Level 1 and Level 2, employ more expensive fault-tolerance techniques, such as mirroring (Level 1) and error correction codes (Level 2). The higher RAID Levels (Level 3 and above) store parity information.
The parity of a string of bits is the exclusive-or (XOR) of all of the bits in the string. The parity of a string of bits is “1” if the number of 1's appearing in the string of bits is an odd number (which is also referred to as having “odd parity”); if an even number of 1's appear in the string, the parity is “0” (even parity). Storing an additional parity bit along with a string of bits (such as a byte or word) allows a single-bit error to be corrected, provided the location of the error within the string is known. Locating an error in an array of storage devices is generally not a problem, because the electronics in each storage device will typically be capable of detecting when the device has failed. If a storage device has failed, the missing data bit from that device can be reconstructed by XORing the bits from the other devices and comparing the result with the stored parity bit. If the two bits match, then the missing bit is a zero. If they do not, the missing bit is a one.
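As a minimal sketch of the parity calculation and single-bit reconstruction just described, the following example may be considered. The function names parity and recover_missing_bit are hypothetical and are chosen for illustration only.

```python
from functools import reduce
from operator import xor


def parity(bits: list[int]) -> int:
    """Return the XOR of all bits: 1 for an odd number of 1's, 0 for even."""
    return reduce(xor, bits, 0)


def recover_missing_bit(surviving_bits: list[int], stored_parity: int) -> int:
    """Reconstruct one missing bit whose position is already known.

    If the parity of the surviving bits matches the stored parity bit,
    the missing bit is 0; otherwise it is 1.
    """
    return 0 if parity(surviving_bits) == stored_parity else 1
```

For example, the string 1, 0, 1, 1 has odd parity, so the stored parity bit is 1. If the third bit is lost, the surviving bits 1, 0, 1 have even parity, which does not match the stored parity bit, and the missing bit is therefore reconstructed as a one.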
The most straightforward approach to calculate parity information in an array of storage devices is to execute the following process. For each address on the devices, XOR the data in each of the storage devices at that address (e.g., XOR the data at address 1 on disk 1 with the data at address 1 on disk 2, etc.). Such an arrangement is limited to correcting errors due to a single device failure, provided the identity of the failed device is known. This is referred to as a “single-dimension” parity calculation. Multiple-dimension parity calculations are also possible by calculating parity bits for various groupings of bits across the storage devices. Multiple-dimension parity information can be used to correct errors due to multiple device failures.
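The single-dimension parity calculation described above may be illustrated, under stated assumptions, as follows. Each device is modeled as an equal-length sequence of bytes indexed by address, and the function names parity_block and rebuild are hypothetical. The example covers only a single device failure whose identity is known.

```python
def parity_block(blocks: list[bytes]) -> bytes:
    """XOR the data at each address across all devices.

    Every element of blocks is assumed to hold the same number of bytes.
    """
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for addr, byte in enumerate(block):
            result[addr] ^= byte
    return bytes(result)


def rebuild(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Recreate the contents of the one failed device.

    XORing the surviving data with the stored parity cancels every
    surviving contribution and leaves the missing device's data.
    """
    return parity_block(surviving_blocks + [parity])
```

In use, parity_block would be applied to the data devices to produce the contents of a check device, and rebuild would be applied to the surviving data devices together with the check device after a failure.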
When one of the disks in a RAID fails, it can be replaced and the lost data recreated using parity information or other fault-tolerance techniques, such as error correcting codes or mirroring. Thus, in a sense, a RAID array acts as its own backup.
The basic RAID concept can be applied to other media besides disks. Clearly, any direct-access storage device (DASD) type, such as a CD-RW or memory, could be used to create a RAID-like array of storage devices. It is also possible to achieve fault-tolerance and performance benefits in “limited-performance” media such as tapes, by using a RAID-like array, called a RAIT (Redundant Array of Independent Tapes).
The term “limited-performance” media is used herein to denote storage media that exhibit performance limitations when operated in a random-access fashion. Examples of such performance limitations include, but are not limited to, slow seek or access time and inability to selectively overwrite portions of the storage media. Tape drives, for example, have a slow seek or access time because they operate on sequential access storage media (i.e., storage media that are accessed in a sequential fashion). Also, some tape drives are limited in their ability to selectively overwrite portions of a tape.
Reconstruction of an array of limited-performance devices after a device failure is more difficult than with a RAID. Because a RAID is composed of direct-access storage devices (DASDs), it is possible to reconstruct a lost volume in the array while still writing new data to the array. The writes made to reconstruct the lost data are interspersed with the new data writes, and the replacement storage device simply seeks back and forth between the portion of the storage space being reconstructed and the portion being written to with new data.
With an array of limited-performance devices, such as a RAIT, however, random access to the storage space is not possible, as tapes read and write data sequentially. Therefore, a more advanced form of reconstruction is needed in order to allow for continuous writing of new data to an array of limited-performance devices even in the presence of a device or media failure.