This invention relates to methods of storing data in a redundant array of disks, and more particularly to methods that speed up both the storage of data and the recovery of data from a failed disk.
Many computer systems use arrays of rotating magnetic disks for secondary storage of data. In particular, a redundant array of inexpensive disks (referred to as a RAID) has been shown to be an effective means of secondary storage. RAID schemes have been classified into five levels: a first level in which the same data are stored on two disks (referred to as mirrored disks); a second level in which data are bit-interleaved across a group of disks, including check disks on which redundant bits are stored using a Hamming code; a third level in which each group has only a single check disk, on which parity bits are stored; a fourth level that uses block interleaving and a single check disk per group; and a fifth level that uses block interleaving and distributes the parity information evenly over all disks in a group, so that the writing of parity information is not concentrated on a single check disk.
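The difference between the fourth and fifth levels can be illustrated by a minimal Python sketch of where the parity block for each stripe might be placed. The function name parity_disk and the particular rotation rule are assumptions made for illustration only; the classification above does not prescribe a specific layout.

```python
def parity_disk(stripe: int, num_disks: int, level: int) -> int:
    """Return the index of the disk holding parity for a given stripe.

    Hypothetical layout: level four always uses the last disk in the
    group as the check disk; level five moves the parity block back by
    one disk for each successive stripe.
    """
    if level == 4:
        return num_disks - 1
    if level == 5:
        return (num_disks - 1 - stripe) % num_disks
    raise ValueError("only levels four and five use block-interleaved parity")


# Over num_disks consecutive stripes, level five uses each disk once for parity,
# whereas level four sends every parity write to the same check disk.
level_five_layout = [parity_disk(s, 4, 5) for s in range(4)]
level_four_layout = [parity_disk(s, 4, 4) for s in range(4)]
```

Under this assumed rule, parity writes at level five are spread over all four disks of the group, which is the property that relieves the single check disk of level four.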
The interleaving schemes of RAID levels two to five conventionally imply that a single collection of data, such as a file or record, is distributed across different disks. For example, when a file with a size equivalent to three blocks is stored in RAID level four or five, the three blocks are conventionally written on three different disks, and parity information is written on a fourth disk. This scheme has the advantage that the four disks can be accessed simultaneously, but the disadvantage that access to each disk involves a rotational delay, and the file access time depends on the maximum of these four rotational delays.
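The effect of waiting for the slowest of several disks can be estimated with a short Python sketch. It assumes, purely for illustration, that the rotational delay of each disk is independent and uniformly distributed between zero and one rotation period; under that assumption the expected value of the largest of n delays is n/(n+1) of the rotation period. The function name and the 16 ms rotation period are hypothetical.

```python
def expected_max_rotational_delay(num_disks: int, rotation_time_ms: float) -> float:
    """Expected value of the largest of num_disks rotational delays.

    Assumes each delay is independent and uniform on [0, rotation_time_ms),
    for which the expected maximum is rotation_time_ms * n / (n + 1).
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return rotation_time_ms * num_disks / (num_disks + 1)


# Hypothetical 16 ms rotation period.
single_disk = expected_max_rotational_delay(1, 16.0)  # file kept on one disk
four_disks = expected_max_rotational_delay(4, 16.0)   # three data disks plus parity
```

With these assumed figures the expected delay rises from half a rotation for a single disk to four fifths of a rotation for four disks, which is the penalty weighed against simultaneous access in the discussion above.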
For a large file having many blocks stored on each disk, the advantage of simultaneous access outweighs the disadvantage of increased rotational delay, but for a small file the reverse may be true. For small amounts of data, RAID level one, in which identical data are stored on two mirrored disks, is faster than the other RAID levels, which tend to spread the data and check information over more than two disks. RAID level one, however, is highly inefficient in its use of space, since fully half of the disks are redundant.
Write access at RAID levels two to five is slowed by an additional factor: the need to read old data and old parity information in order to generate new parity information. In a conventional system employing RAID level four, for example, all disks are originally initialized to zeros. When data are written thereafter, the check disk in each group is updated so that it always represents the parity of all data disks in its group. Accordingly, when one block of data is written on a data disk, first the old data are read from that block and the corresponding old parity information is read from the check disk; then new parity is computed by an exclusive logical OR operation performed on the old data, old parity, and new data; and finally, the new data and new parity are written to the data disk and check disk. Write access to a single block therefore entails two read accesses and two write accesses, with one full rotation of the disks occurring between the read and write accesses.
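The conventional single-block write described above can be sketched in Python as follows. Disks are modeled here as lists of equal-sized byte blocks, and the names xor_blocks and small_write are hypothetical; the sketch is meant only to make the two read accesses and two write accesses explicit, not to represent any particular controller.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Exclusive logical OR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(data_disk: list, check_disk: list, block: int, new_data: bytes) -> None:
    """Write one block using the conventional read-modify-write sequence."""
    old_data = data_disk[block]      # read access 1: old data
    old_parity = check_disk[block]   # read access 2: old parity
    # New parity = old data XOR old parity XOR new data.
    new_parity = xor_blocks(xor_blocks(old_data, old_parity), new_data)
    data_disk[block] = new_data      # write access 1: new data
    check_disk[block] = new_parity   # write access 2: new parity


# A group of three data disks and one check disk, all initialized to zeros.
BLOCK_SIZE, BLOCKS_PER_DISK = 4, 2
data_disks = [[bytes(BLOCK_SIZE)] * BLOCKS_PER_DISK for _ in range(3)]
check_disk = [bytes(BLOCK_SIZE)] * BLOCKS_PER_DISK

small_write(data_disks[0], check_disk, 0, b"\x0f\x0f\x0f\x0f")
small_write(data_disks[2], check_disk, 0, b"\xf0\x00\xf0\x00")
small_write(data_disks[0], check_disk, 0, b"\x01\x02\x03\x04")  # overwrite
```

Because the old data are removed from the parity by the same exclusive OR that adds the new data, the check disk continues to hold the parity of all data disks without the other data disks being read.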
Redundant arrays usually have standby disks for the replacement of disks that fail during operation. The data on a failed disk are conventionally reconstructed by reading the entire contents of all other disks in the same group and performing an operation such as an exclusive logical OR; then the reconstructed data are written onto a standby disk. This method has the advantage of placing the standby disk in exactly the same state as the failed disk, but the disadvantage of taking considerable time, even if the failed disk contained only a small amount of data. The process of replacing the failed disk and reconstructing its data is usually carried out during system operation, so system performance suffers in proportion to the time taken.
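The conventional reconstruction can likewise be sketched in Python. As before, each disk is modeled as a list of equal-sized byte blocks, and the names xor_all and reconstruct_disk are hypothetical. The sketch assumes a single-parity group, in which the contents of any one disk equal the exclusive OR of the corresponding blocks on all other disks in the group.

```python
def xor_all(blocks) -> bytes:
    """Exclusive logical OR of any number of equal-length blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(result):
            raise ValueError("blocks must have the same length")
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def reconstruct_disk(surviving_disks: list) -> list:
    """Rebuild a missing disk by reading every block of every surviving disk."""
    return [xor_all(blocks) for blocks in zip(*surviving_disks)]


data_disks = [
    [b"\x01\x02", b"\x10\x20"],
    [b"\x04\x08", b"\x40\x80"],
    [b"\xff\x00", b"\x0f\xf0"],
]
# The check disk holds the parity of the data disks, computed the same way.
check_disk = reconstruct_disk(data_disks)

# Suppose the second data disk fails: rebuild it onto a standby disk.
standby_disk = reconstruct_disk([data_disks[0], data_disks[2], check_disk])
```

Note that reconstruct_disk visits every block position on every surviving disk, regardless of how much of the failed disk was actually in use, which corresponds to the time penalty noted above.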