Disk arrays are established as one form of computer system storage (i.e., secondary memory apparatus). By means of RAID, faster data access and higher system reliability are realized in a disk array. RAID has multiple levels, from RAID 0 to RAID 6. Among these RAID levels, RAID 0, RAID 1, RAID 5, and RAID 6 are primarily used in disk arrays.
RAID 0 is also called striping. With RAID 0, fast data I/O with the host is realized by reading and writing data from and to multiple disks in a distributed manner. Such distributed reading and writing of data with respect to multiple disks is called “striping.” The striping data units (i.e., the individual units of data written to each disk) are called blocks or stripes. In addition, the entire set of data written to multiple disks as a result of one striping operation is called a “stripe.” RAID 0 offers no redundancy, since all disks are used for striping. For this reason, with RAID 0, all data is lost when a single disk fails. In other words, RAID 0 has no fault tolerance.
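The distribution of blocks described above can be sketched as follows. This is a minimal illustration, assuming a simple round-robin placement; the function names `stripe_blocks` and `read_striped`, the in-memory list representation of a disk, and the block size are hypothetical and not taken from any particular disk array.

```python
def stripe_blocks(data: bytes, num_disks: int, block_size: int) -> list[list[bytes]]:
    """Split data into blocks and place them on the disks round-robin."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        # Block k goes to disk k mod num_disks (assumed placement rule).
        disks[block_index % num_disks].append(data[offset:offset + block_size])
    return disks


def read_striped(disks: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading one stripe (one row of blocks) at a time."""
    out = []
    depth = max(len(d) for d in disks)
    for row in range(depth):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)


disks = stripe_blocks(b"ABCDEFGHIJKL", num_disks=3, block_size=2)
restored = read_striped(disks)
```

In this sketch each row across the three lists corresponds to one stripe. Because no list holds a copy or parity of another list's blocks, losing any one list makes the original data unrecoverable, which mirrors the lack of fault tolerance noted above.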
RAID 1 is also called mirroring, and is a technique of simultaneously writing the same data to multiple disks. With RAID 1, data is not lost unless all disks fail. For this reason, the fault tolerance is high.
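As a minimal sketch of mirroring, assuming each disk is modeled as a Python dictionary and a failed disk is modeled as `None` (both are illustrative assumptions, as are the names `mirror_write` and `mirror_read`):

```python
def mirror_write(disks, block_no, block):
    """Write the same block to every disk that is still operating."""
    for disk in disks:
        if disk is not None:
            disk[block_no] = block


def mirror_read(disks, block_no):
    """Read the block from the first disk that is still operating."""
    for disk in disks:
        if disk is not None:
            return disk[block_no]
    raise IOError("all mirrored disks have failed")


disks = [{}, {}]
mirror_write(disks, 0, b"payload")
disks[0] = None  # simulate the failure of one disk
survivor_copy = mirror_read(disks, 0)
```

The read succeeds as long as at least one mirrored disk survives, which corresponds to the statement that data is not lost unless all disks fail.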
RAID 5 is also called stripe sets with parity. With RAID 5, data and corresponding parity (i.e., error correcting codes for that data) are written to multiple disks in a distributed manner. With RAID 5, parity is distributed and written to each disk in a predetermined order. Since parity is distributed and recorded on multiple disks in this way, RAID 5 is also called distributed data protection. When striping in RAID 5, not only data but also parity is recorded to the disks, and thus even if a single disk fails, it is still possible to recover the data from the failed disk. For this reason, even when a single disk fails, RAID 5 operation can be continued by logically separating the failed disk from the system. The state of continued RAID 5 operation in the event of a single disk failure is called a “degenerate state” (see Japanese Laid-open Patent Publication No. 2002-297322, for example). A disk array implementing RAID 5 is ordinarily provided with a spare disk drive. Once the disk array has transitioned to the degenerate state, operations are begun to restore the data on the failed disk by using the data recorded on the healthy disks. The data on the failed disk is then restored to the spare disk drive (see Japanese Laid-open Patent Publication No. 2002-297322, for example). In order to recover data automatically, the spare disk drive is typically used as a hot spare (i.e., a hot spare disk drive). In other words, the spare disk drive is typically kept powered on in a standby state (see Japanese Laid-open Patent Publication No. 2003-108316, for example).
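The parity-based recovery described above can be illustrated with a short sketch. It assumes that parity is the byte-wise XOR of the data blocks in a stripe, which is the usual RAID 5 choice, and it assumes one particular rotation order for the parity position; the text only states that the order is predetermined. The names `xor_blocks`, `build_stripe`, and `rebuild_block` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def build_stripe(data_blocks, stripe_no):
    """Return (per-disk blocks, parity disk index) for one stripe."""
    num_disks = len(data_blocks) + 1
    # Assumed rotation: parity moves one disk to the left with each stripe.
    parity_disk = (num_disks - 1 - stripe_no) % num_disks
    stripe = list(data_blocks)
    stripe.insert(parity_disk, xor_blocks(data_blocks))
    return stripe, parity_disk


def rebuild_block(stripe, failed_disk):
    """Recover the block of a single failed disk from the surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_disk]
    return xor_blocks(survivors)


data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
stripe0, parity_disk0 = build_stripe(data, stripe_no=0)
stripe1, parity_disk1 = build_stripe(data, stripe_no=1)
recovered = rebuild_block(stripe0, failed_disk=1)
```

Because XOR is its own inverse, the same `rebuild_block` routine recovers either a data block or the parity block. In a rebuild to a hot spare, a routine of this kind would be applied to every stripe and the result written to the spare.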
RAID 6 is an extension of RAID 5 to increase fault tolerance. With RAID 6, two types of parity are generated when writing data to multiple disks in a distributed manner by striping. These two types of parity are respectively written to separate disks. Since two types of parity are written to separate disks in this way for distributed data protection, RAID 6 requires two redundant disks. Two techniques for generating the two types of parity in RAID 6 have been proposed: the 2D-XOR technique, and the P+Q technique. Primarily, the P+Q technique is implemented in disk arrays. In the P+Q technique, two parities called P parity and Q parity are generated and used. P parity is the same parity as that used in RAID 5. Q parity is generated by a different algorithm from P parity. With RAID 6, the two parities P and Q are respectively recorded to separate disks for distributed data protection. In so doing, it is possible to continue operation even if two disks fail at the same time. In other words, even if two disks fail at the same time, the data stored on the remaining, normally-operating disks can be used, and the data that was stored on the two failed disks can be restored. Similarly to RAID 5, the restoration of data on a failed disk in RAID 6 typically involves the use of a disk called a hot spare.
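A sketch of the P+Q technique follows. The text above does not specify the Q algorithm, so this example assumes a commonly used choice: Q is a weighted sum of the data blocks over the Galois field GF(2^8) with the polynomial 0x11d and the generator 2. All function names are hypothetical, and the sketch covers only the case where both failed disks held data blocks of the stripe (not the cases where P or Q itself is lost).

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo 0x11d (assumed field polynomial)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power in GF(2^8) by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def pq_parity(data):
    """P is the XOR of the data blocks; Q weights block i by 2**i in GF(2^8)."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for i, block in enumerate(data):
        coef = gf_pow(2, i)
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)


def recover_two(data, x, y, p, q):
    """Recover data blocks x and y (given as None in data) from P and Q."""
    zeroed = [blk if blk is not None else bytes(len(p)) for blk in data]
    p_part, q_part = pq_parity(zeroed)      # parity of the surviving blocks
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_pow(gx ^ gy, 254)              # a**254 is the inverse of nonzero a
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for k in range(len(p)):
        pxy = p[k] ^ p_part[k]              # equals Dx ^ Dy
        qxy = q[k] ^ q_part[k]              # equals gx*Dx ^ gy*Dy
        dx[k] = gf_mul(inv, qxy ^ gf_mul(gy, pxy))
        dy[k] = pxy ^ dx[k]
    return bytes(dx), bytes(dy)


data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
p, q = pq_parity(data)
d0, d2 = recover_two([None, data[1], None], 0, 2, p, q)
```

The two parities give two independent equations per byte position, which is why two unknown blocks can be solved for. A third lost block would leave more unknowns than equations.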
Typically, the embodiment of a particular RAID configuration is called a RAID group. In addition, the term “redundancy” is used to refer to the number of parity blocks inside a single stripe in a RAID configuration wherein data is recorded to multiple disks in a distributed manner by striping, such as in RAID 0, RAID 5, or RAID 6 (see Japanese Laid-open Patent Publication No. 7-306758, for example). According to this definition, the redundancy is “0” for a RAID group in a RAID 0 configuration. The redundancy is “1” for a RAID group in a RAID 5 configuration. The redundancy is “2” for a RAID group in a RAID 6 configuration.
Data recovery methods of the related art will now be described for the case where two disks have failed in a RAID 6 disk array. When two disks fail in a RAID 6 disk array of the related art, operation continues in the degenerate state, and the data on the two failed disks is restored to two hot spares. The data restoration in this case is conducted according to one of the following methods (1) and (2).
(1) The data on the two failed disks is restored to a hot spare in order, one disk at a time. In this case, one hot spare is used to restore the data on one of the failed disks.
(2) Restoration of the data on the two failed disks is conducted in parallel, with the data on each failed disk restored to a separate hot spare. In this case, one hot spare is used to restore the data on one of the failed disks, similarly to Method 1. However, the two hot spares are used simultaneously, so that the two restorations proceed in parallel rather than one after the other.
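The difference between Methods (1) and (2) can be sketched as follows. This is only an illustration of the control flow: `rebuild_block` is a hypothetical callback standing in for whatever RAID 6 reconstruction the array performs, hot spares are modeled as Python lists, and a thread pool stands in for the array's ability to drive two spares at once.

```python
from concurrent.futures import ThreadPoolExecutor


def restore(failed_disk, spare, rebuild_block, num_stripes):
    """Rebuild every stripe's block for one failed disk onto one hot spare."""
    for stripe_no in range(num_stripes):
        spare.append(rebuild_block(failed_disk, stripe_no))
    return spare


def restore_sequential(failed_disks, spares, rebuild_block, num_stripes):
    """Method (1): restore the failed disks in order, one disk at a time."""
    for disk, spare in zip(failed_disks, spares):
        restore(disk, spare, rebuild_block, num_stripes)


def restore_parallel(failed_disks, spares, rebuild_block, num_stripes):
    """Method (2): restore each failed disk to its own spare concurrently."""
    with ThreadPoolExecutor(max_workers=len(failed_disks)) as pool:
        futures = [
            pool.submit(restore, disk, spare, rebuild_block, num_stripes)
            for disk, spare in zip(failed_disks, spares)
        ]
        for future in futures:
            future.result()  # propagate any error raised during a rebuild


def fake_rebuild(disk, stripe_no):
    """Placeholder reconstruction that only labels the block it would rebuild."""
    return f"{disk}:{stripe_no}"


seq_spares = [[], []]
restore_sequential(["disk-3", "disk-4"], seq_spares, fake_rebuild, num_stripes=3)
par_spares = [[], []]
restore_parallel(["disk-3", "disk-4"], par_spares, fake_rebuild, num_stripes=3)
```

Both functions leave the same contents on the spares; they differ only in whether the second restoration waits for the first. In either case each spare receives the data of exactly one failed disk.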
FIGS. 13A and 13B are schematic diagrams illustrating a data restoration method of the related art in the case where two disks have failed in a RAID 6 disk array. In FIGS. 13A and 13B, five disks 1001d-1 to 1001d-5 constitute a RAID 6 disk array. FIGS. 13A and 13B illustrate an example wherein two of the five disks (1001d-3 and 1001d-4) fail, and the data on the disks 1001d-3 and 1001d-4 is respectively restored to the hot spares 1002s-1 and 1002s-2. According to the related art, when the two disks 1001d-3 and 1001d-4 fail at nearly the same time, restoration in a RAID 6 configuration is conducted according to one of the following methods. The data on the disk 1001d-3 may be first restored to the hot spare 1002s-1, and then the data on the disk 1001d-4 may be subsequently restored to the hot spare 1002s-2 (Method 1 discussed above). Alternatively, a process to restore the data on the disk 1001d-3 to the hot spare 1002s-1 may be conducted in parallel with a process to restore the data on the disk 1001d-4 to the hot spare 1002s-2 (Method 2 discussed above).
Meanwhile, there exists a method for restoring data on a failed disk in a RAID 1 or a RAID 5 disk array by utilizing a spare parity group provided with a number of hot spares (i.e., spare drives) equal to the number of disks constituting the RAID array (see Japanese Laid-open Patent Publication No. 2003-108316, for example). There also exists a method that realizes dynamic sparing of data on two failure-prone disks (i.e., disk drives) by splitting and copying the data to two hot spares (i.e., spare disks) (see Japanese Laid-open Patent Publication No. 2005-122338, for example). Herein, dynamic sparing refers to a technique of predicting the failure probability of disks (i.e., disk drives), and then copying the data on failure-prone disks to hot spares (i.e., spare disk drives) before failure occurs. Additionally, there exists technology wherein, when a failure occurs in a disk constituting part of a disk array, data restored from the failed disk is distributed and stored on multiple spare disks by striping (see Japanese Laid-open Patent Publication No. 2008-40687, for example).
With RAID 6, when two disks fail, the data on the two failed disks is restored according to the methods described above. However, if an additional failure occurs on another disk while the restoration of data on one of the failed disks is still incomplete, data will be lost. This is because data restoration becomes impossible in RAID 6 if three or more disks fail. Consequently, when two disks have failed in RAID 6, it is necessary to quickly restore the data on one of the failed disks to a hot spare.