1. Field of the Invention
The present invention relates to a disk array system having a plurality of disk drives and a method of avoiding failure of the disk array system.
2. Description of the Related Art
The disk array system, which includes a number of disk drives arranged in an array, is configured based on RAID (Redundant Array of Independent (or Inexpensive) Disks). A logical volume, which is a logical storage area, is formed on the physical storage area provided by each disk drive. A host computer can read and write desired data by issuing a read command or a write command of a predetermined format to the disk array system.
Various defensive measures are taken in the disk array system in order to prevent loss of data stored in the disk drives. One is the use of a RAID configuration. For example, employing a redundant storage configuration known as RAID levels 1 to 6 in the disk array system reduces the possibility of data loss. In addition, in the disk array system, it is possible, for example, to store identical data in a pair of logical volumes, namely a primary volume and a secondary volume, by duplicating a logical volume in the RAID configuration. Alternatively, in what is known as disaster recovery, a copy of the data may be stored at a remote site located far away from the local site, in preparation for unforeseen events such as natural disasters. In addition, data stored in the disk array system is regularly backed up to a backup device such as a tape drive.
In addition, in the disk array system, the physical structure is also duplicated. For example, the disk array system is made redundant by providing a plurality of main units, such as host interface circuits for performing data communication with the host computer and lower-level interface circuits for performing data communication with each disk drive. A plurality of paths for connecting these main units, and a plurality of power sources for supplying power to these main units, are also provided.
In addition to these units, the disk array system may be provided with one or more spare disk drives. When a failure occurs in a disk drive in which data is stored, the data stored in the faulty disk drive is recovered and copied to the spare disk drive. For example, the data in the faulty disk drive is recovered by executing an inverse operation based on the data and parity stored in a distributed manner in the other disk drives (JP-A-7-146760). Subsequently, the faulty disk drive is removed and replaced with a new disk drive or a spare disk drive.
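The inverse operation mentioned above can be illustrated with a minimal sketch. It assumes a single-parity scheme in which the parity block is the bytewise exclusive OR (XOR) of the data blocks in a stripe, as is typical for RAID levels 3 to 5; the source does not specify the operation used, and the function names `xor_blocks` and `rebuild_missing` and the sample block contents are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equally sized byte blocks."""
    # zip(*blocks) yields one tuple of ints per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Rebuild the one missing block of a stripe (assumed XOR parity).

    surviving_blocks must hold every remaining data block plus the parity
    block. XOR is its own inverse, so combining them yields the lost block.
    """
    return xor_blocks(surviving_blocks)


# Hypothetical stripe: three data blocks on three drives, parity on a fourth.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# The drive holding data[1] fails; its contents are rebuilt from the others
# and could then be written to the spare disk drive.
recovered = rebuild_missing([data[0], data[2], parity])
```

Note that every surviving drive in the stripe must be read in full to rebuild one block, which is why this kind of recovery is slower than a direct copy.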
In the related art, when a failure occurs in a disk drive, the data stored in the faulty disk drive is recovered based on the data and parity stored in the other, normal disk drives. The recovered data is then stored in the spare disk drive. Thus, in the related art, data copy to the spare disk drive is not performed until a failure has actually occurred in one of the disk drives, and the start of the data copy to the spare disk drive is therefore delayed. In addition, since the data must be reconstructed from the normal disk drives, recovery takes a long time, and the data copy also takes a long time to complete.
In addition, when a further failure subsequently occurs in part of another normal disk drive, the data required for the inverse operation cannot be obtained, and thus the data of the faulty disk drive cannot be recovered. Even in a normal disk drive, the possibility of a partial failure increases as read and write operations are repeated. When two or more pieces of information (data, parity) cannot be read, the data cannot be recovered by the inverse operation, and the unrecoverable data is lost.
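The limit described above can be shown with a short sketch, again assuming a single XOR parity block per stripe (an assumption, since the source does not name the parity scheme). The function `recover_stripe` and the convention of marking an unreadable block with `None` are hypothetical and used only for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_stripe(stripe):
    """Try to repair a stripe whose last element is the parity block.

    Unreadable blocks are marked with None. Returns the repaired stripe,
    or None when two or more blocks are unreadable, because one parity
    block provides only enough information to solve for one unknown.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if not missing:
        return list(stripe)
    if len(missing) > 1:
        return None  # data loss: the inverse operation cannot be completed
    survivors = [block for block in stripe if block is not None]
    repaired = list(stripe)
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired


d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xaa\xbb"
parity = xor_blocks([d0, d1, d2])

# One faulty drive: the stripe can be repaired.
one_failure = recover_stripe([d0, None, d2, parity])

# A faulty drive plus a partial failure on another drive: unrecoverable.
two_failures = recover_stripe([d0, None, None, parity])
```

With two blocks missing, the XOR of the survivors equals only the XOR of the two lost blocks, so neither can be determined individually.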