(1) Field of the Invention
The present invention relates to a disk array system and a failure recovering control method, and more particularly to a disk array system of the redundant array of inexpensive disks (RAID) type, in which one logical group is formed by a plurality of disk drives including a redundant disk drive to prepare for the failure of any disk drive, and to a failure recovering control method for such a system.
(2) Description of the Related Art
A high-performance computer system is provided with a secondary mass storage, and data required by a host system such as a CPU is read from and written to the secondary storage at any time. The secondary storage is generally a disk unit having a nonvolatile storage medium that enables random access, such as a magnetic disk or an optical disk. Recently, disk array systems composed of multiple small-sized disk drives (hereinafter simply called drives) to increase storage capacity have become mainstream.
The disk array system adopts a RAID scheme in which each logical group is formed by a plurality of drives including at least one redundant drive, to prepare for the failure of any drive in the group.
RAID defines several standard levels. For example, in a disk array system of RAID 1 (level 1), a redundant drive, or spare drive, is prepared for each drive that stores data. The same data is normally written to the two drives in parallel, so that required data can be read out from the spare drive paired with a data drive even if failure occurs in that data drive.
At RAID 3 (level 3), one logical group, which may also be called a parity group or a RAID group, is formed by (N+1) drives (N≧2). One of them is used as a redundant drive for storing error correction information, and the rest are used as drives for storing data. In this specification, error correction information is represented by parity; however, it is clear that information other than parity can be used as the error correction information generated in each logical group.
In the disk array system of RAID 3, when a request to write a data block is issued from a host system, the data block to be written is divided into a plurality of data sub-blocks of fixed length (for example, 1-byte length), and these data sub-blocks are sequentially distributed to and stored in the N data storing drives. The redundant drive stores error correction information generated based upon the N data sub-blocks that belong to the same logical group and have the same address in the respective data storing drives. When a request to read a data block is issued from the host system, the original data block is reconstituted by reading out the data sub-blocks in parallel from the N data storing drives and joining these data sub-blocks in a predetermined order.
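The RAID 3 write and read operations described above may be illustrated by the following minimal sketch. It assumes that the error correction information is bytewise XOR parity and that a data block whose length is not a multiple of N is padded with zero bytes; neither assumption is stated in the description above. The function names stripe_raid3 and read_raid3 are hypothetical and are used for illustration only.

```python
from functools import reduce


def stripe_raid3(block: bytes, n: int):
    """Distribute 1-byte sub-blocks over n data drives and compute parity.

    Assumption: parity is the bytewise XOR of the sub-blocks that share
    the same address; the block is zero-padded to a multiple of n.
    """
    padded = block + bytes((-len(block)) % n)
    # Sub-block i goes to drive (i mod n), i.e. sequential distribution.
    drives = [padded[i::n] for i in range(n)]
    # One parity byte per address, computed across the n data drives.
    parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*drives))
    return drives, parity


def read_raid3(drives, length: int) -> bytes:
    """Rejoin the sub-blocks read in parallel, in their original order."""
    out = bytearray()
    for col in zip(*drives):  # one sub-block from each drive per address
        out.extend(col)
    return bytes(out[:length])  # drop the zero padding


drives, parity = stripe_raid3(b"ABCDEFG", 3)
restored = read_raid3(drives, 7)
```

In this sketch, a 7-byte block written to three data storing drives occupies three addresses on each drive, and the redundant drive holds three parity bytes.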
At RAID 4 (level 4), one logical group is also formed by (N+1) drives (N≧2), one of which is used as a redundant drive for storing error correction information while the rest are used as data storing drives. However, in a disk array system of RAID 4, when a request to write a data block is issued from a host system, the data is written in such a way that, for example, the data block related to one write request is stored in one of the data storing drives and the data block related to the next write request is stored in another data storing drive. Therefore, the redundant drive stores error correction information generated based upon data divisions which have the same storing address in the respective data storing drives and belong to separate data blocks.
At RAID 5 (level 5), as at level 4, data is written in units of the data block related to one write request. However, the area for storing error correction information is not fixedly allocated to a specific disk drive as at level 4, but is distributed over the (N+1) disk drives forming the logical group.
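The difference between the fixed allocation of level 4 and the distributed allocation of level 5 may be illustrated by the following sketch. The description above does not specify how the storing area is distributed at level 5; a simple rotation by storing address is assumed here as one possible scheme, and the function name parity_drive is hypothetical.

```python
def parity_drive(stripe: int, num_drives: int, level: int) -> int:
    """Return the index of the drive holding parity for a given stripe.

    num_drives is N+1, the number of drives in the logical group.
    Assumption: level 5 rotates the parity position by one drive per
    stripe; other distribution schemes are equally possible.
    """
    if level == 4:
        # Level 4: parity always resides on one dedicated redundant drive.
        return num_drives - 1
    if level == 5:
        # Level 5: parity position moves from stripe to stripe.
        return (num_drives - 1 - stripe) % num_drives
    raise ValueError("only levels 4 and 5 are modeled in this sketch")


level4_positions = [parity_drive(s, 4, 4) for s in range(5)]
level5_positions = [parity_drive(s, 4, 5) for s in range(5)]
```

With four drives in the logical group, the level 4 positions are the same drive for every stripe, whereas the level 5 positions visit every drive in turn before repeating.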
In the disk array systems of RAID 3 to RAID 5, when failure occurs in any drive, the data or error correction information (for example, parity data) held in the faulty drive can be regenerated based upon data read out from the other drives that belong to the same logical group as the faulty drive.
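The regeneration described above may be sketched as follows, assuming that the error correction information is bytewise XOR parity (an assumption; other error correction information could be used). Under this assumption, the content of any one member of the logical group equals the XOR of the contents of all the other members, whether the lost member held data or parity. The function names xor_blocks and regenerate are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks taken at the same address."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def regenerate(group, failed: int) -> bytes:
    """Rebuild the block of the faulty drive from the surviving drives.

    group is the list of blocks of all (N+1) drives in one logical group,
    including the parity block; failed is the index of the faulty drive.
    """
    survivors = [blk for i, blk in enumerate(group) if i != failed]
    return xor_blocks(survivors)


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
group = data + [xor_blocks(data)]        # three data drives plus parity
rebuilt_data = regenerate(group, 1)      # a data drive has failed
rebuilt_parity = regenerate(group, 3)    # the redundant drive has failed
```

Note that every regeneration requires a read from each of the N surviving drives, which is why reads directed to a faulty or absent drive are slower than normal reads.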
In the above-mentioned disk array system, in order to increase memory capacity and to miniaturize body size, it is required to mount as many drives as possible in a small space. Generally, a system configuration is adopted in which a plurality of control boards, each mounting control LSI chips, and multiple drive boards, each mounting a plurality of drives, are inserted into connectors arranged in parallel on a mother board, and each drive is connected to disk channel wiring on the mother board via individual wiring on the drive board.
In this configuration, in order to detach only a faulty drive from a drive board and replace it with a new drive when failure occurs in the drive, free space sufficient for replacing the faulty drive is necessary between adjacent drive boards. As a result, the packaging density of the drive boards on the mother board is reduced.
As prior art addressing this problem, for example, Japanese published unexamined patent Publication No. Hei7-230362 (patent document 1) proposes a disk array system in which multiple drive boards are mounted on a mother board at high density. When failure occurs in a drive, the drive board on which the faulty drive is mounted (the faulty board) is detached from the mother board, the faulty drive is replaced with a normal drive outside the system, and the drive board with the replaced component is connected to the mother board again.
According to this configuration, not only the faulty drive but also the plurality of normal drives mounted on the faulty board are detached from the disk array system until the faulty component has been replaced.
For that reason, in the patent document 1, each logical group is formed by (N+1) drives mounted on different drive boards. When a read request for data stored in the faulty drive, or in a drive made absent by the detachment of the faulty board, is issued from a host system, data blocks are read out from the other drives that belong to the same logical group as the faulty drive or the absent drive, and the required data is regenerated based upon those data blocks.
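The effect of forming each logical group from drives on different drive boards may be illustrated by the following sketch. The specific layout, in which the drive in the same slot position of every board forms one logical group, is a hypothetical arrangement chosen for illustration and is not taken from the patent document 1. The function names build_groups and absent_per_group are likewise hypothetical.

```python
def build_groups(num_boards: int, drives_per_board: int):
    """Form one logical group per slot position, one member per board.

    Each drive is identified by a (board, slot) pair. Assumption: the
    number of boards equals N+1, the size of each logical group.
    """
    return [
        [(board, slot) for board in range(num_boards)]
        for slot in range(drives_per_board)
    ]


def absent_per_group(groups, detached_board: int):
    """Count, for each logical group, the drives lost with one board."""
    return [
        sum(1 for board, _slot in group if board == detached_board)
        for group in groups
    ]


groups = build_groups(num_boards=4, drives_per_board=3)
lost = absent_per_group(groups, detached_board=2)
```

In this layout, detaching any one board removes exactly one drive from every logical group, so each group remains within the single-drive loss that its redundancy can cover. It also shows that every logical group, not only the one containing the faulty drive, must regenerate data while the board is detached.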
In addition, the patent document 1 proposes to store, in a cache memory provided in a disk controller, both the write data directed to the faulty drive before the faulty board is detached and the write data directed to the faulty drive or to the plurality of normal drives made absent after the faulty board is detached. When the drive board whose faulty component has been replaced with a normal component is connected to the mother board again, the data stored in the cache memory is written into the corresponding drives on the recovered board. Lost data made unreadable by the detachment of the faulty board is regenerated, when the drive board is connected again, based upon data read out from the remaining drives that belong to the same logical group as the detached drive, and is written into a substitution drive on the recovered board.
The patent document 1 further proposes to prepare a spare drive corresponding to each logical group, to temporarily store data which the host system requests to write to the absent drive in the spare drive instead of the cache memory, and to copy the data from the spare drive to the corresponding drive on the recovered board when the drive board whose faulty component has been replaced with a normal component is connected to the mother board again. It is also proposed that, when the quantity of data stored in the spare drive in place of the faulty drive becomes large, the spare drive may continue to be used as a normal drive. In that case, when the drive board whose faulty component has been replaced is connected again, the lost data of the faulty drive is regenerated based upon data read out from the other drives that belong to the same logical group as the spare drive or the faulty drive, and the regenerated data is written to the spare drive.
The disk array system proposed in the patent document 1 can be said to have a hardware configuration suitable for miniaturizing body size and increasing memory capacity. However, when a data read request is issued from the host system while the faulty board is detached, the operation for regenerating data based on the logical group is required not only for the faulty drive but also for the plurality of normal drives made absent. As a result, the response to a data read request is delayed while the faulty board is detached from the disk array.
Further, in the above-mentioned prior art, the operation for transferring data from the cache memory or the spare drive to the drives on the recovered board and the operation for regenerating the lost data of the faulty drive are both executed when the drive board whose faulty component has been replaced is inserted into the mother board again. Consequently, the return of the disk array system to a normal state is delayed.