A disk array system having a plurality of disk devices and a disk controller employs a RAID (Redundant Array of Inexpensive Disks) mechanism to prevent data loss caused by a disk failure and to improve processing performance. A system that employs the RAID mechanism is called a RAID system.
The RAID system distributes data across the plural disk devices so as to ensure redundancy, except in the case of “RAID 0”.
If one of the disk devices forming a RAID group is disabled because of a failure, etc., and the redundancy is lost, a rebuilding process is performed such that the disabled disk device is replaced by a spare disk device and data is rebuilt on the spare disk device so as to restore redundancy.
The rebuilding process rebuilds the data that was held in the disabled disk device by repeating, for every certain processing unit, a process of reading data from the normal disk devices and writing the restored data to a Hot Spare Disk, i.e., the spare disk device.
Incidentally, if faults are detected in the same block on more disk devices than the degree of redundancy can tolerate while data is being read from the normal disk devices, the data stored in the relevant block can no longer be read.
So that the rebuilding process can continue even in such a situation, and so that the data loss in the relevant block is indicated explicitly, BAD data is created and written to the relevant block on both the disk device in which the fault was detected and the Hot Spare Disk, after which the rebuilding process is continued.
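The rebuilding loop described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the method of any particular disk controller: it assumes a single-parity layout in which a lost block equals the XOR of the corresponding blocks on the surviving disk devices, it represents an unreadable block as `None`, and the names `BAD`, `xor_blocks` and `rebuild` are hypothetical.

```python
from functools import reduce

# Hypothetical marker standing in for BAD data (assumption for this sketch).
BAD = object()


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_disks, num_blocks):
    """Rebuild the disabled disk's data onto a hot spare, one block at a time.

    surviving_disks: list of disks, each a list of blocks; a block is a
    bytes object, or None when a fault makes it unreadable.
    Returns the contents of the hot spare as a list of blocks.
    """
    hot_spare = []
    for i in range(num_blocks):
        blocks = [disk[i] for disk in surviving_disks]
        if any(block is None for block in blocks):
            # Faults exceed the redundancy: the block cannot be restored.
            # Write BAD data to the faulty disk(s) and to the hot spare,
            # then continue with the next block instead of aborting.
            for disk in surviving_disks:
                if disk[i] is None:
                    disk[i] = BAD
            hot_spare.append(BAD)
        else:
            hot_spare.append(xor_blocks(blocks))
    return hot_spare
```

In this sketch the loop never stops on an unreadable block; it records the loss explicitly and moves on, which mirrors the behavior described in the text.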
The above BAD data explicitly indicates that the data in the relevant block on a volume (described later) has been made invalid by some factor. The BAD data contains, for example, data that differs from the data originally written. If a host computer requests to read data and BAD data is included in the requested area, a BAD data reply is sent back to the host computer so that the host computer is aware that the read request has failed. Thus, erroneous data is never used as is.
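The read-side handling of BAD data can be sketched as follows. This is a minimal illustration only: the marker `BAD`, the function `host_read` and the reply strings `"OK"` and `"BAD_DATA"` are hypothetical names chosen for the sketch, and a volume is modeled simply as a list of blocks.

```python
# Hypothetical marker standing in for BAD data (assumption for this sketch).
BAD = object()


def host_read(volume, start, count):
    """Serve a read request for `count` blocks beginning at block `start`.

    Returns ("OK", data) when every requested block is valid, or
    ("BAD_DATA", None) when any requested block holds BAD data, so that
    the host never receives erroneous data as if it were valid.
    """
    requested = volume[start:start + count]
    if any(block is BAD for block in requested):
        return ("BAD_DATA", None)
    return ("OK", b"".join(requested))
```

A request that touches even one BAD block fails as a whole in this sketch; a request confined to valid blocks succeeds normally.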
Refer to Japanese Laid-open Patent Publication No. 2008-134987 and No. 11-510292.
A volume included in a RAID group is classified as an ordinary volume or a storage pool.
FIGS. 15A and 15B illustrate an ordinary volume and a storage pool, respectively.
When logical volumes 810 and 820 to be perceived by a host computer are created in the ordinary volume 800, a real area equal in capacity to the logical volumes 810 and 820 is allocated on the RAID group.
Each of real areas 800a and 800b in the ordinary volume 800 illustrated in FIG. 15A is a target area of a formatting process, a rebuilding process, etc.
As illustrated in FIG. 15A, the real area 800a in the ordinary volume 800 is allocated to the logical volume 810, which has the logical volume name Vol#0. Further, the real area 800b in the ordinary volume 800 is allocated to the logical volume 820, which has the logical volume name Vol#1.
In the ordinary volume 800, as described above, the real areas used as the logical volumes 810 and 820 coincide with the real areas corresponding to the areas 800a and 800b perceived by the host computer.
Thus, it suffices to perform the formatting process only on the relevant area (the area perceived by the host computer) of the real area in order to create a logical volume, and to perform the rebuilding process described earlier only on that relevant area.
Meanwhile, in the storage pool 900, no real area is allocated when logical volumes 910 and 920 are created. Instead, each time the host computer actually issues an I/O request, a real area is allocated only to the area targeted by that request.
Thus, the whole storage pool 900 is a target area of the formatting process, the rebuilding process, etc.
As illustrated in FIG. 15B, areas 900a, 900b and 900c, obtained by dividing the storage pool 900 into blocks of a certain size, are allocated to the logical volume 910, which has the logical volume name Vol#2. Further, areas 900d, 900e and 900f, obtained in the same manner, are allocated to the logical volume 920, which has the logical volume name Vol#3.
In the storage pool, unlike in the ordinary volume described above, the real area to be used for a logical volume does not coincide with the real area corresponding to the area perceived by the host computer. In other words, the storage pool includes areas that may be used for a logical volume in the future but are not yet perceived by the host computer.
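The on-demand allocation described for the storage pool can be sketched as follows. This is a simplified illustration under stated assumptions, not the allocation scheme of any actual product: the class names `StoragePool` and `ThinVolume`, the first-free allocation order and the chunk indexes are all hypothetical.

```python
class StoragePool:
    """A pool of fixed-size real areas (chunks) shared by logical volumes."""

    def __init__(self, num_chunks):
        # Indexes of chunks that are not yet allocated to any volume.
        self.free = list(range(num_chunks))

    def allocate(self):
        """Hand out the first free chunk (assumed policy for this sketch)."""
        if not self.free:
            raise RuntimeError("storage pool exhausted")
        return self.free.pop(0)


class ThinVolume:
    """A logical volume that receives real areas only when I/O arrives."""

    def __init__(self, pool, logical_chunks):
        self.pool = pool
        self.logical_chunks = logical_chunks
        self.mapping = {}  # logical chunk index -> pool chunk index

    def write(self, logical_chunk):
        """Return the pool chunk backing `logical_chunk`, allocating on first use."""
        if not 0 <= logical_chunk < self.logical_chunks:
            raise IndexError("logical chunk out of range")
        if logical_chunk not in self.mapping:
            self.mapping[logical_chunk] = self.pool.allocate()
        return self.mapping[logical_chunk]
```

Creating a `ThinVolume` consumes nothing from the pool; only the first write to a logical chunk does. Chunks still listed in `free` correspond to the areas that may be used later but are not yet perceived by the host computer.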
If the rebuilding process is performed in the storage pool 900, BAD data may in some cases be written, during the rebuilding process, to a real area that has not yet been allocated.
In this case, when a new real area is allocated to the logical volume, the host computer may end up being allocated an area that includes the BAD data. Thus, there is a problem in that the area to be used from then on already contains a fault at the time it is allocated.
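The problem can be made concrete with a small sketch. It only illustrates the failure mode and does not propose a remedy; the function `allocate_first_free`, the state strings `"OK"` and `"BAD"`, and the first-free allocation order are assumptions made for the illustration.

```python
def allocate_first_free(pool_chunks, allocated):
    """Allocate the first unallocated chunk of the pool.

    pool_chunks: list of chunk states, "OK" or "BAD" (hypothetical labels).
    allocated: set of chunk indexes already allocated to logical volumes.
    Returns (index, state) of the newly allocated chunk. The allocator
    does not look at the state, which is exactly the problem described.
    """
    for index, state in enumerate(pool_chunks):
        if index not in allocated:
            allocated.add(index)
            return index, state
    raise RuntimeError("storage pool exhausted")


# Chunk 1 received BAD data during a rebuild while it was still unallocated.
pool_chunks = ["OK", "BAD", "OK"]
allocated = {0}
new_chunk = allocate_first_free(pool_chunks, allocated)
```

Here `new_chunk` is chunk 1, whose state is `"BAD"`: the logical volume receives an area that is faulty before the host computer has written anything to it.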