1. Field of the Invention
The present invention relates to a storage control apparatus, and a failure recovery method for a storage control apparatus.
2. Description of the Related Art
To handle large varieties and volumes of data, governments and other public agencies and offices, municipalities, companies, and educational institutions, for example, manage data using relatively large-scale storage control apparatuses. Such a storage control apparatus configures a storage area that is made redundant by means of redundant information (RAID: Redundant Array of Independent Disks), and stores data in this storage area (Japanese Patent Laid-open No. 10-149262).
In a storage control apparatus such as this, data is divided into prescribed sizes and respectively distributed and stored in a plurality of storage devices. Then, parity is calculated on the basis of the divided data, and this parity is stored in a storage device. Accordingly, should any one piece of data be lost, it is possible to reproduce (recover) the lost data on the basis of the other data and parity.
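The parity-based recovery described above can be illustrated with a minimal sketch. It assumes that parity is the bytewise exclusive OR (XOR) of the divided data, as in RAID 5-style layouts; the parity operation is not specified in the text above, and the function names are hypothetical.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """Return the bytewise XOR of equal-length byte blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def compute_parity(data_blocks):
    """Calculate parity from the divided data (assumption: XOR parity)."""
    return xor_blocks(data_blocks)


def recover_block(surviving_blocks, parity):
    """Reproduce one lost block from the other data blocks and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Data divided into three blocks, each stored on a different storage device.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = compute_parity(data)

# Suppose the second block is lost; rebuild it from the rest plus parity.
rebuilt = recover_block([data[0], data[2]], parity)
```

Because XOR is its own inverse, combining the surviving blocks with the parity yields exactly the missing block, provided that only one block per stripe is lost.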
For example, when a failure occurs in a storage device, and it becomes impossible to read and write data, a correction copy is executed. Correction copy is a technique for restoring all data stored in a failed storage device on the basis of the data and parity in a normal storage device within a parity group (also called an ECC (Error Correcting Code) group, or RAID group), and storing all of this restored data in a spare storage device (Japanese Patent Laid-open No. 11-191037).
Furthermore, technology that automatically performs the setup of various equipment in a storage system based on a policy specified in advance by a user is also known (Japanese Patent Laid-open No. 2003-303052).
In the prior art, when a failure occurs in a storage device, and the reading and writing of data becomes impossible, executing a correction copy transfers the data stored in the failed storage device to a spare storage device. When correction copy is complete, the spare storage device is used in place of the failed storage device. Then, the failed storage device is removed from the storage control apparatus and returned to the repair shop.
In the past, when a failure was detected in a storage device, the storage device in which the failure occurred was immediately detached from the storage control apparatus, and a spare storage device was used in its place. However, storage device failures come in a variety of types, such as physical failures and logical failures, and in some cases a storage device will recover to its normal state simply by being restarted. For example, when a firmware hang-up occurs inside a storage device, the storage device can most often be restored to its normal state simply by restarting it.
Even in cases where it is possible to recover from a failure by simply restarting the storage device, failure recovery still takes time because the storage device in which the failure occurred is isolated, and a spare storage device is used in its place. This is because all the data stored in the failed storage device must be restored via a correction copy, and this restored data must be stored on the spare storage device.
In a correction copy, the data, which is stored in a storage device in which a failure has occurred, is restored by reading out predetermined amounts of data and parity, respectively, from a normal storage device inside the parity group, and performing a logic operation based on this read-out data and parity. Then, this restored data is written to a spare storage device. This kind of processing, involving data and parity readouts, a logic operation, and a write to a spare storage device, must be executed repeatedly for all the data stored in a failed storage device. Therefore, failure recovery takes time, and also increases the load placed on the storage control apparatus.
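The repeated read, logic operation, and write cycle of a correction copy can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of any particular apparatus: the logic operation is assumed to be a bytewise XOR, each storage device is modeled as a hypothetical in-memory list of equal-sized blocks indexed by stripe, and the name `correction_copy` is invented for the example.

```python
from functools import reduce
from operator import xor


def correction_copy(normal_devices, spare, stripe_count):
    """Rebuild every block of a failed device onto a spare device.

    normal_devices: the remaining devices in the parity group, each a list
                    of byte blocks (data or parity), indexed by stripe.
    spare:          an empty list standing in for the spare storage device.
    """
    for stripe in range(stripe_count):
        # Read a predetermined amount of data and parity from the normal devices.
        blocks = [device[stripe] for device in normal_devices]
        # Logic operation (assumed XOR) restores the block of the failed device.
        restored = bytes(reduce(xor, column) for column in zip(*blocks))
        # Write the restored block to the spare storage device.
        spare.append(restored)
    return spare
```

The loop body runs once per stripe, so the number of reads, logic operations, and writes grows in proportion to the amount of data held on the failed storage device, which is why recovery by this route takes time and loads the apparatus.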
Further, using a spare storage device each time a failure occurs, even a failure from which recovery is possible by simply restarting the storage device, increases the frequency at which storage devices are replaced, thus adding to the operating and maintenance costs of the storage control apparatus.