1. Technical Field
This technique generally relates to a storage system having storage devices for storing data and control devices for controlling the storage devices. More specifically, this technique relates to a storage system that allows access to each storage device even if a failure occurs in a communication path, and that also allows efficient system recovery.
2. Description of the Related Art
In recent systems, data used by a computing apparatus for various types of processing are stored on multiple HDDs (hard disk drives) included in a RAID (Redundant Array of Inexpensive Disks) to speed up data access and improve data security (e.g., refer to Japanese Laid-open Patent Application Publication No. 2004-348876 and Japanese Patent No. 3516689). A RAID apparatus typically has an arbitrated loop constituted by multiple DEs (disk enclosures) and higher-level RAID controllers.
FIG. 12 is a block diagram of the configuration of a known RAID apparatus. As shown, a RAID apparatus 10 includes RAID controllers 20 and 30 and DEs 40 to 70. Each device is assigned a unique address called an "ALPA (arbitrated loop physical address)".
For example, upon obtaining data to be stored from a computing apparatus (not shown), each of the RAID controllers 20 and 30 executes processing for allocating the obtained data to the DEs 40 to 70. Also, for example, in response to a data acquisition request from the computing apparatus, each of the RAID controllers 20 and 30 executes processing for obtaining the requested data from the DEs 40 to 70.
Each of the DEs 40 to 70 has a switch and is connected to multiple HDDs via the switch. For example, the DE 40 has a switch 40a and is connected to HDDs 41 to 43 via the switch 40a. Typically, cables are used to interconnect the DEs 40 to 70 in consideration of future expansion. The DEs are generally connected in cascade (also known as a concatenated connection), with each DE linked to the next in series.
However, the above-described known technology has a problem: when a failure such as a cable defect or a unit defect occurs in the communication path extending from the RAID controllers to the DEs, every DE located beyond the failure point becomes unusable.
FIG. 13 is a block diagram illustrating the problem of the known technology. For example, as shown in FIG. 13, when a failure occurs in the communication path between the DE 40 and the DE 50, the RAID controllers 20 and 30 cannot access the DEs 50 to 70. Moreover, the closer the failed DE is to the RAID controllers 20 and 30, the more DEs become unusable, and the availability decreases significantly.
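The effect of a single failure on a cascade connection can be illustrated with a small reachability model. The following is a minimal sketch, not the configuration of FIG. 12 or FIG. 13 itself: it assumes that both RAID controllers attach to the head of the chain, that each link is a single cable, and that a failed DE blocks everything at and beyond its position. The function names `reachable_des` and `unusable_des` are hypothetical and introduced only for illustration.

```python
from typing import List, Optional

# Hypothetical model of a cascade (concatenated) connection.
# Assumption: the RAID controllers sit at position 0 and attach to chain[0];
# link 0 is the controller-to-first-DE cable, link i joins chain[i-1] and chain[i].


def reachable_des(chain: List[int], failed_link: Optional[int]) -> List[int]:
    """Return the DEs the controllers can still reach after one link failure.

    chain: DE reference numbers in cascade order, e.g. [40, 50, 60, 70].
    failed_link: index of the broken link, or None if there is no failure.
    A failure of the DE at chain[k] can be modeled as failed_link=k, since
    that DE and every DE behind it lose their only path to the controllers.
    """
    if failed_link is None:
        return list(chain)
    # With a single path, everything at or beyond the break is cut off.
    return chain[:failed_link]


def unusable_des(chain: List[int], failed_link: Optional[int]) -> List[int]:
    """Return the DEs that become unusable for the given failure."""
    reachable = set(reachable_des(chain, failed_link))
    return [de for de in chain if de not in reachable]


if __name__ == "__main__":
    des = [40, 50, 60, 70]
    # Failure between DE 40 and DE 50 (link 1), as in FIG. 13.
    print(unusable_des(des, 1))
    # Number of unusable DEs for each possible failure position.
    for link in range(len(des)):
        print(link, len(unusable_des(des, link)))
```

With four DEs, a break at link 1 leaves only DE 40 reachable, and the count of unusable DEs falls from 4 to 1 as the failure moves away from the controllers, which is why failures near the controllers hurt availability the most.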
Although redundancy can be provided by duplicating the cables between the DEs, doing so inevitably doubles the cable cost, and the increased number of cables complicates installation.
In addition, when a cable defect, a unit defect, or the like occurs, the fault is detected by one of the DEs 40 to 70 but is not automatically isolated. The DEs 40 to 70 must then be temporarily stopped, the faulty portion identified on the basis of history data and the like, and the affected components replaced before operation can resume. This procedure delays the recovery of the RAID apparatus and reduces its availability.
That is, the known technology faces critical challenges in enabling access to the storage devices (i.e., HDDs) connected to each DE, and in enabling efficient recovery of the RAID apparatus, when a failure occurs in a communication path within the RAID apparatus.