1. Field of the Invention
The present invention relates to a storage system which connects a plurality of magnetic disk devices or other physical disk devices in a loop and an automatic restoration method for same upon a loop anomaly, and in particular relates to a storage system which disconnects and bypasses a faulty storage device in a loop and automatically restores the loop, and an automatic restoration method for same upon a loop anomaly.
2. Description of the Related Art
In storage equipment utilizing magnetic disks, magneto-optical disks, optical disks or other storage media, the storage media are physically accessed at the request of a data processing device. When the data processing device uses a large quantity of data, a storage system having a plurality of storage units and control devices is utilized.
In such a storage system, redundant configurations are adopted in order to improve the reliability of stored data and of the equipment, and FC_AL (Fibre Channel Arbitrated Loop) interfaces are used to speed data transfers. A large number of storage devices are connected in such FC_AL loops. Consequently, when a fault occurs in a storage device in a loop, the entire loop is affected. Hence automated loop restoration technology, in which the storage device in which the fault has occurred is disconnected from the loop and the effect on the entire loop is eliminated, is necessary.
In the prior art, the method shown in FIG. 16 has been known as a method of disconnecting a storage device (magnetic disk device) in which a fault has occurred from an FC_AL loop and restoring the loop.
As shown in FIG. 16, each of a plurality of magnetic disk devices 160, 162, 164 is connected to a pair of fibre channel loops 106, 108 by fibre switches 130 to 134 and 140 to 144. One of the fibre channel loops 106 is connected to the device adapter 102 of a controller by a fibre channel connector 114; the other fibre channel loop 108 is connected to the device adapter 104 of the controller by the fibre channel connector 116.
Both device adapters 102 and 104 are connected to the centralized control module 100 of the controller. Hence the centralized control module 100 can access each of the magnetic disk devices 160, 162, 164 both by one route (route a) via the device adapter 102 and one fibre channel loop 106, and by another route (route b) via the device adapter 104 and the other fibre channel loop 108.
Disconnection control portions 150, 152 are provided in the fibre channel loops 106, 108. One of the disconnection control portions 150 controls disconnection (bypassing) by each of the fibre switches 130, 132, 134 in the fibre channel loop 106, and the other disconnection control portion 152 controls disconnection (bypassing) by each of the fibre switches 140, 142, 144 in the other fibre channel loop 108.
In the prior art, as shown in FIG. 16, upon detecting that one of the fibre channel loops 106 cannot be accessed, the centralized control module 100 uses the disconnection control portion 150 to repeat an operation to check the loop 106 by bypassing, in succession, one magnetic disk device at a time (see, for example, Japanese Patent Laid-open No. 2001-306262).
For example, first the fibre switch 130 on the port “a” side of magnetic disk device 160 is switched to the bypass state, the magnetic disk device 160 is disconnected from the fibre channel loop 106, and a diagnostic signal is passed from the device adapter 102 to the fibre channel loop 106 to check the loop.
Next, the fibre switch 130 on the port “a” side of the magnetic disk device 160 is connected to the loop, and then the fibre switch 132 on the port “a” side of the next magnetic disk device 162 is switched to the bypass state, to disconnect the magnetic disk device 162 from the fibre channel loop 106, and a diagnostic signal is passed from the device adapter 102 to the fibre channel loop 106 to check the loop.
Thereafter, each of the magnetic disk devices in the fibre channel loop 106 is disconnected from the loop in succession, and the loop is checked. When the anomalous magnetic disk device is identified by this procedure, the fibre switch on the port “a” side of that magnetic disk device (in FIG. 16, the fibre switch 132 of magnetic disk device 162) is kept in the bypass state, so that the device remains disconnected from the loop 106. By this means, the loop 106 is made to function normally, and at the same time the magnetic disk device 162 can be accessed from port “b” on the side of the fibre channel loop 108.
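The one-at-a-time bypass check described above can be illustrated by the following minimal sketch. It is not taken from the cited prior art; the function names `find_faulty_device` and `make_loop_check`, the use of reference numerals as device identifiers, and the assumption that exactly one device causes the anomaly are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Optional, Sequence, Set, Tuple


def find_faulty_device(
    devices: Sequence[int],
    loop_ok: Callable[[Set[int]], bool],
) -> Optional[int]:
    """Bypass one device at a time and check the loop after each bypass.

    Returns the device whose bypass makes the loop check pass, or None if
    no single bypass restores the loop. Assumes the loop is already known
    to be anomalous when this is called.
    """
    for dev in devices:
        bypassed = {dev}           # switch this device to the bypass state
        if loop_ok(bypassed):      # pass a diagnostic signal around the loop
            return dev             # keep this device bypassed
        # Otherwise the device is reconnected and the next one is tried.
    return None


def make_loop_check(faulty: Set[int]) -> Tuple[Callable[[Set[int]], bool], Dict[str, int]]:
    """Hypothetical stand-in for the diagnostic: the loop is healthy only
    when every faulty device is bypassed. Also counts how many checks ran."""
    stats = {"checks": 0}

    def loop_ok(bypassed: Set[int]) -> bool:
        stats["checks"] += 1
        return faulty <= bypassed

    return loop_ok, stats


if __name__ == "__main__":
    check, stats = make_loop_check({162})
    print(find_faulty_device([160, 162, 164], check), stats["checks"])
```

In this sketch the number of loop checks grows linearly with the position of the faulty device in the loop, which is the property that makes the procedure slow when many devices are connected.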
Normally, several tens of magnetic disk devices (for example, up to a maximum of 60) are connected to a single fibre channel loop. Consequently, when a loop check method in which one disk at a time is bypassed is used, as in the technology of the prior art, automatic loop restoration takes approximately several tens of seconds to several minutes. Because access to the magnetic disk devices is halted during the automatic restoration operation, disk access by a host is delayed for that period. Hence with the technology of the prior art, restoration processing takes time whenever there is an anomaly in one loop, and the time required for disk access by a host is lengthened accordingly.
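The dependence of the restoration time on the number of devices can be made concrete with a rough estimate. The sketch below assumes, purely for illustration, that each bypass-and-check cycle takes a fixed time; the per-check time is a hypothetical parameter and is not stated in the description above.

```python
from typing import Tuple


def restoration_time_bounds(num_devices: int, seconds_per_check: float) -> Tuple[float, float]:
    """Return (best, worst) restoration time in seconds for a sequential
    one-device-at-a-time loop check.

    best:  the faulty device is the first one bypassed (one check).
    worst: the faulty device is the last one bypassed (num_devices checks).
    """
    best = seconds_per_check
    worst = num_devices * seconds_per_check
    return best, worst


if __name__ == "__main__":
    # 60 devices on one loop, with an assumed 2 seconds per check.
    print(restoration_time_bounds(60, 2.0))
```

Under these assumed values the worst case is two minutes, which is consistent with the range of several tens of seconds to several minutes given above.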
In order to reduce the host wait time, a method may be adopted in which, even when an anomaly is detected in one loop, the magnetic disk devices are accessed from the other loop, and loop restoration processing is begun only when anomalies are judged to have occurred in both loops. However, with this method the anomalous loop remains unusable, so that processing performance is diminished, and to this extent the disk access times for hosts cannot be shortened.
Moreover, even when a loop is automatically restored, rebuild/copy-back and other RAID (Redundant Array of Independent Disks) restoration processing is begun, so that the disk access time for hosts is further lengthened.
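The deferred-restoration policy described above can be summarized in a short sketch. The function name `plan_access` and the route labels are hypothetical; the sketch only models the decision of which routes remain usable and when restoration is started.

```python
from typing import List, Tuple


def plan_access(loop_a_ok: bool, loop_b_ok: bool) -> Tuple[List[str], bool]:
    """Return (usable_routes, start_restoration) under a policy that defers
    loop restoration until both loops are judged anomalous."""
    usable_routes = [
        name for name, ok in (("a", loop_a_ok), ("b", loop_b_ok)) if ok
    ]
    # Restoration begins only when no route is left.
    start_restoration = not usable_routes
    return usable_routes, start_restoration


if __name__ == "__main__":
    print(plan_access(True, True))
    print(plan_access(False, True))
    print(plan_access(False, False))
```

With one loop anomalous, only a single route is available and restoration is not started, which corresponds to the diminished processing performance noted above.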
Hence an object of this invention is to provide a storage system and an automatic restoration method for same in the event of a loop anomaly, to quickly perform loop restoration processing when a loop anomaly occurs.
A further object of this invention is to provide a storage system and an automatic restoration method for same in the event of an anomaly of a loop on one side, to quickly perform processing to restore the loop on that side.
Still a further object of this invention is to provide a storage system and an automatic restoration method for same in the event of a loop anomaly, to quickly identify a storage device in which a fault has occurred, bypass the storage device, and perform automatic restoration.