The present invention relates to a storage system and a communications path control method for a storage system.
The storage system comprises at least one host computer (hereinafter, called “host”) and at least one storage device, and the host and storage device are connected by means of communications cables, switches, and the like. This storage device provides a storage region based on a RAID (Redundant Array of Inexpensive Disks) system, wherein disk drives, such as hard disk drives, are arranged in an array configuration. The host accesses the logical storage region provided by the storage device and reads or writes data.
In the storage system, in order to achieve high availability, data is stored in a redundant fashion and a plurality of communications paths are prepared. A storage system is also known wherein, if a failure has occurred in a communications path, that failure is detected and the path is automatically switched to a spare communications path (Japanese Patent Laid-open No. 2002-278909).
The technology in the reference patent uses redundant management paths in order to manage the status of the fabric switches and storage devices, and it switches paths automatically. It does not control the communications path of the I/O network used to input and output data; rather, it switches to a spare management path when a failure is detected in the management path.
However, failures take many different forms, and these can be divided broadly into two types. One type is a continuous failure which persists over a relatively long period of time (a solid failure). The other type is a temporary or intermittent failure which occurs over a relatively short period of time.
For example, in the event of a solid failure, such as a device fault or a cable disconnection, communications will be disabled over a long period of time, and therefore it is possible to switch to another communications path that is functioning normally. However, in the event of an intermittent failure, since communications functions are recovered after a short period of time, there is no particular need to switch to another communications path and it is possible to wait until the functions are recovered. If the path is switched each time there is an intermittent failure, which is a temporary occurrence, then path switching will be performed frequently and the performance of the storage system as a whole will decline.
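By way of illustration only, the distinction drawn above between a solid failure and an intermittent failure can be sketched as a simple duration test. The sketch is not taken from the reference patent: the 30-second threshold and the names `classify_failure` and `should_switch_path` are assumptions introduced purely for this example.

```python
from typing import Optional

# Hypothetical threshold: an outage lasting at least this long is treated
# as a solid failure. The value is an assumption, not from the source.
SOLID_FAILURE_THRESHOLD_S = 30.0


def classify_failure(down_since: float,
                     recovered_at: Optional[float],
                     now: float,
                     threshold_s: float = SOLID_FAILURE_THRESHOLD_S) -> str:
    """Label an outage "solid" or "intermittent" by how long it has lasted."""
    # An outage that has not recovered yet is measured up to the current time.
    end = recovered_at if recovered_at is not None else now
    return "solid" if (end - down_since) >= threshold_s else "intermittent"


def should_switch_path(down_since: float,
                       recovered_at: Optional[float],
                       now: float,
                       threshold_s: float = SOLID_FAILURE_THRESHOLD_S) -> bool:
    """Switch only when the path is still down and the outage is solid."""
    still_down = recovered_at is None
    return still_down and classify_failure(
        down_since, recovered_at, now, threshold_s) == "solid"
```

Under this rule, an outage that recovers after two seconds is waited out, whereas a path that has been down for a minute is switched. Each short outage is judged in isolation, so the rule by itself never switches the path for a failure that keeps recurring.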
However, even an intermittent failure may continue for a long period of time, and while it continues the host is not able to perform data input or output normally with respect to the storage device. Therefore, in some cases, the information processing services supplied to client terminals by the host may be affected.
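The effect of a prolonged intermittent failure can be illustrated with a small calculation. The outage trace below is entirely hypothetical (a two-second outage every ten seconds for ten minutes), as are the function names; it is offered only as a sketch of how short outages accumulate.

```python
from typing import List, Tuple

# Each outage is a (start, end) pair in seconds.
Outage = Tuple[float, float]


def total_downtime(outages: List[Outage]) -> float:
    """Sum of the durations of all outages in the trace."""
    return sum(end - start for start, end in outages)


def longest_outage(outages: List[Outage]) -> float:
    """Duration of the single longest outage, or 0.0 for an empty trace."""
    return max((end - start for start, end in outages), default=0.0)


# Hypothetical trace: a 2-second outage every 10 seconds for 10 minutes.
outages = [(float(t), t + 2.0) for t in range(0, 600, 10)]

print(longest_outage(outages))  # longest single outage, in seconds
print(total_downtime(outages))  # cumulative downtime, in seconds
```

In this trace no single outage exceeds two seconds, yet the path is unavailable for 120 seconds in total, which is one fifth of the observation period.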
Furthermore, where the communications ports of the storage device, or the like, are shared by a plurality of hosts, an intermittent failure occurring between the storage device and one of the hosts sharing a port may also affect the storage services provided to the other hosts that share that port.
The conventional technology described in the reference patent takes only solid failures into account, and gives no consideration to intermittent failures, for which the path is not switched. Therefore, it is not able to deal with cases where an intermittent failure continues for a long period of time, or cases where the ports of a storage device are shared.