1. Field of the Invention
The present invention relates to a RAID apparatus, and a communication-connection monitoring method and program, in which a communication timeout time is monitored when a plurality of processing devices hierarchically perform processes in response to a process request from a host device and return a response. More particularly, the invention relates to a RAID apparatus, and a communication-connection monitoring method and program, that allow a response to be returned without overrunning the timeout time of a channel connection of the host device even if a hierarchical process takes time.
2. Description of the Related Arts
Conventionally, a RAID apparatus for use as a storage device for a global server, an open server, or the like is formed of a plurality of control modules, channel adaptors, device adaptors, disk enclosures, and routers. The control modules each include a CPU, a cache, and the like to control the operation of the entire storage device. The channel adaptors are modules that connect the RAID apparatus to various servers and use an interface such as Fibre Channel or iSCSI. The device adaptors are modules that connect the control modules to the disk enclosures, each of which incorporates a plurality of magnetic disk drives, and use the Fibre Channel interface. Furthermore, the routers are modules for high-speed connection among the control modules, the channel adaptors, and the device adaptors.
In such a conventional RAID apparatus, an input/output request command issued through an interface connection by a channel of a server, which is a host device, is received, and an input/output process is then performed on a volume via a cache. Normally, the communication time of the interface connection between the channel and the RAID apparatus is monitored through an interface connection check (ICC). If a predetermined time elapses while the channel and the RAID apparatus remain connected, an ICC timeout error is determined, and the channel side forcefully separates the connection with the RAID apparatus, determines that the RAID apparatus is abnormal, and suppresses subsequent input/output requests. One factor responsible for a timeout error in the interface connection check ICC at the channel side is intermodule communication (interdevice communication), which occurs when two control modules provided in the RAID apparatus hierarchically perform processes in response to a process request from the channel and which can take time to process.
Therefore, in the conventional RAID apparatus, as a timeout time for monitoring an interdevice communication for a hierarchical process by two control modules, a time shorter than the timeout time of the interface connection check ICC at the channel side is set to a timer value. When the hierarchical process by two control modules takes time, a timeout error of the interdevice communication in the two control modules is caused before a timeout error of the communication connection with the channel. Then, separation of the interface connection from the RAID apparatus side is requested of the channel, and then the interface connection is released before an error of the interface connection check ICC is determined.
FIG. 1 is a drawing for describing a communication-connection monitoring process in the conventional RAID apparatus. In FIG. 1, a channel 200 establishes at a time t1 an interface connection with a control module 204 provided in a RAID apparatus 202 for transmission of a process request 208. The control module 204 reads the process request 208, establishes an interdevice communication with the control module 206, and then transmits a process request 210 at a time t2. The control module 206 performs a process execution 215 corresponding to the process request 210, and returns at a time t4 a process response 212 indicative of a normal end. Upon receiving this response, the control module 204 transfers at a time t6 a process response 214 to the channel 200. The channel 200 then releases the connection with the RAID apparatus 202. Here, the channel 200 sets at the time t1 a timeout time T1 of an interface connection check (ICC) 216 as a timer value to monitor whether a response from the RAID apparatus 202 arrives within the timeout time T1. Also, in the control module 204, when the interdevice communication is started at the time t2, a timeout time T2 of the interdevice communication, shorter than the timeout time T1 at the channel side, is set as a timer value for an intermodule communication check 218 to monitor whether the process response 212 is obtained from the control module 206. In the case of FIG. 1, the hierarchical process by the control modules 204 and 206 has no delay. Neither an error resulting from a lapse of the timeout time T2 of the interdevice communication nor an error resulting from a lapse of the timeout time T1 of the interface connection check ICC at the channel 200 side occurs, and the process ends normally.
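The timing relationship among the two timers and the process response can be sketched as follows. This is a minimal illustration only, not the implementation of the conventional apparatus: the function name `monitor_outcome`, the outcome labels, and the numeric values are hypothetical, and the relay of the response from the control module 204 to the channel 200 is assumed to take no time.

```python
def monitor_outcome(T1, T2, t1, t2, processing_time):
    """Classify one request under the two-timer scheme (illustrative sketch).

    T1: channel-side ICC timeout, started at time t1
    T2: interdevice timeout, started at time t2
    processing_time: time the second control module needs after t2
    Assumption: relaying the response to the channel is instantaneous.
    """
    # Each entry is (time of event, tie-break priority, label).
    events = [
        (t2 + processing_time, 0, "normal_end"),
        (t2 + T2, 1, "interdevice_timeout"),
        (t1 + T1, 2, "icc_timeout"),
    ]
    # The earliest event decides the outcome.
    return min(events)[2]


# Hypothetical values: T1 = 10, T2 = 6, request forwarded at t2 = 1.
print(monitor_outcome(T1=10, T2=6, t1=0, t2=1, processing_time=3))  # normal_end
print(monitor_outcome(T1=10, T2=6, t1=0, t2=1, processing_time=8))  # interdevice_timeout
```

With no delay, as in FIG. 1, the response arrives before either timer expires. When the process execution runs long, the shorter interdevice timer expires before the channel-side timer.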
FIG. 2 depicts the case where the hierarchical process by the control module 206 takes time. In this case, an interdevice communication is established at the time t2 to start a process execution 215-1 of the control module 206, but this process takes so long that the timeout time T2 of the intermodule communication check 218 is overrun at a time t3, which precedes the time t4 at which a process response would be issued, thereby causing a timeout error. With a connection release request 220 being issued to the channel 200, the interface connection between the channel 200 and the RAID apparatus 202 is released, thereby preventing a timeout error due to an overrun of the timeout time T1 of the interface connection check 216. Furthermore, if a timeout error of the intermodule communication is determined at the time t3 and the connection with the channel 200 is released, it is highly likely that an error has occurred in the process execution 215-1 of the control module 206. Therefore, the process request 210 issued at the time t2 is cancelled, and a process request 222 having the same content is issued again to the control module 206 with an interdevice communication being established, thereby performing the same process as a process execution 215-2 and waiting for a process response from the control module 206. When a process response 212 is obtained at a time t5, the control module 204 requests at a time t6 an interface reconnection of the channel 200. Upon establishment of the connection, a process response 214 is transferred, thereby ending the series of processes. Here, even if a process response to the process request 210 is obtained from the control module 206 after the timeout error, that process response is discarded, because the process request 210 issued at the time t2 was cancelled at the time of the timeout error.
[Patent Document 1] Japanese Patent Laid-Open Publication No. 2003-233514
[Patent Document 2] Japanese Patent Laid-Open Publication No. 07-006058
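The recovery sequence of FIG. 2 can be summarized as an ordered list of actions taken by the control module 204. The following is a rough sketch for illustration only: the function name `first_module_actions` and its parameters are hypothetical, the action strings simply reuse the reference numerals of FIG. 2, and the reissued request is assumed to complete.

```python
def first_module_actions(T2, first_attempt_time):
    """List the actions of the first control module for one request.

    T2: interdevice timeout (hypothetical units)
    first_attempt_time: time the first process execution would need
    Assumption: the reissued request eventually returns a response.
    """
    actions = ["issue process request 210"]
    if first_attempt_time <= T2:
        # FIG. 1 case: the response arrives within T2.
        actions.append("transfer process response 214")
        return actions
    # FIG. 2 case: T2 is overrun before the response arrives.
    actions += [
        "interdevice timeout",
        "issue connection release request 220",
        "cancel process request 210",
        "issue process request 222",
        "receive process response 212",
        "request interface reconnection",
        "transfer process response 214",
    ]
    return actions


for step in first_module_actions(T2=6, first_attempt_time=9):
    print(step)
```

The connection release always precedes the reconnection request, so the channel is not connected while the reissued request is being processed.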
However, in such conventional monitoring of the interdevice communication, in which the timeout time T2 is set shorter than the timeout time T1 of the interface connection check at the channel side, a problem arises in that a timeout error of the interface connection check at the channel side occurs if the process up to the start of the interdevice communication takes time.
FIG. 3 depicts the case where the process up to the start of the interdevice communication is delayed. In FIG. 3, the control module 204 receiving the process request 208 from the channel 200 at a time t1 has a process delay 224 due to some cause, and then, at a time t2 after an elapsed time T, issues a process request 210 to the control module 206. Therefore, the timeout time T1 of the channel 200 expires at a time t3, before a time t4 at which the timeout time T2 of the interdevice communication expires in the control module 204. Thus, a problem arises in that a timeout error 224 of the interface connection check 216 occurs, so that the interface connection with the RAID apparatus 202 is separated, the process ends in error at the channel 200 side, and subsequent requests to and responses from the RAID apparatus 202 are not allowed.
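The condition under which the failure of FIG. 3 occurs follows from the timer start times: the channel-side timer expires at t1 + T1 and the interdevice timer at t2 + T2, where t2 = t1 + T. The sketch below expresses that arithmetic. The function names and the numeric values are hypothetical and serve only to illustrate the relationship.

```python
def icc_expires_first(T1, T2, start_delay):
    """True if the channel-side timer expires before the interdevice timer.

    T1: channel-side ICC timeout, started at t1
    T2: interdevice timeout, started at t2 = t1 + start_delay
    start_delay: elapsed time T before the interdevice communication starts
    """
    # Measured from t1, the expiry times are T1 and start_delay + T2.
    return T1 < start_delay + T2


def max_safe_delay(T1, T2):
    """Largest start delay for which the ICC timer does not expire first."""
    return T1 - T2


# Hypothetical values: T1 = 10 and T2 = 6 leave a margin of 4.
print(max_safe_delay(10, 6))        # 4
print(icc_expires_first(10, 6, 1))  # False
print(icc_expires_first(10, 6, 5))  # True
```

Setting T2 shorter than T1 therefore protects the channel only while the elapsed time T stays within T1 - T2. Any longer delay lets the interface connection check time out first.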