Conventionally, a system is known that is constituted by a plurality of control units and that achieves redundancy through redundant internal processing functions. Such a system includes a monitoring apparatus that monitors the operational states of the redundant configurations of all constituent units and that controls the start and end of operations. An example of such a redundant system is a virtual tape drive. In the virtual tape drive, a group of hierarchically connected data processing units is duplexed to form physical redundancy. When a control unit in the duplexed group detects an abnormality in an operational response of a lower unit that is its control target, the control unit stops issuing commands to, or shuts down the communication connection with, the lower unit in which the abnormality is detected. After the shutdown, the control unit switches the connection path to the other redundant group (standby system), which takes over the processing so that the operation continues. Before switching the connection path from the lower unit to the standby system, the control unit waits for the lower unit to terminate, either in response to a termination command or by self-termination.
An RAS (Reliability, Availability, and Serviceability) automatic test system that automatically performs an RAS test of an apparatus is disclosed, for example, in Japanese Patent Laid-Open No. 11-53213.
However, when there is a failure in a lower unit, the lower unit may continue operating in the presence of the abnormality instead of reacting to a termination command from the control unit that has detected the abnormality. In such a case, the control unit keeps waiting for the termination of the abnormal lower unit before switching to the standby system, so the operation in process cannot be handed over. As a result, there is a problem that switching to the standby system to replace the lower unit with the abnormality is impossible. Under these circumstances, operations cannot be continued, and the entire system terminates. This problem is not limited to virtual tape drives and may occur in a variety of other systems constituted by redundant apparatuses.
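The conventional switchover and the way it stalls can be sketched as follows. This is only an illustrative model under stated assumptions, not the procedure of any actual apparatus: the names `LowerUnit`, `obeys_termination`, `conventional_switchover`, and `max_polls` are hypothetical, and the bounded polling loop exists only so that the sketch terminates, whereas the conventional control unit described above waits without limit.

```python
from dataclasses import dataclass


@dataclass
class LowerUnit:
    """Hypothetical model of a lower unit; all names are illustrative."""
    name: str
    obeys_termination: bool = True
    terminated: bool = False

    def request_termination(self) -> None:
        # A healthy unit stops when commanded; a failed unit may ignore the command.
        if self.obeys_termination:
            self.terminated = True


def conventional_switchover(active: LowerUnit, standby: LowerUnit,
                            max_polls: int = 5) -> str:
    """Return the name of the unit in use after the switchover attempt.

    Assumption: max_polls bounds the wait only so the sketch finishes;
    the conventional procedure would keep waiting indefinitely.
    """
    active.request_termination()
    for _ in range(max_polls):
        if active.terminated:
            # Termination confirmed: the connection path moves to the standby system.
            return standby.name
    # Termination never confirmed: the path is never switched.
    return active.name
```

With a unit that obeys the termination command the function returns the standby name; with a unit that ignores the command it returns the name of the abnormal unit, which corresponds to the stalled state described above.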
A case with such a problem will be specifically described with reference to a drawing. FIG. 14 is a diagram illustrating a hierarchical structure of a monitored apparatus as a lower unit in a virtual tape drive.
As illustrated in FIG. 14, in the monitored apparatus, a BIOS operates on hardware, an OS and an I/O driver operate on the BIOS, and a kernel and an I/O control unit operate on the OS and the I/O driver. A basic processing program of the virtual tape drive operates on the kernel and the I/O control unit, and a functional process control program operates on the basic processing program. A response control program operates on the functional process control program. In the monitored apparatus with such a hierarchical structure, the response transmissions to an upper host apparatus, a monitoring apparatus, and another monitored apparatus are performed at different levels. Specifically, the response control program performs command response transmission to the upper host apparatus, the functional process control program performs status response transmission to the monitoring apparatus, and the I/O control unit performs survival check response transmission to the other monitored apparatus. In such a monitored apparatus, for example, if the functional process control program hangs, the monitored apparatus cannot perform the status response transmission to the monitoring apparatus. However, in the hierarchical structure of the monitored apparatus, the I/O control unit below the functional process control program is not affected by the hang, and the I/O control unit automatically returns a response to the survival check from the other monitored apparatus. As with the I/O control unit, the basic processing program can also continue to operate without being affected by the hang. In such a case, the monitored apparatus returns a response to the survival check and continues to operate, although there is an abnormality in the functional process control program and in the response control program above it.
Since the level that performs the status response transmission to the monitoring apparatus is hung, the monitoring apparatus cannot terminate the monitored apparatus. As a result, switching to the standby system to replace the monitored apparatus is impossible. In other words, if the operation of a lower unit with an abnormality does not terminate for some reason, the redundancy arranged in preparation for abnormalities is not effective. The problem may occur not only in the virtual tape drive, but also in any system constituted by redundant apparatuses.
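The relationship between the hung level and the responses that can still be transmitted may be modeled as in the following sketch. It rests on assumptions that are not part of the description of FIG. 14: the identifier spellings are illustrative, and a hang is assumed to disable the hung level and every level above it while the levels below keep running.

```python
# Levels from bottom to top, following the description of FIG. 14.
# The identifier spellings are illustrative.
LAYERS = [
    "hardware",
    "bios",
    "os_and_io_driver",
    "kernel_and_io_control_unit",
    "basic_processing_program",
    "functional_process_control_program",
    "response_control_program",
]

# The level that transmits each kind of response.
RESPONDER = {
    # to the upper host apparatus
    "command_response": "response_control_program",
    # to the monitoring apparatus
    "status_response": "functional_process_control_program",
    # to the other monitored apparatus
    "survival_check_response": "kernel_and_io_control_unit",
}


def responses_after_hang(hung_layer: str) -> dict:
    """Map each response kind to True if it can still be transmitted.

    Assumption: the hung level and all levels above it stop working,
    while the levels below it are unaffected.
    """
    hung_index = LAYERS.index(hung_layer)
    return {
        kind: LAYERS.index(layer) < hung_index
        for kind, layer in RESPONDER.items()
    }
```

For a hang in the functional process control program, this model reports that the command response and the status response are lost while the survival check response is still returned, which is the situation in which the monitored apparatus appears alive to the other monitored apparatus but cannot be terminated by the monitoring apparatus.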