The present invention relates to a technology for ensuring high reliability in the operation of a plurality of controllers for input/output (I/O) devices in a computer system. In particular, the invention relates to a method of redundantly arranging controllers in an external storage subsystem adopting the Small Computer System Interface (SCSI), in which the controllers are arranged in at least a duplicated configuration and can be accessed from the host systems, such that a process can be transferred between the controllers without intervention of the user or the host systems when a failure occurs in one of the controllers.
In a system configuration employing the SCSI, in which a plurality of controllers and a storage device shared by at least two of the controllers are connected to the host systems by an interface cable in a daisy chain, each controller has a different port address, such as a SCSI-ID. Ordinarily, each controller processes only those I/O requests that the host systems address to its own port address.
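By way of illustration only, the addressing behavior described above may be sketched as follows. The sketch assumes a greatly simplified model in which a request is offered to each controller on the chain in turn; the names `ChainedController`, `on_bus_request`, and `send` are hypothetical and do not appear in the original description, and actual SCSI bus arbitration and selection phases are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ChainedController:
    """Hypothetical model of one controller on the daisy chain."""
    scsi_id: int                                   # port address unique to this controller
    handled: List[str] = field(default_factory=list)

    def on_bus_request(self, target_id: int, request: str) -> bool:
        # A controller accepts only requests addressed to its own port address.
        if target_id != self.scsi_id:
            return False
        self.handled.append(request)
        return True


def send(chain: List[ChainedController], target_id: int, request: str) -> Optional[int]:
    """Offer the request to each controller on the chain in turn.

    Returns the SCSI-ID of the controller that accepted the request, or None
    if no controller on the chain owns the specified port address.
    """
    for controller in chain:
        if controller.on_bus_request(target_id, request):
            return controller.scsi_id
    return None
```

In this simplified model, a request sent to port address 1 on a chain holding controllers 0 and 1 is accepted only by controller 1, and a request sent to an address that no controller owns is accepted by none.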
JP-A-4-364514 describes a system in which the controllers are arranged in a multiplex configuration so that I/O requests from a host apparatus to storage devices connected to the plural controllers are processed at high speed. In such a conventional system, when a failure occurs in one of the controllers, the I/O request can be processed by a normal controller, provided that the host system changes its designation of the controller that is to execute the request. However, for a system in which the host system and the plural controllers are connected to each other in a daisy chain, no consideration has been given to a procedure by which, when a failure occurs in a controller, the process is transferred to a normal controller and executed there without intervention of the host system.
After issuing an I/O request to a controller, the host system ordinarily monitors termination of the I/O request by means of a timer in the host system. If the I/O request has not terminated when the monitoring time predetermined by the host system elapses after issuance of the request, the host system provisionally treats the condition as a temporary error. After performing processes such as a bus recovery process for the SCSI bus, the host system attempts to re-issue the same I/O request, again specifying the port address of the failed controller.
When the controller does not respond to the re-issued I/O request, the host system regards the condition as a permanent error and thereafter issues no further I/O requests to the failed controller. In the conventional system, once the host system recognizes the permanent error caused by the failure of a controller, its data process is interrupted. Therefore, even when a plurality of controllers are provided, user intervention is required to continue the data process of the host system after the pertinent controller fails.
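By way of illustration only, the conventional host-side handling described above (timer monitoring, provisional error, bus recovery, a single re-issue to the same port address, and recognition of a permanent error) may be sketched as follows. This is a minimal sketch under stated assumptions: the names `Outcome`, `Controller`, `Host`, `excluded`, and `log` are hypothetical, the monitoring timer is reduced to a boolean indicating whether the request terminated in time, and a single re-issue is assumed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class Outcome(Enum):
    COMPLETED = "completed"
    PERMANENT_ERROR = "permanent_error"


@dataclass
class Controller:
    """Hypothetical controller; `responsive` stands in for real hardware state."""
    port_address: int
    responsive: bool = True

    def execute(self, request: str) -> bool:
        # True means the request terminated within the host's monitoring time.
        return self.responsive


@dataclass
class Host:
    controllers: Dict[int, Controller]                 # port address -> controller
    excluded: Set[int] = field(default_factory=set)    # addresses marked as permanently failed
    log: List[str] = field(default_factory=list)

    def issue(self, port_address: int, request: str) -> Outcome:
        # Once a permanent error is recognized, no further requests are issued.
        if port_address in self.excluded:
            return Outcome.PERMANENT_ERROR
        controller = self.controllers[port_address]
        if controller.execute(request):
            return Outcome.COMPLETED
        # Monitoring time elapsed: provisionally treat the condition as an error.
        self.log.append("temporary_error")
        # Bus recovery, then re-issue to the SAME port address (assumed single retry).
        self.log.append("bus_recovery")
        if controller.execute(request):
            return Outcome.COMPLETED
        # No response to the re-issued request: permanent error.
        self.log.append("permanent_error")
        self.excluded.add(port_address)
        return Outcome.PERMANENT_ERROR
```

In this sketch, a host holding a failed controller at port address 0 and a normal controller at port address 1 obtains `Outcome.PERMANENT_ERROR` for a request addressed to 0, even though the controller at address 1 could have executed it. The request is never redirected because the host always re-issues to the originally specified port address, which is the shortcoming the passage above identifies.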
Furthermore, when a plurality of host systems are provided and a controller fails and hangs while occupying the bus, a data process being executed between another host system and another controller is also interrupted. User intervention is likewise required to recover the interrupted data process.