This invention relates to a failover method for a computer system comprising plural servers that are booted from an external disk device, and more particularly to a failover method for a computer system having plural logical partitions, wherein only specified logical partitions are subjected to failover.
In a system where plural servers are booted from an external disk array device, the disk array device can be connected to the plural servers via Fibre Channel or Fibre Channel switches, so the boot disk of a particular server connected to the disk array device can also be accessed by other servers. In such a configuration, when a failure occurs in a working server executing a task, the task can be taken over by booting a standby server from the boot disk of the working server. Further, because this configuration does not require pairing each working server with a dedicated standby server, a task can be handed over from an arbitrary working server to an arbitrary standby server, which reduces the initial installation cost. (Refer to United States Patent Application Publication No.: US2006/0143498 A1)
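The boot-disk handover described above can be sketched as follows. This is a minimal illustrative model, not the method of the cited publication: the class names, the use of a WWN-style identifier for disk-array access control, and the `fail_over` helper are all hypothetical assumptions. The point it shows is that no standby is paired with a particular working server; any free standby in the pool can take over by being granted access to the failed server's boot disk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Server:
    name: str
    wwn: str                      # hypothetical HBA identifier used for access control
    standby: bool = False
    running_task: Optional[str] = None


@dataclass
class DiskArray:
    # Hypothetical model: each boot disk (logical unit) -> WWN allowed to access it.
    lu_owner: Dict[str, str] = field(default_factory=dict)


def fail_over(array: DiskArray, failed: Server, pool: List[Server], boot_lu: str) -> Server:
    """Hand the failed server's boot disk and task to any free standby server."""
    standby = next((s for s in pool if s.standby and s.running_task is None), None)
    if standby is None:
        raise RuntimeError("no standby server available")
    # Re-point disk-array access from the failed server to the chosen standby.
    array.lu_owner[boot_lu] = standby.wwn
    # The standby now boots from the handed-over disk and runs the same task.
    standby.running_task = failed.running_task
    standby.standby = False
    failed.running_task = None
    return standby


if __name__ == "__main__":
    array = DiskArray({"LU0": "wwn-a", "LU1": "wwn-b"})
    a = Server("A", "wwn-a", running_task="db")
    b = Server("B", "wwn-b", running_task="web")
    s1 = Server("S1", "wwn-s1", standby=True)
    new = fail_over(array, b, [s1], "LU1")
    print(new.name, new.running_task, array.lu_owner["LU1"])  # S1 web wwn-s1
```

Because the standby is chosen from a shared pool at failure time, one standby server can protect many working servers, which is the cost advantage the paragraph describes.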
Another known technique for reducing the initial installation cost consolidates plural tasks onto a single server by dividing it into plural logical partitions. For example, plural CPUs, memories, I/O devices, etc. are partitioned and allocated to the individual logical partitions. Combining these two techniques can reduce the initial installation cost further.
Japanese patent document JP-A-04-141744 discloses a hot standby system for virtual computers in which only the faulty operating system on the working host computer is taken over by the corresponding operating system on a standby host computer kept in the hot standby state.
Because the conventional technique disclosed in JP-A-04-141744 relies on hot standby, the standby host computer must be kept running in synchronization with the working host computer, which raises the operating cost. Moreover, JP-A-04-141744 addresses failures in the operating system but not failures in the hardware. The correspondence between the location of a hardware failure and the logical partitions in a server depends on the logical partition configuration; in some cases, a single hardware component may be related to plural logical partitions. Conventionally, when a hardware failure occurs, the whole server containing the faulty hardware has usually been subjected to failover, irrespective of the logical partition configuration.
Consider a computer system in which a server whose OS is booted from an external disk array device is divided into plural logical partitions, so that more than one independent virtual server operates on the single server. When the boot disk is handed over from the working server to a standby server upon a server failure, all of the virtual servers operating on the working server are shut down, so the impact of the failure is considerable and the availability of the system as a whole is lowered. For example, even when a failure occurs only in a CPU allocated to one particular logical partition, the other logical partitions must also be shut down.
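The difference in impact can be illustrated with a small sketch. The partition configuration below is a hypothetical example, and the helper `partitions_shut_down` is an illustrative assumption rather than the invention's method. It contrasts the conventional whole-server scope with a scope limited to the partitions that actually use the faulty component, and shows that a shared component (here `nic0`) maps to more than one partition, so the affected set depends on the configuration.

```python
from typing import Dict, Set

# Hypothetical configuration: hardware components allocated to each logical partition.
PARTITION_HW: Dict[str, Set[str]] = {
    "LPAR1": {"cpu0", "cpu1", "mem0", "nic0"},
    "LPAR2": {"cpu2", "mem1", "nic0"},   # nic0 is shared with LPAR1
    "LPAR3": {"cpu3", "mem2", "nic1"},
}


def partitions_shut_down(partition_hw: Dict[str, Set[str]], faulty: str, scope: str) -> Set[str]:
    """Return the partitions stopped by a fault in `faulty` under the given scope.

    scope="server":    conventional failover; every partition on the server stops.
    scope="partition": only partitions that use the faulty component stop.
    """
    if scope == "server":
        return set(partition_hw)
    if scope == "partition":
        return {lpar for lpar, hw in partition_hw.items() if faulty in hw}
    raise ValueError(f"unknown scope: {scope}")


if __name__ == "__main__":
    print(sorted(partitions_shut_down(PARTITION_HW, "cpu2", "server")))     # ['LPAR1', 'LPAR2', 'LPAR3']
    print(sorted(partitions_shut_down(PARTITION_HW, "cpu2", "partition")))  # ['LPAR2']
    print(sorted(partitions_shut_down(PARTITION_HW, "nic0", "partition")))  # ['LPAR1', 'LPAR2']
```

In this example, a CPU fault in LPAR2 stops all three partitions under whole-server failover, whereas only LPAR2 would need to be moved if the failover scope followed the partition configuration.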