A conventional cluster system composed of a plurality of service servers generally adopts a node failure detection method based on a heart beat signal. In this method, each service server transmits a heart beat packet to the other service servers via a dedicated interconnect LAN (local area network), and when no response packet is received from a specific service server for a certain time, that service server is determined to have failed.
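As an illustration only, the timeout rule described above can be sketched as follows. The class name `HeartbeatMonitor`, the server names, and the 5-second timeout are hypothetical and are not taken from any particular cluster product; timestamps are passed in explicitly so that the example is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor: tracks the last heart beat response from each server."""
    timeout: float  # seconds of silence after which a server is treated as failed
    last_seen: dict = field(default_factory=dict)

    def record_response(self, node: str, now: float) -> None:
        # Called whenever a response packet arrives from `node`.
        self.last_seen[node] = now

    def failed_nodes(self, now: float) -> list:
        # A server is reported as failed when it has been silent longer than `timeout`.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=5.0)
monitor.record_response("server-a", now=0.0)
monitor.record_response("server-b", now=0.0)
monitor.record_response("server-a", now=4.0)  # server-b sends nothing further

print(monitor.failed_nodes(now=6.0))  # ['server-b']
```

Note that the monitor observes only the absence of response packets. It has no information about whether the service process on the silent server is still able to run.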
However, the node failure detection method by a heart beat signal has the following problems.
(1) Misdetection
In a cluster system, even when a service process itself is running normally, a heart beat signal sometimes cannot be transmitted or received normally due to, for example, a partial failure of the operating system (OS). In this case, a failure in a system state not directly related to the service is detected, and node switching occurs even though the service process could actually continue.
(2) Detection Time
The node failure detection method by a heart beat signal requires a fairly long detection time. If the timer is set short in order to shorten the detection time, however, the misdetection described in (1) becomes more likely. Therefore, there is a high risk that unnecessary node switching occurs.
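The trade-off between problems (1) and (2) can be made concrete with a small sketch. The function `first_detection`, the response timestamps, and the timeout values below are all hypothetical, chosen only to show that the same timer setting governs both the detection delay and the likelihood of misdetection.

```python
def first_detection(response_times, timeout, end_time):
    """Return the time at which a silence longer than `timeout` is first noticed,
    or None if no such silence occurs before `end_time`."""
    last = 0.0
    for t in sorted(response_times):
        if t - last > timeout:
            return last + timeout
        last = t
    if end_time - last > timeout:
        return last + timeout
    return None


# Assumed scenario A: the service is healthy, but a partial OS problem
# delays heart beat responses for 3 seconds (between t=3 and t=6).
stalled = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]

# Assumed scenario B: the server really fails right after t=3.
crashed = [1.0, 2.0, 3.0]

for timeout in (2.0, 5.0):
    false_alarm = first_detection(stalled, timeout, end_time=8.0)
    detected_at = first_detection(crashed, timeout, end_time=20.0)
    print(timeout, false_alarm, detected_at - 3.0)
```

With the short 2-second timer, the real failure is noticed 2 seconds after it happens, but the healthy server in scenario A is also declared failed. With the 5-second timer, the false alarm disappears, but the real failure takes 5 seconds to detect.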
The following Patent document 1 relates to a cluster system that determines whether a process can be continued by using a service processor that monitors failure occurrence in a node. Patent document 2 relates to a cluster system in which a management server collectively manages node information through an agent that is mounted on each node and communicates with the management server.
Patent document 1: Japanese Laid-open Patent Publication No. 09-034852
Patent document 2: Japanese Laid-open Patent Publication No. 2004-334534