1. Field of the Invention
The present invention relates to a standby system calculator, a cluster system, a method of providing a service, and a recording medium for taking over a service being executed by a currently-used system calculator when a failure occurs in the currently-used system calculator.
2. Description of the Related Art
In order to shorten the time during which a service provided by a system is stopped, a cluster system including a plurality of calculators capable of providing the same service is known.
In the cluster system, when failure occurs in a currently-used system calculator that executes an application program and that provides a service, a standby system calculator takes over the service by taking over execution of the application program. The period of time during which the service is stopped can be shortened by the take-over (failover) of the service.
When a heartbeat (heartbeat packet), which is output from the currently-used system calculator, is disrupted, the standby system calculator judges whether or not failure has occurred in the currently-used system calculator. Specifically, the standby system calculator judges whether the cause of the disruption of the heartbeat is communication failure (network partition) or failure in the currently-used system calculator. To make this judgment, a plurality of physically-independent communication lines are required between the calculators constituting the cluster system.
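The judgment described above can be illustrated with a minimal sketch. It assumes that the standby system calculator records the arrival time of the most recent heartbeat on each physically-independent communication line; the names `LineState`, `judge_cause`, and `timeout` are hypothetical and do not appear in the related art. Note that the result when all lines are silent is only a presumption, since all lines could in principle fail at the same time.

```python
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    NONE = "no disruption"
    COMMUNICATION_FAILURE = "communication failure"
    ACTIVE_FAILURE = "failure in the currently-used system calculator"


@dataclass
class LineState:
    """Hypothetical record kept by the standby system calculator per line."""
    name: str
    last_heartbeat: float  # time (seconds) at which the last heartbeat arrived


def judge_cause(lines, now, timeout):
    """Judge the presumed cause of a heartbeat disruption.

    A line is treated as silent when no heartbeat has arrived on it
    for longer than `timeout` seconds.
    """
    silent = [line for line in lines if now - line.last_heartbeat > timeout]
    if not silent:
        return Cause.NONE
    if len(silent) < len(lines):
        # Heartbeats still arrive on at least one line, so the
        # currently-used system calculator is operating.
        return Cause.COMMUNICATION_FAILURE
    # Every line is silent: failure of the calculator is presumed.
    return Cause.ACTIVE_FAILURE


if __name__ == "__main__":
    lines = [LineState("lan", 90.0), LineState("serial", 100.0)]
    print(judge_cause(lines, now=101.0, timeout=3.0).value)
```

With only one line, every disruption makes all lines silent, so the sketch cannot distinguish communication failure from failure in the currently-used system calculator. This is why the judgment depends on a plurality of independent lines.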
If the standby system calculator erroneously judges the cause of the disruption of the heartbeat as failure in the currently-used system calculator even when the cause of the disruption of the heartbeat is communication failure, both the currently-used system calculator and the standby system calculator will provide services.
In this case, a state (split brain) arises in which the data owned by the currently-used system calculator (data updated based on execution of the service) and the data owned by the standby system calculator (data updated based on execution of the service) are no longer consistent with each other.
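How the inconsistency arises can be shown with a small simulation. This is only an illustrative sketch: the `Node` class, the `simulate_partition` function, and the `balance` value are hypothetical, and each calculator is assumed to hold its own copy of the service data.

```python
class Node:
    """Hypothetical calculator holding its own copy of the service data."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)  # private copy, not shared with the peer
        self.serving = False

    def handle_update(self, key, value):
        # Only a calculator that is providing the service updates its data.
        if not self.serving:
            raise RuntimeError(f"{self.name} is not providing the service")
        self.data[key] = value


def simulate_partition():
    initial = {"balance": 100}
    active = Node("currently-used", initial)
    standby = Node("standby", initial)
    active.serving = True

    # The heartbeat is disrupted by communication failure, but the standby
    # system calculator erroneously judges that the currently-used system
    # calculator has failed and starts providing the service as well.
    standby.serving = True

    # Each calculator now accepts updates independently of the other.
    active.handle_update("balance", 80)
    standby.handle_update("balance", 130)
    return active, standby


if __name__ == "__main__":
    active, standby = simulate_partition()
    print(active.data, standby.data)
```

After the partition, both calculators are providing the service and their copies of the data differ, which corresponds to the split brain state.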
Patent Literature 1 (JP2006-146299) describes a split brain recovery method of executing a recovery process for resolving the inconsistency of the data of each of a plurality of calculators when the disruption of the heartbeat is resolved after occurrence of the split brain state.
Patent Literature 1 describes a technique for resolving the inconsistency of the data caused by the split brain; however, it does not describe a technique for suppressing occurrence of the split brain state.
One conceivable method of suppressing occurrence of the split brain state is to provide a plurality of physically-independent communication lines between the calculators constituting the cluster system, and thereby to judge with high accuracy whether the cause of the disruption of the heartbeat is communication failure or failure in the currently-used system calculator.
However, this method has a problem in that a plurality of physically-independent communication lines are required between the calculators constituting the cluster system in order to suppress the occurrence of the split brain state. This problem is particularly notable when the standby system calculator is installed at a location remote from the installation location of the currently-used system calculator as a countermeasure against disasters.
Another conceivable method of suppressing the occurrence of the split brain state is one in which an operator confirms that the currently-used system calculator has stopped and then manually instructs the standby system calculator to initiate failover.
However, this method has a problem in that the service remains stopped from when the currently-used system calculator stops operating until the operator instructs the failover.