This invention relates to a technology for recovering from a failure in a cluster system that includes an active system computer and a standby system computer.
Conventionally, in a cluster system that holds processing data on a nonvolatile shared disk and includes an active system computer and a standby system computer, when a failure occurs in a process of the active system, recovery from the failure is performed either by restarting the process or by switching over to the standby system.
In a cluster system that uses volatile memory instead of a nonvolatile shared disk to improve processing performance, the data is lost when a process failure occurs in the active system, so the recovery processing cannot be performed. To provide a recovery means for a failure in a process of the active system, JP 09-168015 A therefore discloses a technology in which a copy of the data necessary for a restart is transferred to another computer, and the copy held in that other computer is used when the process is restarted. In the technology disclosed in JP 09-168015 A, the computers that transfer data and the computers that receive it are arranged in a ring, so that data is duplicated in all of the computers.
However, because the technology disclosed in JP 09-168015 A merely duplicates the data, recovery processing cannot be executed if a failure occurs in the computer holding the copy before the process restart has been completed.
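To make this weakness concrete, the following is a minimal Python sketch of ring-arranged duplication. It assumes that each computer copies its restart data to its successor at position (i + 1) mod n, and that a process failure wipes only the failed computer's volatile data. The `Node` class and the `replicate_ring` and `restart_process` functions are hypothetical names chosen for illustration; they do not reproduce the implementation of JP 09-168015 A.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of one computer in the ring."""
    name: str
    alive: bool = True
    own_data: dict = field(default_factory=dict)   # volatile restart data
    backup_of: dict = field(default_factory=dict)  # copies received from the predecessor


def replicate_ring(nodes):
    """Copy each node's data to its successor (assumed ring order), so every datum exists twice."""
    n = len(nodes)
    for i, node in enumerate(nodes):
        successor = nodes[(i + 1) % n]
        successor.backup_of[node.name] = dict(node.own_data)


def restart_process(nodes, failed_index):
    """Restart the failed process on the same node from the copy held by its successor.

    Returns the restored data, or None when the copy holder has also failed,
    because the only remaining copy is then gone and recovery is impossible.
    """
    failed = nodes[failed_index]
    holder = nodes[(failed_index + 1) % len(nodes)]
    if not holder.alive:
        return None
    failed.own_data = dict(holder.backup_of[failed.name])
    return failed.own_data


nodes = [Node("A", own_data={"x": 1}), Node("B", own_data={"y": 2}), Node("C", own_data={"z": 3})]
replicate_ring(nodes)

# The process on A fails and its volatile data is lost; B still holds the copy.
nodes[0].own_data = {}
print(restart_process(nodes, 0))  # {'x': 1}

# Same failure, but B (the copy holder) also fails before the restart completes.
nodes[0].own_data = {}
nodes[1].alive = False
print(restart_process(nodes, 0))  # None
```

With only two copies of each datum, a second failure that hits the copy holder during the restart window leaves no source from which to recover.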
Moreover, when a process failure occurs in the active system, the process is invariably restarted on the same system and a data transfer from another system is attempted, so the processing time may become longer than it would be with switching over to the standby system.