The present invention relates to a computer system having a currently-active system computer that runs an application and a standby system computer, and more particularly to a computer system that performs failover control in which, when a malfunction occurs in a program or in the operating system of the computer running an application, another computer takes over the application being run.
In an application system requiring malfunction tolerance capability, reliability can be secured by using a cluster of a plurality of server computers that includes a currently-active system server computer, which executes data processing, and a standby system server computer, which takes over the data processing if a malfunction occurs in the currently-active system server computer. In an application that stores data on a disk, such as a database (DB), the standby system server computer takes over the data from the currently-active system server computer by way of a shared disk accessible to both, and continues the processing. Accordingly, I/O processing that synchronously writes data to the disk is required, and system performance is therefore bounded by the disk I/O performance.
In recent years, application systems used across a wide range of fields often require performance beyond the limit imposed by the above-described disk I/O performance. To meet this requirement, in-memory application systems have emerged, which store data only in memory and improve system performance by eliminating synchronous I/O processing to a disk device. In such an in-memory application system, data stored in memory cannot be shared directly with the standby system server computer. Therefore, in an application such as an in-memory DB that requires malfunction tolerance, that is, one in which data stored in memory must not be lost because of a malfunction, the data must be made redundant: the currently-active system server computer sends duplicates of its data to the standby system server computer over a communication link, and the standby system server computer stores them in its own memory. One example of an in-memory application system that takes malfunction tolerance into account is the memory DB system disclosed in JP-A-2005-293315 (corresponding to US2005/0229022). JP-A-2005-293315 discloses a technology in which the currently-active system server computer writes data updated in the database it holds into a shared memory in the standby system server computer, thereby duplicating the data in the standby system server computer and assuring the data in the event of a malfunction.
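The redundancy scheme described above, in which every in-memory update on the currently-active side is also copied into the standby side's memory, can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of JP-A-2005-293315: the class names `ActiveServer` and `StandbyServer`, the function `fail_over`, and the direct method call standing in for network communication are all hypothetical, and an update is treated as complete only once the standby acknowledges it.

```python
from dataclasses import dataclass, field


@dataclass
class StandbyServer:
    # Hypothetical model of the standby server's memory area.
    memory: dict = field(default_factory=dict)

    def receive(self, key, value):
        # Store the duplicate sent by the active server and acknowledge it.
        self.memory[key] = value
        return True


@dataclass
class ActiveServer:
    standby: StandbyServer
    memory: dict = field(default_factory=dict)

    def update(self, key, value):
        # Apply the update in local memory only (no synchronous disk I/O).
        self.memory[key] = value
        # Send the duplicate to the standby; the direct call stands in for
        # the communication link assumed in this sketch.
        if not self.standby.receive(key, value):
            raise RuntimeError("standby did not acknowledge the duplicate")


def fail_over(standby):
    # On failover, the standby continues from its own in-memory copy.
    return dict(standby.memory)


standby = StandbyServer()
active = ActiveServer(standby)
active.update("acct:1", 100)
active.update("acct:1", 120)
active.update("acct:2", 50)
# Suppose the active server now fails; the standby takes over.
print(fail_over(standby))  # {'acct:1': 120, 'acct:2': 50}
```

Waiting for the acknowledgement before treating an update as complete is one way to ensure that no acknowledged update exists only in the failed server's memory; asynchronous variants trade that guarantee for lower latency.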
In such an application system requiring malfunction tolerance capability, a method is used in which a normal server computer that detects a malfunction resets the server computer in which the malfunction has occurred (the malfunctioning server computer); technologies of this kind are disclosed, for example, in JP-A-2006-11992 (corresponding to US2005/0289390) and JP-A-2006-285810 (corresponding to US2006/0224728). JP-A-2006-11992 concerns a technology in which, when a standby system server computer detects a malfunction of the currently-active system server computer, the standby system server computer that detected the malfunction resets the currently-active system server computer, thereby stopping it and realizing failover; it discloses making the reset timing of each standby system server computer different from the others so as to prevent reset competition. JP-A-2006-285810, concerning failover realized by the same kind of reset, discloses a technology in which a reset device receives the resets issued by the individual standby system server computers upon detecting a malfunction and arbitrates among competing resets, thereby preventing reset competition.
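The two ways of preventing reset competition described above can be contrasted with a small simulation. This is a hedged sketch, not the method of either cited publication: the functions and class (`staggered_resets`, `ResetDevice`), the per-standby delays, and the `reset_latency` (the time before a reset becomes visible to other standbys) are hypothetical, and the numbers used are illustrative only. `staggered_resets` models distinct reset timings per standby; `ResetDevice` models an arbiter that honours only the first reset it receives.

```python
def staggered_resets(detections, delays, reset_latency):
    """Return the standbys that actually issue a reset.

    detections: standby name -> time it detected the malfunction.
    delays: standby name -> its reset delay (hypothetical parameter).
    reset_latency: time before a reset is observable by other standbys.
    """
    # Each standby fires at its detection time plus its own delay.
    schedule = sorted((detections[n] + delays[n], n) for n in detections)
    issued = []
    observed_at = None  # when the first reset becomes visible
    for fire_time, name in schedule:
        if observed_at is not None and fire_time >= observed_at:
            continue  # reset already visible: this standby stands down
        issued.append(name)
        if observed_at is None:
            observed_at = fire_time + reset_latency
    return issued


class ResetDevice:
    """Hypothetical reset device that accepts only the first reset."""

    def __init__(self):
        self.accepted = None

    def request(self, requester):
        if self.accepted is None:
            self.accepted = requester
            return True
        return False  # competing reset rejected


detected = {"sb1": 0.0, "sb2": 0.0}
# Identical timing: both standbys fire before either reset is visible.
print(staggered_resets(detected, {"sb1": 0.0, "sb2": 0.0}, 0.5))  # ['sb1', 'sb2']
# Staggered timing: sb2 fires after sb1's reset is visible and stands down.
print(staggered_resets(detected, {"sb1": 0.0, "sb2": 1.0}, 0.5))  # ['sb1']

device = ResetDevice()
print([device.request(n) for n in ("sb1", "sb2")])  # [True, False]
```

In this model, staggering works only if the gap between delays exceeds the reset latency, whereas the reset device removes that timing assumption by serializing requests at a single point.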