1. Field of the Invention
The present invention relates to a system and method for failover.
2. Description of the Related Art
In a cluster system, a plurality of computers (also called nodes) are loosely coupled to constitute a single cluster. Known types of cluster systems include, for example, load-distribution systems and failover systems. In a failover cluster system, redundancy is provided by using a plurality of computers. Continuity of the business application service provided to client computers is ensured by arranging that, when one computer stops, its task is taken over by another computer. The one computer and the other computer are connected by a communication circuit (interconnection) such as a LAN, and each monitors the other for stoppage by means of “heartbeat” communication exchanged between them.
Heartbeat communication is a technique in which a plurality of computers monitor one another for cessation of function by exchanging prescribed signals at prescribed intervals. While heartbeat communication continues, the remote computer is deemed to be operating normally and failover (takeover of business services) is not performed. Conversely, if heartbeat communication is interrupted, it is concluded that the remote computer is down, and the business application services that were provided by the remote computer are taken over by the failover target computer.
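The heartbeat decision described above can be illustrated by the following minimal sketch. It is not taken from any particular cluster product; the class name `HeartbeatMonitor`, the `missed_limit` threshold, and the use of caller-supplied timestamps are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical helper that tracks the last heartbeat seen from each node."""

    interval: float        # prescribed heartbeat interval, in seconds
    missed_limit: int = 3  # assumption: intervals that may elapse before a node is deemed down
    last_seen: dict = field(default_factory=dict)

    def record(self, node: str, now: float) -> None:
        """Note that a heartbeat signal arrived from `node` at time `now`."""
        self.last_seen[node] = now

    def is_down(self, node: str, now: float) -> bool:
        """Deem a node down once its heartbeat has been silent for too long."""
        last = self.last_seen.get(node)
        if last is None:
            # Assumption: a node that was never heard from is not a failover candidate.
            return False
        return (now - last) > self.interval * self.missed_limit

    def nodes_to_take_over(self, now: float) -> list:
        """Return the nodes whose services the failover target should take over."""
        return sorted(n for n in self.last_seen if self.is_down(n, now))


monitor = HeartbeatMonitor(interval=1.0, missed_limit=3)
monitor.record("node-a", 0.0)
monitor.record("node-b", 0.0)
monitor.record("node-b", 4.0)  # node-b keeps sending; node-a has gone silent
print(monitor.nodes_to_take_over(now=4.5))
```

Timestamps are passed in explicitly rather than read from a clock so that the decision rule can be examined in isolation; an actual system would receive the signals over the interconnection.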
From the point of view of the client computer that is using the business application service, the entire failover cluster appears as a single computer. The client computer is therefore unaware of which computer is providing the business application service, even when processing is changed over from the live computer to the standby computer.
However, if failover is executed without giving any consideration to the operating condition of the failover target computer, the computer that takes over the business application service may itself become overloaded, resulting, for example, in a drop in responsiveness. In this connection, a technique is known in which the priority of the business application service is altered in accordance with the operating condition of the failover target computer (Japanese Patent Application Laid-open No. H. 11-353292).
In the technique disclosed in this reference, transfer from the failover source to the failover target is arranged to be performed after first conducting an overall estimate of the total resources of the failover objects. The time taken to restart the business application service at the failover target computer therefore increases as the resources of the failover objects increase.
For example, when taking over a filesystem, it is necessary to unmount the filesystem at the failover source and to mount the filesystem at the failover target. When performing unmounting or mounting, it is necessary to maintain the consistency of the data set by, for example, reflecting the data in the cache to the disk and reproducing the in-memory state of the data in accordance with the update history file. The time required before the business application service can be restarted therefore increases as the number of filesystems to be transferred from the failover source to the failover target increases.
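The relationship between the number of filesystems and the restart delay may be illustrated by the following sketch. It assumes, for illustration only, that the filesystems are transferred one after another and that the service restarts only after the last one has been mounted; the type `Filesystem`, the function `time_until_restart`, and the per-filesystem durations are hypothetical and are not measurements.

```python
from dataclasses import dataclass


@dataclass
class Filesystem:
    """Hypothetical record of the takeover cost of one filesystem."""

    name: str
    unmount_seconds: float  # assumed time to reflect cached data to disk at the failover source
    mount_seconds: float    # assumed time to replay the update history file at the failover target


def time_until_restart(filesystems) -> float:
    """Total delay before restart when every filesystem is moved in sequence."""
    return sum(fs.unmount_seconds + fs.mount_seconds for fs in filesystems)


# Illustrative values only: each filesystem costs 2 s to unmount and 3 s to mount.
filesystems = [Filesystem(f"fs{i}", 2.0, 3.0) for i in range(4)]
for count in range(1, len(filesystems) + 1):
    print(count, time_until_restart(filesystems[:count]))
```

Under these assumptions the delay grows linearly with the number of filesystems transferred, which is the difficulty noted above.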