1. Field of the Invention
The present invention relates to a method for deciding an optimum server to which applications are to be migrated in the event of a failure of a currently-active server.
2. Description of the Related Art
In recent years, clustering of applications has increasingly been used to improve performance and achieve high availability. In particular, cluster systems using a SAN (Storage Area Network) achieve continuous operation through failover and cascade failover in the event of failures.
With a SAN, a plurality of servers composing a cluster can easily access the same storage. As a result, when a failure occurs in a server, the failed server can be switched flexibly to any one of the other servers.
For example, U.S. Pat. No. 5,852,724 describes a technology in which several spare servers are prepared for a plurality of servers, and when a failure occurs in a specific server, applications are migrated to one of the spare servers. Even when no spare server is prepared, applications running on a server that experiences a failure can be operated on another server in operation. Also, there are products having a function that, when applications are to be migrated from a failed server to another server in operation, selects from among a plurality of servers in operation one having a lower workload. For example, VERITAS Cluster Server (VERITAS White Paper, Managing Application Availability with Application Clustering and the VERITAS Cluster Server 2.0) uses such a function.
When an application is migrated to another server in operation, the increased workload may affect the performance of applications already running on that server. Even when such an influence is not a problem, there is still an issue as to whether the server to which the application is migrated has sufficient computer resources required by the application to be migrated.
There are also some examples that implement a workload distribution method of migrating applications to the server with the lightest workload. Moreover, there is also a conventional technology in which the workload of a server is estimated based on the response time and the like in HTTP communications. However, even when an application is migrated to a server with a relatively short response time, it is not certain that an expected response time can be achieved when the application is actually executed on the server. Because applications generally have characteristics such as being CPU-bound, I/O-bound and the like, estimating the workload simply based on the CPU busy rate and the response time in HTTP communications does not guarantee that an expected response time can always be achieved when a target application is actually executed.
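The point that a single workload figure can mislead may be illustrated by the following minimal sketch. It is not taken from any cited product or patent. The `Server` type, the `estimated_time` function, and the simple contention model (work on each resource is stretched in proportion to how busy that resource already is) are all assumptions introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    cpu_busy: float  # assumed fraction of CPU capacity already in use (0..1)
    io_busy: float   # assumed fraction of I/O capacity already in use (0..1)


def estimated_time(cpu_work: float, io_work: float, server: Server) -> float:
    """Toy estimate (an assumption, not a real model): each kind of work
    is slowed by the contention on the resource it uses."""
    return cpu_work / (1.0 - server.cpu_busy) + io_work / (1.0 - server.io_busy)


servers = [
    Server("A", cpu_busy=0.2, io_busy=0.9),  # idle CPU, saturated I/O
    Server("B", cpu_busy=0.6, io_busy=0.2),  # busier CPU, idle I/O
]

# Selection by CPU busy rate alone picks server A.
least_cpu_busy = min(servers, key=lambda s: s.cpu_busy)

# An I/O-bound application (little CPU work, much I/O work) would
# nevertheless run far slower on A than on B under this toy model.
io_bound_app = (1.0, 9.0)   # (cpu_work, io_work), arbitrary units
cpu_bound_app = (9.0, 1.0)

best_for_io_bound = min(servers, key=lambda s: estimated_time(*io_bound_app, s))
best_for_cpu_bound = min(servers, key=lambda s: estimated_time(*cpu_bound_app, s))

print(least_cpu_busy.name, best_for_io_bound.name, best_for_cpu_bound.name)
```

Under these assumed figures, the server with the lowest CPU busy rate is the better destination only for the CPU-bound application, which is why the characteristics of the target application matter in addition to the load of the candidate server.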
Furthermore, at the time of failover, there may be cases where it is preferable to continue the services at a level of performance equal to that before the occurrence of the failure, rather than switching from the failed server to the server with the lightest workload. For example, when the workload is distributed by a load balancer, the load balancer may detect a switching destination server with greater performance as having spare resources and may increase the number of requests allocated to it. However, such a situation may slow down responses to the users, and is therefore undesirable for many of the users. To begin with, it is clearly unnatural for the performance of an application to improve as a result of a failure, and failover and load distribution call for different countermeasures at the time of failures.
Generally speaking, in migrating applications operating on one server to another server in the event of a failure, it is considered that the fewer the changes in all aspects, including the performance (except the availability), before and after the occurrence of the failure, the safer the migration; that is, invariability before and after the occurrence of the failure should be maintained to the greatest extent possible. This contrasts with the idea of load distribution, which aims to complete the execution of applications as early as possible. Moreover, when applications are to be migrated to another server, it is necessary to choose a server whose reliability is as high as possible (whose probability of failure is as low as possible).
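The two criteria stated above (minimal change in performance and high reliability) may be sketched as follows. This is only an illustrative sketch under stated assumptions and is not the method of the invention. It assumes, purely for illustration, that a predicted response time and a failure probability are available for each candidate server; the names `Candidate`, `performance_change`, `choose_destination`, and the threshold `max_failure_probability` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    predicted_response_ms: float  # assumed prediction after migration
    failure_probability: float    # assumed estimate for this server


def performance_change(before_ms: float, candidate: Candidate) -> float:
    """Relative deviation from the response time observed before the failure."""
    return abs(candidate.predicted_response_ms - before_ms) / before_ms


def choose_destination(before_ms: float,
                       candidates: List[Candidate],
                       max_failure_probability: float) -> Optional[Candidate]:
    """Among sufficiently reliable candidates, pick the one whose predicted
    performance is closest to the pre-failure performance; ties are broken
    in favour of the lower failure probability."""
    reliable = [c for c in candidates
                if c.failure_probability <= max_failure_probability]
    if not reliable:
        return None
    return min(reliable,
               key=lambda c: (performance_change(before_ms, c),
                              c.failure_probability))


candidates = [
    Candidate("fast", predicted_response_ms=50.0, failure_probability=0.01),
    Candidate("similar", predicted_response_ms=105.0, failure_probability=0.02),
    Candidate("similar_unreliable", predicted_response_ms=100.0,
              failure_probability=0.30),
]

chosen = choose_destination(100.0, candidates, max_failure_probability=0.05)
print(chosen.name if chosen else None)
```

With these assumed figures, the candidate named "similar" is chosen rather than the fastest candidate, reflecting the preference for invariability over load distribution; the candidate with an identical predicted response time is excluded because its assumed failure probability exceeds the threshold.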
There is no conventional technology that achieves both performance invariability before and after the occurrence of a failure and availability improvement after the failure. In particular, maintaining performance invariability before and after the occurrence of a failure is technically difficult. This is because the server that was in operation before the failure and the server that is in operation after the failure may differ in various aspects such as the CPU performance, the number of CPUs, the I/O performance, and the like. In addition, the destination server to which the application is migrated may already be running other applications. It is therefore not possible to predict the performance of the destination server after the application is migrated.