1. Field of the Invention
The present invention relates to a computer system that includes a plurality of computers and executes a plurality of services (works). More particularly, the present invention relates to a computer system and method for distributing service load among computers in asymmetrical resource environments.
2. Description of the Related Art
Server load distributing systems, in which service execution requests issued from a number of client terminals are distributed to a plurality of computers so that the requests are processed efficiently, are well known. Server load distributing systems of this type are described in, for example, Rajkumar Buyya, “High Performance Cluster Computing: Architecture and Systems (Vol. 1)”, Prentice-Hall, Inc., 1999, pp. 340-363, and in Tony Bourke, “Server Load Balancing”, O'Reilly & Associates, Inc., pp. 3-31, December 2001. These server load distributing systems generally comprise a load distributing unit and a plurality of server computers having symmetrical (uniform) resource environments. The load distributing unit receives a request to execute a service from a client terminal via a network (external network). Upon receiving the request, the load distributing unit determines which of the server computers should execute the service designated by the client terminal, making its selection so as to avoid concentrating load on any particular server computer. That is, the load distributing unit distributes the execution of services of the same type among a plurality of computers.
In general, a server load distributing system uses one of the following methods to determine which computer should execute a service, i.e., to schedule services: (1) round-robin scheduling, (2) weighted round-robin scheduling, (3) a minimum connection method and (4) a fastest method. Round-robin scheduling selects each server computer in turn, in a fixed order, so that all server computers are selected equally often. Weighted round-robin scheduling is based on round-robin scheduling, but the frequency with which each server computer is selected depends on its capacity; accordingly, each computer is assigned a weight (selection frequency) corresponding to its capacity. The minimum connection method selects the computer that has the fewest connections (sessions) at that point. The fastest method selects the computer that is currently responding fastest.
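The four selection methods can be sketched in Python as follows. This is a minimal illustration, not the scheduling logic of any particular product: the `Server` record and its fields (`weight`, `connections`, `response_ms`) are hypothetical, and the weighted variant uses the simplest expansion (each server repeated `weight` times per cycle), whereas practical balancers often interleave selections more smoothly.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    weight: int = 1            # hypothetical: selection frequency for weighted round-robin
    connections: int = 0       # hypothetical: current number of connections (sessions)
    response_ms: float = 0.0   # hypothetical: most recently measured response time


def round_robin(servers):
    """(1) Select each server in turn, in a fixed order."""
    return itertools.cycle(servers)


def weighted_round_robin(servers):
    """(2) Like round-robin, but each server appears `weight` times per cycle."""
    expanded = [s for s in servers for _ in range(s.weight)]
    return itertools.cycle(expanded)


def minimum_connection(servers):
    """(3) Select the server with the fewest connections."""
    return min(servers, key=lambda s: s.connections)


def fastest(servers):
    """(4) Select the server with the shortest recent response time."""
    return min(servers, key=lambda s: s.response_ms)


servers = [
    Server("a", weight=3, connections=5, response_ms=12.0),
    Server("b", weight=1, connections=2, response_ms=30.0),
]
rr = round_robin(servers)
print([next(rr).name for _ in range(4)])    # ['a', 'b', 'a', 'b']
wrr = weighted_round_robin(servers)
print([next(wrr).name for _ in range(8)])   # ['a', 'a', 'a', 'b', 'a', 'a', 'a', 'b']
print(minimum_connection(servers).name)     # b
print(fastest(servers).name)                # a
```

Note that methods (1) and (2) ignore the servers' current state entirely, while (3) and (4) consult a live measurement at each selection.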
The load distributing unit determines which server computer should execute a service, using one of the above methods (1) to (4). It then forwards the client terminal's request to execute the service to the selected server computer via a network (internal network). Upon receiving the request, the selected server computer executes the service and sends a response to the load distributing unit. The load distributing unit returns this response to the client terminal that issued the request.
The load distributing unit monitors the responses from each server computer. If no response is returned from a server computer within a predetermined time, the load distributing unit detects a timeout and determines that a failure has occurred in that server computer. Such server computer failures include failures of the server computer itself and failures related to the execution of a service by the server computer. Once the load distributing unit detects a failure in a server computer, it stops allocating services to that server computer, so that the system continues in a pared-down (degraded) operation.
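The forwarding and timeout-based failure detection described above might look roughly like the sketch below. It rests on several assumptions: each server computer is modelled as a local Python callable run on a thread pool (standing in for a call over the internal network), selection uses plain round-robin, and the class name, the `timeout_s` value and the server names are all hypothetical. The request that timed out is not retried; the sketch only stops allocating later requests to the failed server.

```python
import concurrent.futures
import itertools
import time


class LoadDistributingUnit:
    """Hypothetical sketch: forward requests round-robin, exclude servers that time out."""

    def __init__(self, servers, timeout_s=0.2):
        # servers: name -> callable(request) that executes the service and returns a response
        self.servers = dict(servers)
        self.timeout_s = timeout_s
        self.failed = set()
        self._order = itertools.cycle(list(self.servers))
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

    def _select(self):
        # Round-robin over the servers, skipping any already judged to have failed.
        for _ in range(len(self.servers)):
            name = next(self._order)
            if name not in self.failed:
                return name
        raise RuntimeError("no healthy server available")

    def handle(self, request):
        """Forward one client request; return (server name, response or None)."""
        name = self._select()
        future = self._pool.submit(self.servers[name], request)
        try:
            return name, future.result(timeout=self.timeout_s)
        except concurrent.futures.TimeoutError:
            # No response within the predetermined time: judge the server failed
            # and allocate no further services to it (pared-down operation).
            # The timed-out call itself keeps running in its worker thread.
            self.failed.add(name)
            return name, None


def fast(request):
    return f"ok:{request}"


def hung(request):
    time.sleep(0.6)  # simulates a server that does not answer in time
    return "late"


ldu = LoadDistributingUnit({"s1": fast, "s2": hung})
print(ldu.handle("r1"))  # ('s1', 'ok:r1')
print(ldu.handle("r2"))  # ('s2', None)
print(ldu.handle("r3"))  # ('s1', 'ok:r3')
print(ldu.failed)        # {'s2'}
```

The sketch also shows the weakness noted later in this section: a failure is discovered only after a full timeout has elapsed on a live request.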
On the other hand, computer systems called cluster systems have become available, as disclosed in “Cluster Software” by Tetsuo Kaneko and Ryoya Mori, Toshiba Review, Vol. 54, No. 12 (1999), pp. 18-21. In general, a cluster system comprises a plurality of computers having asymmetrical resource environments, to which services that differ in function (i.e., different types of services) are allocated. The user plans this allocation in detail in advance. The computers in a cluster system access each other via a network to detect failures in the computers currently executing services. Upon detecting a failure, the cluster system performs re-scheduling (fail-over), i.e., it reallocates the service being executed by the failed computer to another computer. This reduces service (work) interruption time and thereby provides high availability (server operation rate, business execution rate), called “HA”. This type of cluster system is called an “HA cluster system”.
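A fail-over step of this kind can be sketched as follows. The sketch assumes heartbeat-style failure detection (each computer periodically records a timestamp that the others can read) with a fixed threshold; `Node`, `HEARTBEAT_LIMIT_S` and `fail_over` are hypothetical names, and real cluster software detects failures and restarts services in considerably more elaborate ways. In keeping with the conventional behaviour described in this section, services move to a designated standby computer without any consideration of load.

```python
from dataclasses import dataclass, field

HEARTBEAT_LIMIT_S = 3.0  # hypothetical failure-detection threshold, in seconds


@dataclass
class Node:
    name: str
    last_heartbeat: float                      # time of the last heartbeat seen from this node
    services: list = field(default_factory=list)


def fail_over(nodes, standby, now):
    """Move every service off nodes with a stale heartbeat onto `standby`.

    The load on the standby is not checked. Returns (service, from_node) pairs.
    """
    moved = []
    for node in nodes:
        if node is standby:
            continue
        if now - node.last_heartbeat > HEARTBEAT_LIMIT_S:
            for svc in node.services:
                standby.services.append(svc)
                moved.append((svc, node.name))
            node.services.clear()
    return moved


a = Node("A", last_heartbeat=10.0, services=["db"])
b = Node("B", last_heartbeat=5.0, services=["web", "mail"])
s = Node("S", last_heartbeat=10.0)
print(fail_over([a, b, s], standby=s, now=10.0))  # [('web', 'B'), ('mail', 'B')]
print(s.services)                                 # ['web', 'mail']
```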
In general, a cluster system reallocates a service to a standby computer; the loads on the computers are not considered when scheduling services. Cluster systems of a static ticket type are also well known. In a cluster system of this type, the user sets a processing capacity (ticket) for each computer in the cluster system, and also sets, for each service, the processing capacity (ticket) needed to execute that service. Using these tickets, a static-ticket cluster system ensures that services exceeding a computer's processing capacity are not allocated to that computer.
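The static ticket check can be illustrated with a simple first-fit placement. The ticket values, the computer and service names, and the first-fit order (computers tried in the order the user listed them) are assumptions for illustration only; the point is that a service is never placed on a computer whose remaining tickets cannot cover it.

```python
def allocate(services, capacity):
    """Static-ticket placement (first fit).

    services: list of (service name, tickets needed), set by the user per service
    capacity: dict of computer name -> ticket capacity, set by the user per computer;
              computers are tried in the dict's insertion order
    Returns (assignment dict, list of services that fit nowhere).
    """
    remaining = dict(capacity)
    assignment, unplaced = {}, []
    for name, need in services:
        for computer, left in remaining.items():
            if need <= left:                     # never exceed the computer's capacity
                assignment[name] = computer
                remaining[computer] = left - need
                break
        else:
            unplaced.append(name)                # no computer has enough tickets left
    return assignment, unplaced


capacity = {"node1": 100, "node2": 50}
services = [("db", 70), ("web", 40), ("batch", 60), ("mail", 20)]
print(allocate(services, capacity))
# ({'db': 'node1', 'web': 'node2', 'mail': 'node1'}, ['batch'])
```

The check uses only the configured tickets; the actual current load on each computer is never consulted.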
As described above, conventional server load distributing systems can perform dynamic load distributing among a plurality of server computers having symmetrical resource environments. However, they cannot perform dynamic load distributing among server computers having complex asymmetrical resource environments; that is, they cannot reliably control the execution of services that operate in such environments. Furthermore, conventional server load distributing systems cannot detect a computer failure promptly, because they detect failures only when a response from the computer times out.
On the other hand, in conventional cluster systems having asymmetrical resource environments, load distributing is achieved either by the user's detailed advance planning of functional load distribution or by a static ticket system in which a predetermined ticket is allocated to each service. Accordingly, conventional cluster systems having asymmetrical resource environments cannot perform dynamic load distributing. Moreover, because the tickets are fixed in advance, the static ticket system may allocate services in a way that does not suit the current load.