1. Technical Field
The present invention relates generally to an improved distributed data processing system and in particular to a method of handling requests in a distributed data processing system. Still more particularly, the present invention relates to a method and apparatus for load balancing requests from clients in a distributed data processing system.
2. Description of Related Art
Over the last few years, the number of Internet users and server providers has surged. The number of Internet users has been growing geometrically since the early 1990s. This growth calls for capacity planning, performance, and management studies to handle Internet traffic properly, with the ultimate goal of shortening users' response times or increasing their file transfer throughput. Two file-serving applications that have received particular attention are the World Wide Web (WWW) and the File Transfer Protocol (FTP). One problem to be solved is how to serve the growing number of users and their workload demands while meeting acceptable user performance criteria.
One solution is to make the server hardware run faster, but this is expensive. A cheaper solution is to provide a cluster of identical parallel servers to accommodate the high transaction rates of the requests generated by users, with the number of servers depending on these rates. The servers share the data and the network address, so to the users they appear as a single node. This solution, however, requires that each request be assigned to the right server, which means that new techniques for balancing the load among the servers are needed. Special attention has been paid to the case where the clients only read information from the servers, as with Web servers. Balancing the load means that the servers should be as evenly loaded as possible at any given time; in particular, a request should not be assigned to a server that is busier than another one. This rule reduces unnecessary queueing time and thus avoids increasing the user's response time. It also reduces congestion at the servers and thus avoids resource allocation problems that might otherwise arise.
Mechanisms presently available for load balancing the servers include the following schemes: (1) round robin; (2) forward the request to the server with the least number of requests in its queue; (3) forward the request to the server with the fastest response time; and (4) use a server agent to determine the actual load on each server.
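Schemes (1) and (2) need no load statistics from the servers, which is why they are simple to deploy. The following is a minimal illustrative sketch of those two selection rules; the names `RoundRobinBalancer` and `pick_least_queued` are hypothetical, and queue length here is just a count of requests, not a measure of the work they carry.

```python
from itertools import cycle
from typing import Dict, List


class RoundRobinBalancer:
    """Scheme (1): hand requests to servers in a fixed cyclic order."""

    def __init__(self, servers: List[str]) -> None:
        self._order = cycle(servers)

    def pick(self) -> str:
        return next(self._order)


def pick_least_queued(queue_lengths: Dict[str, int]) -> str:
    """Scheme (2): choose the server with the fewest queued requests.

    Ties go to the server that appears first in the dictionary.
    """
    return min(queue_lengths, key=queue_lengths.get)


if __name__ == "__main__":
    rr = RoundRobinBalancer(["s0", "s1", "s2"])
    print([rr.pick() for _ in range(5)])  # ['s0', 's1', 's2', 's0', 's1']
    print(pick_least_queued({"s0": 4, "s1": 1, "s2": 3}))  # s1
```

Neither rule looks at how much work a request represents, which is the weakness discussed next.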
Knowledge of the load at each server at every decision point is an important element. Techniques (1) and (2) above do not take this information into account, while techniques (3) and (4) do. The latter methods, however, require communication with the servers to obtain load statistics, which means that specific software must run both on the servers and on the front-end processor (the load balancing node). Techniques (1) and (2) usually do not work well because the statistical distributions of the workloads generated by the clients are not identical, so using these methods may leave one server busier than another. For example, consider two clients and two servers, where one client generates a heavy workload and the other a light one. If requests happen to arrive at the front-end processor so that the odd-numbered requests come from the first client and the even-numbered requests from the second, then round robin sends every heavy request to one server and every light request to the other, and one server will be much busier than the other.
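The two-client, two-server example can be checked with a few lines of arithmetic. This sketch assumes hypothetical request sizes of 10 work units for the heavy client and 1 unit for the light client, with arrivals strictly alternating, and totals the work round robin gives each server.

```python
from itertools import cycle
from typing import List

HEAVY, LIGHT = 10.0, 1.0  # hypothetical work units per request


def round_robin_loads(request_sizes: List[float], n_servers: int) -> List[float]:
    """Total work each server receives when requests are dealt out cyclically."""
    loads = [0.0] * n_servers
    for server, size in zip(cycle(range(n_servers)), request_sizes):
        loads[server] += size
    return loads


if __name__ == "__main__":
    # Odd-numbered requests (1st, 3rd, ...) come from the heavy client,
    # even-numbered ones from the light client.
    arrivals = [HEAVY if i % 2 == 0 else LIGHT for i in range(100)]
    print(round_robin_loads(arrivals, n_servers=2))  # [500.0, 50.0]
```

Both servers receive 50 requests, so their request counts are balanced, yet the first server carries ten times the work of the second.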
Therefore, it would be advantageous to have an improved method and apparatus for load balancing parallel servers in a distributed data processing system.
The present invention provides a method and apparatus in a distributed data processing system for handling requests. The processing of each request received at a server system, which includes a plurality of servers, is monitored. In response to completion of processing of a request, an average work load size for the plurality of servers is estimated based on previous actual work load information. The most recent value of this average work load size is assigned to each request arriving at the server system, and the request is forwarded to the server within the plurality of servers that has the lowest estimated amount of work to process.
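One possible reading of this summary is sketched below, not as the claimed implementation. Several details are assumptions not stated in the text: work load size is a single number (for example bytes returned or seconds of service), the front end can observe each request's actual size when it completes, the average is an exponentially weighted moving average with a hypothetical smoothing weight `alpha` and starting value `initial_avg`, and a server's estimated work is the sum of the estimates charged to its outstanding requests. The class `EstimatingBalancer` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class EstimatingBalancer:
    """Route each request to the server with the least *estimated* work.

    Assumption: the average work size is an exponentially weighted moving
    average (weight `alpha`), starting at `initial_avg`.
    """

    servers: List[str]
    alpha: float = 0.2
    initial_avg: float = 1.0
    avg_size: float = field(init=False)
    pending: Dict[str, float] = field(init=False)  # estimated work queued per server
    assigned: Dict[int, Tuple[str, float]] = field(init=False)  # id -> (server, estimate)

    def __post_init__(self) -> None:
        self.avg_size = self.initial_avg
        self.pending = {s: 0.0 for s in self.servers}
        self.assigned = {}

    def dispatch(self, request_id: int) -> str:
        # Tag the arriving request with the most recent average work size ...
        estimate = self.avg_size
        # ... and forward it to the server with the lowest estimated work.
        server = min(self.pending, key=self.pending.get)
        self.pending[server] += estimate
        self.assigned[request_id] = (server, estimate)
        return server

    def complete(self, request_id: int, actual_size: float) -> None:
        # Remove the estimate that was charged to the server at dispatch time.
        server, estimate = self.assigned.pop(request_id)
        self.pending[server] -= estimate
        # Fold the observed (actual) size into the running average.
        self.avg_size = (1 - self.alpha) * self.avg_size + self.alpha * actual_size


if __name__ == "__main__":
    b = EstimatingBalancer(["s0", "s1"])
    print(b.dispatch(1))  # s0 (tie at 0.0, first server wins)
    print(b.dispatch(2))  # s1 (s0 already holds one estimate)
    b.complete(1, actual_size=10.0)  # average rises from 1.0 to 2.8
    print(b.dispatch(3))  # s0 (its estimated work dropped back to 0.0)
```

Because the estimate is charged at dispatch time and only the front end updates the average, this sketch needs no agent software on the servers, in contrast to techniques (3) and (4).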