The present invention relates to a distributed computer server system comprising a load balancer and a plurality of host processors and more particularly to a method and apparatus for dividing requests for service among the various host processors comprising the distributed server.
Many applications exist in which requests for service are received at a computer system and the computer system needs to service the requests in a prompt manner. Such is particularly true of interactive systems such as web servers in which a user initiates a hypertext transport protocol (http) request for service and awaits the receipt of a response, typically in the form of a downloaded web page. If the time for service is too long, the users may become frustrated and disinclined to use the web site. The same holds true more generally for any application where a computer system processes user requests. For example, the computer system might be a database server which processes database queries for users, or, more generally, the computer system might simply be a powerful computer to which users send their programs for execution.
When the demand for service exceeds the ability of a host processor to service the demand, the site administrator is faced with several options. The administrator can replace the current host processor with a faster host processor or add additional host processors to create a distributed server. Given the availability of high performance low cost processors, the use of multiple processors is an option typically chosen to satisfy increasing demands for service due to the scalability of this approach.
Once a decision has been made to utilize multiple host processors in a distributed server arrangement, policy decisions must be made as to how to assign the requests for service to the respective host processors. Such policy decisions are often referred to as the task assignment policy. The assignment of requests to the respective host processors in a distributed server system is performed by a processor known as a load balancer which receives a demand for service and based on some predetermined policy, forwards the request to one of the host processors within the distributed server system.
The load balancers known in the art typically distribute requests for service in a round robin basis or forward the request for service to the host processor with the shortest queue length. Some load balancers employ dynamic proprietary assignment techniques which are understood to forward requests for service based upon feedback obtained from the respective hosts.
The effectiveness of a task assignment policy is measured in terms of performance metrics like mean waiting time (the average waiting time over all tasks). Most task assignment policies perform well when the variability among the task sizes being serviced is not too great. When there is a large variability among the task sizes being serviced, most task assignment policies experience a severe decline in performance.
Studies of requests for service from web servers have shown that such are generally in the nature of a heavy tailed distribution in which there are a far greater number of small jobs or small requests than large jobs or large requests. A heavy tailed distribution is one whose tail declines like a power law, that is, P[X&gt;x] .about.x.sup.-.alpha. for 0&lt;.alpha..sub.--.sup.23 2. Such a task size distribution shows extremely high variability among the task sizes between the small jobs and the large jobs. Studies of requests for service in other application areas have also shown a heavy-tailed distribution (with extremely high variability). Examples include, for example, the processing time for Unix tasks.
Another policy decision, which is related to the task assignment policy, is the queueing policy at the host processors. Once the load balancer assigns tasks to a host for processing (via the task assignment policy), that host then processes (services) those tasks in some order.
In the prior art, all the host processors use the same service order. Typically this service order is one of the following three: (1) First-Come-First-Served (FCFS), where jobs are serviced in the order in which they are queued at the host (i.e. in the order of their arrival to the host). Once a task begins receiving service, its service is not interrupted until the task completes. (2) Time-Sharing (also known as Processor-Sharing (PS)), where each job at the host is serviced for a small predetermined period of time and then the processor goes through a context switch and another job is serviced for a small period of time. That is, the host processes a job in a time multiplexed basis with other pending requests. (3) Admission Control, where the host only time-shares among the first m tasks at the head of the queue, where m is the optimal multiprogramming level for the host. The rest of the tasks are simply queued in the order in which they arrive. When one of these m tasks complete, the very next task in the queue is chosen to replace it. The advantage of FCFS service order over time-sharing is its simplicity of implementation. In many applications the context switch cost can be prohibitive, or simply impractical, and thus FCFS service order is used. PS service order, on the other hand, has the advantage that small jobs receive service quickly, even if they arrive subsequent to a large job seeking service. Admission Control is more similar to FCFS, since it predominantly is a queueing service order. Host processors are referred to herein as being run in FCFS mode, Admission Control mode, or time-sharing mode, when referring to the service order of jobs at the host.
It would be desirable to provide a load balancing technique which efficiently services requests notwithstanding high variability in task sizes.