1. Field of Invention
The present invention relates to load balancing requests among a plurality of web servers.
2. Description of Related Art
Load balancing is a process of distributing a workload among a plurality of resources. The goals of load balancing may include improving resource utilization, maximizing throughput, minimizing response time, and avoiding overload. In addition, the ability to load balance among multiple machines may increase reliability through redundancy. Load balancing is commonly used to distribute tasks among a pool of web servers according to various scheduling algorithms. An apparatus that performs the load balancing according to a scheduling algorithm is referred to as a “load balancer.”
One scheduling algorithm used by a load balancer for assigning work among a pool of web servers is round-robin scheduling. In round-robin scheduling, tasks are distributed in equal shares to each web server in circular order. Although round-robin scheduling equalizes the number of requests sent to each web server, the work to be done and the time needed to respond to those requests vary (i.e., the processing costs of responding vary). Thus, although the number of provided requests is equalized among the web servers, the costs are not equally distributed, and it may take some web servers longer to process their requests than other web servers. As a result, even though each of the web servers receives the same number of requests, the work queue for some web servers may grow long whereas other web servers may have few or no requests in their respective queues. Because response time is proportional to the number of requests in a queue, the average response time suffers when the number of queued requests becomes unequally distributed among web servers.
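The imbalance described above can be illustrated with a minimal sketch. The function name `round_robin_assign` and the request costs below are hypothetical, chosen only for illustration; they are not taken from any particular load balancer.

```python
from itertools import cycle

def round_robin_assign(costs, num_servers):
    """Hand each request to the next server in circular order.

    `costs` holds the (hypothetical) processing cost of each request.
    Returns one list of assigned costs per server.
    """
    queues = [[] for _ in range(num_servers)]
    next_server = cycle(range(num_servers))
    for cost in costs:
        queues[next(next_server)].append(cost)
    return queues

# Assumed request costs in arbitrary time units: a few expensive
# requests mixed in with many cheap ones.
costs = [9, 1, 1, 8, 1, 1, 9, 1, 1]
queues = round_robin_assign(costs, 3)

counts = [len(q) for q in queues]  # requests per server
work = [sum(q) for q in queues]    # total processing cost per server

print(counts)  # [3, 3, 3]
print(work)    # [26, 3, 3]
```

In this example each server receives three requests, yet the first server is assigned 26 units of work while the other two are assigned 3 units each, so its queue drains far more slowly.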
Typically, if a load balancer attempts to send an additional task to a server that is overloaded with tasks (i.e., the queue is full), the data packet representing the task is dropped. The load balancer must then wait for a time period to expire (i.e., a “time out”) without receiving a response before it can conclude that the data packet representing the task should be sent to another server for processing. In an attempt to avoid these inefficiencies and to ensure all data packets are handled, many have suggested increasing the queue depth of the servers, for example from 32 to 64, and beyond. Alternatively, queues are configured so that requests are accepted and queued by overloaded servers. Although this avoids the wait and retry cycle described above, the inefficiencies implicit in the long times needed to process the requests in the queue of the overloaded servers remain.
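The wait and retry cycle can be sketched as a simplified model. The `Server` class, the `dispatch` function, and the time-out value are assumptions made for illustration only. In particular, a real load balancer receives no signal when a packet is dropped; the sketch stands in for that silence by charging one full time-out for every rejected attempt.

```python
from collections import deque

class Server:
    """Hypothetical server with a bounded work queue."""

    def __init__(self, queue_depth):
        self.queue_depth = queue_depth
        self.queue = deque()

    def offer(self, task):
        """Queue the task if there is room; otherwise drop it."""
        if len(self.queue) >= self.queue_depth:
            return False  # packet dropped; the sender is not notified
        self.queue.append(task)
        return True

def dispatch(task, servers, start, timeout):
    """Try servers in circular order beginning at index `start`.

    Returns (index of the accepting server or None, time spent waiting).
    Each dropped packet costs one full time-out before the next attempt.
    """
    waited = 0.0
    for offset in range(len(servers)):
        idx = (start + offset) % len(servers)
        if servers[idx].offer(task):
            return idx, waited
        waited += timeout
    return None, waited

# Two servers with an assumed queue depth of 2; the first is already full.
servers = [Server(2), Server(2)]
servers[0].offer("t1")
servers[0].offer("t2")

accepted_by, waited = dispatch("t3", servers, start=0, timeout=1.5)
print(accepted_by, waited)  # 1 1.5
```

Here the third task reaches the second server only after one time-out has elapsed. Raising the queue depth would remove that wait but leave the task sitting behind the backlog of the overloaded server.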