Distributed computing systems, such as banks or networks of workstations or servers, or mirrored sites on the World Wide Web, face the problem of using their resources effectively. If some hosts lie idle while others are heavily loaded, system performance can fall significantly. To prevent this, load balancing is used to distribute the workload, improving performance measures such as the expected time a task spends in the system. Although determining an effective load balancing strategy depends largely on the details of the underlying system, general models from both queueing theory and computer science often provide valuable insight and general rules of thumb.
In a system of n servers, incoming tasks must choose a server and wait for service. If an incoming task knows exactly the current number of tasks already queued at each server and can instantaneously be queued, it is usually best for the task to go to the server with the shortest queue. In many actual systems, however, it is unrealistic to assume that tasks will have access to up-to-date load information; global load information may be updated only periodically, or the time delay for a task to move to a server may be long enough that the load information is out of date by the time the task arrives. When load information is out of date, the strategy of going to the shortest queue, which works well with up-to-date information, can lead to extremely bad behavior. Systems which attempt to exploit global information to balance load too aggressively may suffer in performance, either by misusing that information or by adding significant complexity. Hence, it is often not clear what the best load balancing strategy is.
If old information is utilized in selecting one of multiple servers for servicing a task, a phenomenon which can be characterized as "herding" may occur. More particularly, if multiple clients are directing tasks to particular servers based upon the stale information, numerous tasks may be directed to the server which the stale information indicates to be the least loaded but which, in fact, has a current loading which exceeds the current loading of other servers.
For example, if the available loading information is updated every T seconds, all tasks directed during a given T-second period will be directed to what appears to be the least loaded server. This may significantly increase the number of tasks queued for service at that server, while the actual number of tasks queued for service at other servers is significantly less than that of the server to which the tasks are being directed. Hence, any delay in updating server loadings can result in herding, since tasks are directed to what appears to be the least loaded of the servers based upon stale loading information.
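The herding effect described above can be illustrated with a small deterministic sketch. It is a toy model under stated assumptions, not a description of any particular system: one task arrives per tick, every server completes one queued task after each group of `num_servers` arrivals, and the published load board is refreshed only once every `update_period` arrivals. The function name `simulate_herding` and all of its parameters are hypothetical and chosen for illustration only.

```python
def simulate_herding(num_servers, num_tasks, update_period):
    """Return the longest queue observed when every task joins the
    server that looks least loaded on a possibly stale load board.

    Assumptions (illustrative only): one arrival per tick, and each
    server completes one queued task after every `num_servers` arrivals.
    """
    queues = [0] * num_servers   # actual queue lengths
    board = list(queues)         # published (possibly stale) snapshot
    longest = 0

    for tick in range(num_tasks):
        # The load board is refreshed only once per update period.
        if tick % update_period == 0:
            board = list(queues)

        # Each task picks the apparently shortest queue from the board,
        # not from the actual queue lengths.
        target = board.index(min(board))
        queues[target] += 1
        longest = max(longest, max(queues))

        # Service step: every non-empty queue loses one task.
        if (tick + 1) % num_servers == 0:
            queues = [max(0, q - 1) for q in queues]

    return longest


if __name__ == "__main__":
    fresh = simulate_herding(num_servers=4, num_tasks=400, update_period=1)
    stale = simulate_herding(num_servers=4, num_tasks=400, update_period=20)
    print("longest queue, fresh board:", fresh)
    print("longest queue, stale board:", stale)
```

In this toy model, a board refreshed on every arrival keeps the longest queue at 1, whereas a board refreshed every 20 arrivals sends the first 20 tasks to a single server, which accumulates at least 16 queued tasks while the other servers sit idle.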
It will be recognized by those skilled in the art that this phenomenon can occur whether the multiserver system is a homogeneous system, i.e., a system in which each of the servers is substantially similar and preferably all queues are equally loaded, or a heterogeneous system, in which servers have different processor speeds, etc., and beneficially the server loadings are balanced in view of the differences in the respective server capabilities.
Various techniques have been proposed to balance the loading in multiserver environments. However, the proposed techniques have generally relied on complex algorithms and models which are difficult to implement and require substantial system overhead during operation. Accordingly, there remains a need for simple techniques for balancing the loadings of multiple processors or servers in a client-server environment.