Local area networks of computers are an indispensable resource in many organisations. One problem in a large network of computers is that it is sometimes difficult to make efficient use of all the computers in the network. Load-sharing or load-balancing techniques increase system throughput by attempting to keep all computers busy. They do this by off-loading processes from overloaded computers to idle ones, thereby equalising the load on all machines and minimising overall response time.
Load balancing methods can be classified according to the method used to achieve balancing. They can be `static` or `dynamic` and `central` or `distributed`.
In static load balancing, a fixed policy--deterministic or probabilistic--is followed independently of the current system state. Static load balancing is simple to implement and easy to analyse with queuing models. However, its potential benefit is limited because it does not take into account changes in the global system state. For example, a computer may migrate tasks to a computer that is already overloaded.
In dynamic schemes the system notes changes in its global status and decides whether to migrate tasks based on the current state of the computers. Dynamic policies are inherently more complicated than static policies since they require each computer to have knowledge of the state of other computers in the system. This information must be continuously updated.
In a central scheme one computer contains the global state information and makes all the decisions. In a distributed scheme no one computer contains global information. Each computer makes migration decisions based on the state information it has.
Load-balancing methods can be further classified as `sender-initiated` and `receiver-initiated`. In sender-initiated policies an overloaded computer seeks a lightly loaded computer. In receiver-initiated policies lightly loaded computers advertise their capability to receive tasks. It has been shown that if task-transfer costs are comparable under both policies, sender-initiated strategies outperform receiver-initiated strategies at light to moderate system loads, while receiver-initiated schemes are preferable at high system loads.
Conventionally, a dynamic load balancing mechanism is composed of the following three phases:
1. Measuring the load of the local machine.
2. Exchanging local load information with other machines in the network.
3. Transferring a process to a selected machine.
a) the number of links separation value in each entry of the received information is incremented by one;
b) entries in the received information which originated from the receiving computer are deleted;
c) entries in the information already stored in the receiving computer which were received from the sending computer are deleted;
d) the received information is merged with the information already stored in the receiving computer; and
e) the merged information is sorted in ascending order of load, entries with equal load being sorted in ascending order of number of links separation from the receiving computer.
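The handling of received load information in steps a) to e) can be sketched as follows. This is a minimal illustration, not the method as claimed: it assumes each entry records the computer it describes, its load, its number of links separation, and the neighbour it was received from. That last field (`via`) is a hypothetical addition needed to carry out step c); `LoadEntry` and `merge_received` are likewise illustrative names.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class LoadEntry:
    origin: str  # computer whose load this entry describes
    load: int    # its load value, e.g. run-queue length
    links: int   # number of links separating `origin` from the holder of the entry
    via: str     # hypothetical: neighbour this entry was received from (needed for step c)


def merge_received(stored: List[LoadEntry], received: List[LoadEntry],
                   me: str, sender: str) -> List[LoadEntry]:
    # a) one more link now separates each received entry from this computer
    incoming = [replace(e, links=e.links + 1, via=sender) for e in received]
    # b) drop received entries that originated from this (receiving) computer
    incoming = [e for e in incoming if e.origin != me]
    # c) drop stored entries that were previously received from the same sender
    kept = [e for e in stored if e.via != sender]
    # d) merge the two sets of entries
    merged = kept + incoming
    # e) ascending load, ties broken by ascending links separation
    merged.sort(key=lambda e: (e.load, e.links))
    return merged


stored = [LoadEntry("A", 2, 0, "A"), LoadEntry("C", 5, 1, "C"), LoadEntry("D", 1, 2, "B")]
received = [LoadEntry("B", 3, 0, "B"), LoadEntry("A", 2, 1, "A"), LoadEntry("E", 1, 1, "E")]
print([e.origin for e in merge_received(stored, received, me="A", sender="B")])
# ['E', 'A', 'B', 'C']
```

Here computer A receives B's list: B's stale report on D is replaced, A's own entry relayed back by B is discarded, and E (load 1, two links away) moves to the head of the list. The sketch does not deduplicate entries for the same computer arriving via different neighbours, since the steps listed do not call for it.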
Local load can be computed as a function of the number of jobs on the run queue, memory usage, paging rate, file use and I/O rate, or other resource usage. The length of the run queue is a generally accepted load metric.
The receipt of load information from other nodes gives a snapshot of the system load. Having this information, each computer executes a transfer policy to decide whether to run a task locally or to transfer it to another node. If the decision is to transfer a task then a location policy is executed to determine the node the task should be transferred to.
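A transfer policy and a location policy can be as simple as the sketch below, which uses run-queue length as the load metric. The threshold-based transfer policy and the "least-loaded node, if less loaded than us" location policy are common choices assumed here for illustration; `THRESHOLD`, `transfer_policy`, `location_policy` and `place_task` are hypothetical names.

```python
from typing import Dict, Optional

# Hypothetical threshold: a run queue longer than this counts as overloaded.
THRESHOLD = 3


def transfer_policy(local_queue_len: int, threshold: int = THRESHOLD) -> bool:
    """Transfer policy: run a new task locally unless this node is overloaded."""
    return local_queue_len > threshold


def location_policy(snapshot: Dict[str, int], me: str,
                    local_queue_len: int) -> Optional[str]:
    """Location policy: the least-loaded other node, if it is less loaded than this one."""
    others = {node: load for node, load in snapshot.items() if node != me}
    if not others:
        return None
    best = min(others, key=lambda node: (others[node], node))
    return best if others[best] < local_queue_len else None


def place_task(snapshot: Dict[str, int], me: str, local_queue_len: int) -> str:
    """Return the node on which a newly arrived task should run."""
    if not transfer_policy(local_queue_len):
        return me
    target = location_policy(snapshot, me, local_queue_len)
    return target if target is not None else me


snapshot = {"n1": 5, "n2": 1, "n3": 2}  # run-queue lengths from exchanged load information
print(place_task(snapshot, "n1", 5))  # n2
print(place_task(snapshot, "n1", 2))  # n1: not overloaded, so the task stays local
```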
If a load balancing method is to be effective in networks having large numbers of computers, it must be scalable, in the sense that network traffic increases linearly with the size of the network; flexible, so that computers can easily be added to or removed from the network; robust in the event of failure of one or more of the computers; and it must support clustering of the computers to some degree.
Centralised methods, though attractive due to their simplicity, are not fault-tolerant. The central computer can detect dead nodes and it takes one or two messages to re-establish connection. However, if the central node fails, the whole scheme falls apart. Load balancing configuration can be re-established by maintaining a prioritised list of alternative administrators in each node or by implementing an election method to elect a new centralised node. When a central node fails, other nodes switch to the new central node. This enhancement, however, increases the management complexity and cost of the method.
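The prioritised-list alternative can be sketched in a few lines. This is an illustration under the assumption that every node holds an identical list of administrators and some liveness test; `select_central_node`, the administrator names and the `is_alive` callback are hypothetical.

```python
from typing import Callable, List, Optional


def select_central_node(priority_list: List[str],
                        is_alive: Callable[[str], bool]) -> Optional[str]:
    """Return the highest-priority administrator that is still reachable."""
    for candidate in priority_list:
        if is_alive(candidate):
            return candidate
    return None  # every listed administrator has failed


# Every node holds the same (hypothetical) list, so nodes that agree on which
# administrators are alive switch to the same new central node.
administrators = ["adm1", "adm2", "adm3"]
alive = {"adm2", "adm3"}  # adm1, the current central node, has failed
print(select_central_node(administrators, lambda n: n in alive))  # adm2
```

The sketch also shows where the extra management complexity comes from: the lists must be kept identical on every node, and nodes with different views of which administrators are alive may select different central nodes.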
In addition, the management overhead of a centralised method becomes unacceptably large as the network grows. As the number of nodes increases, so does the time the central node spends handling load information, and eventually the central node will itself become overloaded.
Several prior art load balancing methods are both dynamic and distributed.
For example, in the method described in Barak A. and Shiloh A. SOFTWARE--PRACTICE AND EXPERIENCE, 15(9):901-913, September 1985 [R1], each computer maintains a fixed size load vector that is periodically distributed between computers. Each computer executes the following method. First, the computer updates its own load value. Then, it chooses a computer at random and sends the first half of its load vector to the chosen computer. When receiving a load vector, each computer merges the information with its local load vector according to a predefined rule.
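The periodic exchange described for R1 can be simulated as below. The fixed-size vector, the own-load update, the random choice of peer and the sending of the first half follow the description above. The merge rule, however, is an assumption made only for illustration, because the "predefined rule" of R1 is not given here. `LoadVectorNode` and `gossip_round` are hypothetical names.

```python
import random
from typing import Dict, List, Tuple

Entry = Tuple[str, int]  # (node name, load value)


class LoadVectorNode:
    def __init__(self, name: str, size: int) -> None:
        self.name = name
        self.size = size  # fixed load-vector length, the parameter that must be tuned
        self.vector: List[Entry] = [(name, 0)]  # entry 0 is always this node's own load

    def update_own_load(self, load: int) -> None:
        self.vector[0] = (self.name, load)

    def first_half(self) -> List[Entry]:
        return self.vector[: max(1, self.size // 2)]

    def receive(self, received: List[Entry]) -> None:
        # Assumed merge rule (not R1's): own entry first, then the newly received
        # entries, then the older local entries, keeping the first entry seen for
        # each node and truncating to the fixed vector size.
        merged: List[Entry] = []
        seen = set()
        for node, load in [self.vector[0]] + received + self.vector[1:]:
            if node not in seen:
                seen.add(node)
                merged.append((node, load))
        self.vector = merged[: self.size]


def gossip_round(nodes: List[LoadVectorNode], loads: Dict[str, int],
                 rng: random.Random) -> None:
    """One period: each node updates its own load, then sends half its vector to a random peer."""
    for node in nodes:
        node.update_own_load(loads[node.name])
        peer = rng.choice([n for n in nodes if n is not node])
        peer.receive(node.first_half())


rng = random.Random(42)
nodes = [LoadVectorNode(f"n{i}", size=4) for i in range(8)]
loads = {f"n{i}": i for i in range(8)}
for _ in range(5):
    gossip_round(nodes, loads, rng)
print(nodes[0].vector)  # n0's own entry followed by up to three entries learned by gossip
```

Varying `size` in this simulation is a convenient way to see the trade-off discussed next: a larger vector spreads news of an idle node to more computers at once.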
A problem with this method is that the size of the load vector has to be chosen carefully. Finding its optimal size is difficult, and the size must be tuned to a given system. A large vector contains load information for many nodes, so many nodes will learn of a lightly loaded node, increasing the chance that it will quickly receive many migrating processes and become overloaded. On the other hand, the load vector should not be so small that load values fail to propagate through the network in a reasonable time. For a large number n of nodes in the network, the expected time to propagate load information through the network is O(log n).
The method described in R1 can handle any number of computers once the length of the load vector has been tuned, so it is scalable. However, it is flexible only up to a point. Adding a small number of nodes to the network requires no change, but adding a large number of nodes requires changing the size of the load vector in all computers, which increases the administrative overhead. The method is fault-tolerant and continues to work in spite of single failures; however, there is no built-in mechanism for detecting dead nodes and updating the information about them in the load vectors. Information about failed nodes is therefore not propagated, and other nodes may continue to migrate processes to them, increasing response time.
The number of communications per unit time is O(n). However, the method does not support clustering. When a node wants to off-load a process to another node, it chooses a candidate from its load vector. Since the entries in the load vector of node p are updated from randomly chosen nodes, the set of candidates for off-loading is a random subset of the n nodes rather than a set controlled by p.
In Lin F. C. H. and Keller R. M. PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, 329-336, May 1986 [R2], a distributed and synchronous load balancing method using a gradient model is disclosed. Computers are logically arranged in a grid and the load on the computers is represented as a surface. Each computer interacts solely with its immediate neighbours on the grid. A computer with high load is a `hill` and an idle computer is a `valley`. A neutral computer is one which is neither overloaded nor idle. Load balancing is a form of relaxation of the surface. Tasks migrate from hills towards valleys and the surface is flattened. Each computer computes its distance to the nearest idle computer and uses that distance for task migration.
An overloaded node transfers a task to its immediate neighbour in the direction of an idle node. A task continues to move until it reaches an idle node. If a formerly idle node has become overloaded by the time a task arrives, the task moves on to another idle node.
This method, though scalable, is only partially flexible. It is easy to add or remove nodes at the edge of the grid, but difficult to do so in the middle, since the grid is fixed; doing so requires reconfiguration, which increases the overhead inherent in the method. Detection of dead nodes is not easy. Since only changes in the state of a node are sent to its neighbours, a node's failure remains undetected until a job is transferred to it. Late detection delays migrating processes and therefore increases overall response time. Clustering is not supported in this method: each node transfers processes to one of its immediate neighbours, which may pass them on in a different direction, so the overloaded node has no control over where the off-loaded processes will eventually execute. Management overhead to start the grid is low and the number of messages is O(n). Administrative overhead for reconfiguration is high, since a failed node requires changes in all of its neighbouring nodes. Node overhead is also high and network traffic increases, because tasks move in hops rather than directly to an idle node, so migrating a process takes a long time.
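The gradient surface can be sketched as a relaxation on the grid: an idle node has distance 0, and every other node repeatedly takes one more than the smallest distance among its immediate neighbours. An overloaded node then hands a task to the neighbour with the smallest distance, hop by hop. This is an illustration of the idea described for R2, not R2's exact procedure; the distance cap `max_dist` and the function names are hypothetical, and the route is computed on a fixed snapshot, so it does not model an idle node becoming overloaded while a task is in transit.

```python
from typing import Dict, List, Optional, Tuple

Pos = Tuple[int, int]


def neighbours(p: Pos, rows: int, cols: int) -> List[Pos]:
    r, c = p
    steps = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in steps if 0 <= i < rows and 0 <= j < cols]


def proximities(idle: List[List[bool]], max_dist: int) -> Dict[Pos, int]:
    """Each node's distance in grid hops to the nearest idle node, capped at max_dist (>= 1)."""
    rows, cols = len(idle), len(idle[0])
    prox = {(r, c): 0 if idle[r][c] else max_dist
            for r in range(rows) for c in range(cols)}
    while True:
        # Synchronous relaxation: each non-idle node looks only at its immediate neighbours.
        new = {}
        for p, d in prox.items():
            if d == 0:
                new[p] = 0
            else:
                nearest = min((prox[q] + 1 for q in neighbours(p, rows, cols)),
                              default=max_dist)
                new[p] = min(nearest, max_dist)
        if new == prox:
            return prox
        prox = new


def next_hop(p: Pos, prox: Dict[Pos, int], rows: int, cols: int,
             max_dist: int) -> Optional[Pos]:
    """Neighbour in the direction of the nearest idle node, or None if none is known."""
    options = neighbours(p, rows, cols)
    if not options:
        return None
    q = min(options, key=lambda n: (prox[n], n))
    return q if prox[q] < max_dist else None


def route(start: Pos, prox: Dict[Pos, int], rows: int, cols: int,
          max_dist: int) -> List[Pos]:
    """Hops taken by a task migrating from `start` until it reaches an idle node."""
    path = [start]
    while prox[path[-1]] != 0:
        q = next_hop(path[-1], prox, rows, cols, max_dist)
        if q is None:
            break
        path.append(q)
    return path


idle = [[False, False, False],
        [False, False, False],
        [False, False, True]]
prox = proximities(idle, max_dist=10)
print(prox[(0, 0)])  # 4
print(route((0, 0), prox, 3, 3, 10))  # [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
```

The four hops from (0, 0) to the idle node at (2, 2) illustrate the cost discussed next: each task is forwarded node by node rather than sent directly to its destination.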
The buddy set algorithm proposed in Shin K. G. and Chang Y. C. IEEE TRANSACTIONS ON COMPUTERS, 38(8), August 1989 [R3], aims for very fast load sharing. For each node two lists are defined: Its buddy set, the set of its neighbours, and its preferred list, an ordered list of nodes in its buddy set to which tasks are transferred. Each node can be in one of the following states: under-loaded, medium-loaded and overloaded. Each node contains the status of all nodes in its buddy set. Whenever the status of a node changes, it broadcasts its new state to all nodes in its buddy set. The internal order of preferred lists varies per node in order to minimise the probability that two nodes in a buddy set will transfer a task to the same node. When a node is overloaded it scans its preferred list for an under-loaded node and transfers a task to that node. Overloaded nodes drop out of preferred lists. Buddy sets and their preferred lists overlap, so processes migrate around the network.
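The state broadcast and preferred-list scan can be sketched as follows. The three states, the broadcast to the buddy set on every state change, and the scan of the preferred list for an under-loaded node follow the description above. The run-queue thresholds in `classify`, the initial "medium-loaded" assumption for buddies, and all names are hypothetical. Overloaded buddies are skipped during the scan rather than physically removed from the list, which has the same effect on the choice of target.

```python
from enum import Enum
from typing import Dict, List, Optional


class State(Enum):
    UNDER = "under-loaded"
    MEDIUM = "medium-loaded"
    OVER = "overloaded"


def classify(queue_len: int, low: int = 1, high: int = 4) -> State:
    # Hypothetical thresholds on run-queue length.
    if queue_len <= low:
        return State.UNDER
    if queue_len >= high:
        return State.OVER
    return State.MEDIUM


class BuddyNode:
    def __init__(self, name: str, buddies: List[str], preferred: List[str]) -> None:
        self.name = name
        self.buddies = buddies      # buddy set: the nodes this node exchanges state with
        self.preferred = preferred  # preferred list: ordered subset of the buddy set
        self.state = State.MEDIUM
        # Last state heard from each buddy (assumed to start as medium-loaded).
        self.buddy_state: Dict[str, State] = {b: State.MEDIUM for b in buddies}

    def on_broadcast(self, sender: str, state: State) -> None:
        self.buddy_state[sender] = state

    def pick_target(self) -> Optional[str]:
        """If overloaded, return the first under-loaded node in the preferred list."""
        if self.state is not State.OVER:
            return None
        for candidate in self.preferred:
            if self.buddy_state.get(candidate) is State.UNDER:
                return candidate
        return None  # medium-loaded and overloaded buddies are skipped


def set_load(node: BuddyNode, queue_len: int, network: Dict[str, BuddyNode]) -> None:
    """Update a node's state and broadcast it to its buddy set only when it changes."""
    new_state = classify(queue_len)
    if new_state is not node.state:
        node.state = new_state
        for buddy in node.buddies:
            network[buddy].on_broadcast(node.name, new_state)


network = {
    "a": BuddyNode("a", ["b", "c"], preferred=["c", "b"]),
    "b": BuddyNode("b", ["a", "c"], preferred=["a", "c"]),
    "c": BuddyNode("c", ["a", "b"], preferred=["b", "a"]),
}
set_load(network["b"], 0, network)  # b becomes under-loaded
set_load(network["c"], 0, network)  # c becomes under-loaded
set_load(network["a"], 6, network)  # a becomes overloaded
print(network["a"].pick_target())   # c: first under-loaded node in a's preferred list
set_load(network["c"], 2, network)  # c is now medium-loaded
print(network["a"].pick_target())   # b
```

Giving each node a differently ordered preferred list, as in `network` above, is what lowers the chance that two overloaded buddies pick the same target.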
Again, this method, though scalable in that the number of messages increases linearly with the number n of computers in the network, is not flexible. Adding or removing nodes requires recomputation of the preferred lists in all nodes of the related buddy set. If a node is added to more than one buddy set, the recomputation must be done for each node in each of those buddy sets. Detection of dead nodes is difficult for the same reason as in the gradient model described in R2. Clustering is supported, but reconfiguration is expensive since, as with adding and removing nodes, it requires recomputation of new buddy sets and preferred lists. The administrative overhead inherent in the method is therefore high.