Distributed data processing systems are becoming widely used for complex processing tasks. By distributing processing between a number of processors, such systems are capable of performing complex tasks rapidly. However, to optimise their performance, software that is allocated to a distributed system must be split efficiently between the available processors. Some software is specially written for multi-processor systems and includes instructions indicating which parts of the software are to be performed in parallel by which processors. However, this approach does not take account of other tasks that the processors may have to interleave with the software, and it cannot account for configurations of processors that were unknown to the software designer or that have arisen because of partial failure or upgrading of a standard multi-processor system. There is therefore a need for an effective and more generic system of load balancing.
A sophisticated multi-processor data processing system may be considered as a cluster of processing nodes (CPUs) and a load balancer function. The load balancer function allocates tasks to the processors according to pre-defined rules. When software for providing a certain service is to be run by the cluster, the processes involved in the software may be divided so that a number of processing nodes participate in providing the service in a load sharing fashion. Those processing nodes are termed a load sharing group. The nodes are not restricted to participating in the provision of only one service; instead, multiple software functions can be allocated to a node. In addition, a node will always spend some time executing software related to the maintenance of the cluster and of the node itself (i.e. the platform). Each processing node therefore requires some processing capacity just to perform its normal maintenance duties.
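The arrangement described above can be pictured with a small data model. The following is only an illustrative sketch: the class name `Node`, the field names, the service names and the capacity figures are all hypothetical and do not come from any particular system. It shows nodes that each hold back some capacity for platform maintenance, that may serve more than one service, and that form a load sharing group for each service they participate in.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A processing node in the cluster (all names and units are hypothetical)."""
    name: str
    capacity: float           # total processing capacity, arbitrary units
    platform_reserve: float   # capacity held back for cluster and node maintenance
    services: dict = field(default_factory=dict)  # service name -> allocated load

    def available(self) -> float:
        """Capacity left after the platform reserve and the current service load."""
        return self.capacity - self.platform_reserve - sum(self.services.values())


def load_sharing_group(nodes, service):
    """Return the nodes that participate in providing the given service."""
    return [n for n in nodes if service in n.services]


nodes = [
    Node("node-a", 100.0, 10.0, {"billing": 30.0, "routing": 20.0}),
    Node("node-b", 100.0, 10.0, {"routing": 50.0}),
    Node("node-c", 100.0, 10.0, {"billing": 15.0}),
]

# node-a belongs to two load sharing groups because it serves two services.
group = load_sharing_group(nodes, "routing")
print([n.name for n in group])
print(nodes[0].available())
```

In this sketch the platform reserve is subtracted before any service load is considered, reflecting the point that a node needs some capacity simply to carry out its normal maintenance duties.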
For each service allocated to a node there will typically be a number of processing entities (processes) executing, each of which provides some part of the service. In some cases there will even be multiple instances of the same process executing to increase parallelism and fault isolation.
The overall performance of the cluster is highly dependent on the principle used by the load balancer to allocate load to the available nodes. If a node is overloaded then the results of its processing are likely to be delayed. This can be especially serious when those results are to be used by another processor, because serial delays can then build up.
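To make the role of the allocation principle concrete, the following sketch applies one simple rule: give each new task to the node with the most spare capacity, and refuse the task if no node can accept it without being overloaded. This rule is an assumption chosen purely for illustration; the text above does not prescribe any particular principle, and the function name `allocate`, the node names and the figures are hypothetical.

```python
def allocate(task_cost, loads, capacities):
    """Assign a task to the node with the most spare capacity.

    loads and capacities are dicts keyed by node name. Returns the chosen
    node name, or None if no node can take the task without exceeding its
    capacity. This least-loaded rule is an illustrative assumption only.
    """
    spare = {name: capacities[name] - loads[name] for name in loads}
    best = max(spare, key=spare.get)  # ties go to the first node listed
    if spare[best] < task_cost:
        return None
    loads[best] += task_cost
    return best


# Usable capacity per node, e.g. after any platform reserve has been deducted.
capacities = {"node-a": 90.0, "node-b": 90.0, "node-c": 90.0}
loads = {"node-a": 50.0, "node-b": 50.0, "node-c": 15.0}

# Three tasks of equal cost are allocated one after another.
chosen = [allocate(20.0, loads, capacities) for _ in range(3)]
print(chosen)
print(loads)
```

Under this rule the lightly loaded node absorbs new work until its spare capacity falls to the level of the others, so no single node is pushed into overload while capacity remains elsewhere in the cluster.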