A load balancing system, or load balancer, is a protocol implemented on a piece of network hardware, such as a network switch, and adapted to direct network traffic to different clusters of hardware. It distributes the workload so that no hardware is overloaded with tasks, thereby preventing hardware failure due to prolonged overworking.
A load balancing system may be implemented in an information retrieval system, such as a search engine, whereby the network hardware serves to reply to search queries. In such cases, the load balancing metric may be based on queries per second (QPS), such that a static limit on the number of acceptable QPS is set using empirical data. When the QPS limit is reached for a given cluster, traffic is diverted to another cluster. The problem with this method is that it does not account for how expensive a given query is. For example, a given query to the system may initially yield a certain number of results, but the number of results may increase dramatically over time, and the processing work required to respond with all appropriate results increases with it. The static QPS limit is then no longer useful, since the work done by the hardware to fulfill that same number of queries has increased to an unacceptably high level.
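The QPS-based scheme described above can be illustrated with a minimal sketch. The names `Cluster`, `route_by_qps`, and `work_done` are hypothetical, and the work model (work proportional to the number of results returned per query) is an assumption made only to show why a static limit can become stale.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    qps_limit: int        # static limit, assumed to be set from empirical data
    current_qps: int = 0  # queries accepted in the current one-second window


def route_by_qps(clusters):
    """Return the first cluster still below its static QPS limit, or None."""
    for cluster in clusters:
        if cluster.current_qps < cluster.qps_limit:
            cluster.current_qps += 1
            return cluster
    return None  # every cluster has reached its limit


def work_done(qps, results_per_query):
    """Hypothetical work model: work scales with the results each query returns."""
    return qps * results_per_query
```

Under this assumed model, a cluster held at 100 QPS performs 1000 units of work when each query returns 10 results, and 5000 units when each query returns 50 results, although the QPS limit has not been exceeded in either case.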
An alternative to QPS-based load balancing is to use a cost-based method, where the metric of interest is the number of items processed by a given CPU in a cluster. It is assumed that the number of items processed is equivalent to the work being done by a given processor. While this has been shown to increase the stability of the load balancing beyond that of the QPS-based methodology, the correlation between the number of items processed by a CPU and the work done by the CPU is not always predictable. This weak correlation means that a given process may change and use more of the CPU capacity, or less, for the same number of items, such that the capacity is unpredictable.
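The cost-based method can be sketched in the same way. The function names, the per-CPU item limit, and the per-item CPU cost below are all hypothetical values chosen for illustration; they are not taken from any particular system.

```python
def pick_cluster_by_items(items_per_cpu, item_limit):
    """Pick the cluster with the fewest items processed per CPU.

    items_per_cpu maps a cluster name to its current items-per-CPU count.
    Returns None when even the least loaded cluster has reached the limit.
    """
    name = min(items_per_cpu, key=items_per_cpu.get)
    return name if items_per_cpu[name] < item_limit else None


def cpu_work(items, cpu_cost_per_item):
    """Assumed model of actual CPU work: item count times a per-item cost."""
    return items * cpu_cost_per_item
```

The balancer only sees the item count. If the per-item cost of a process changes, for example from 2 to 3 units, the same 400 items represent 800 and then 1200 units of CPU work, which is the unpredictability noted above.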
A further alternative load balancing method is to use actual CPU usage as the metric upon which the balancer runs. While this method increases the precision of the load balancer beyond that of the two methods previously described, it has been found to introduce unpredictability of its own. If a given process within the information retrieval system is subsequently optimized to use less CPU capacity, the capacity of the cluster increases. In response to this increase in capacity, the number of queries directed to the cluster increases, and this has been shown to result in another part of the cluster being overloaded. In effect, each time an aspect of the system is optimized, load balancing that uses CPU usage as its metric has to be recalibrated.
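The recalibration problem can be made concrete with a small sketch. It assumes, purely for illustration, a CPU budget expressed in milliseconds of CPU time per second, a fixed CPU cost per query, and a second, non-CPU component of the cluster with its own fixed query capacity; all names and figures are hypothetical.

```python
def admitted_qps(cpu_budget_ms, cpu_cost_ms_per_query):
    """Queries per second admitted before measured CPU usage reaches the budget."""
    return cpu_budget_ms // cpu_cost_ms_per_query


def other_part_overloaded(qps, other_part_capacity_qps):
    """True when a non-CPU part of the cluster receives more queries than it can serve."""
    return qps > other_part_capacity_qps


# Before optimization: 2 ms of CPU per query against an 800 ms budget.
before = admitted_qps(800, 2)
# After optimization: the same query now costs 1 ms of CPU.
after = admitted_qps(800, 1)
```

With these assumed figures the balancer admits 400 QPS before the optimization and 800 QPS after it. If the other component can serve only 600 QPS, it is within capacity before the optimization and overloaded after it, even though CPU usage never exceeds its budget.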
As such, there is a need for a more efficient method of balancing the workload across clusters of network hardware.