In computer networking, load balancing is a technique to spread work between two or more computers, network links, CPUs, hard drives, or other resources, in order to achieve efficient resource utilization, high throughput, and low response time. Using multiple components with load balancing, instead of a single component, may increase reliability through redundancy. A dedicated program or hardware device (such as a multilayer switch) usually provides the balancing service.
One of the most common applications of load balancing is to provide a single Internet service from multiple servers, sometimes known as a server farm. Commonly load-balanced systems include popular web sites, large Internet Relay Chat networks, high-bandwidth File Transfer Protocol sites, NNTP servers, and DNS servers. For Internet services, the load balancer is usually a software program that listens on the port where external clients connect to access services. The load balancer forwards each request to one of the “backend” servers, which usually replies to the load balancer. The load balancer can then reply to the client without the client ever knowing about the internal separation of functions. This arrangement also prevents clients from contacting backend servers directly, which may improve security by hiding the structure of the internal network and preventing attacks on the kernel's network stack or on unrelated services running on other ports.
Load balancing is often used to implement failover, the continuation of a service after the failure of one or more of its components. The components are monitored continually (e.g., web servers may be monitored by fetching known pages), and when one becomes non-responsive, the load balancer is informed and no longer sends traffic to it. When a component comes back online, the load balancer begins to route traffic to it again. For failover to preserve the service, such an environment must be built with additional capacity, so that the remaining components can absorb the load of those that fail. This is much less expensive and more flexible than failover approaches in which an administrator pairs a single “live” component with a single “backup” component that takes over in the event of a failure. In particular, rather than doubling the number of servers used, the administrator can provision a level of redundancy below one-to-one that is still adequate to handle common failures.
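The monitoring and capacity reasoning above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular product: the class `MonitoredPool`, the `probe` callback, and the function `servers_needed` are hypothetical names, and the figures in the usage note are made up for the arithmetic.

```python
import math


class MonitoredPool:
    """Hypothetical pool that keeps only responsive backends in rotation."""

    def __init__(self, backends, probe):
        self._backends = list(backends)
        self._probe = probe  # assumed callable: backend -> True if it responded
        self._down = set()

    def run_checks(self):
        """Probe every backend once and record which ones failed."""
        for backend in self._backends:
            if self._probe(backend):
                self._down.discard(backend)  # recovered: route traffic to it again
            else:
                self._down.add(backend)      # non-responsive: stop sending traffic

    def in_rotation(self):
        """Backends that passed their most recent probe, in original order."""
        return [b for b in self._backends if b not in self._down]


def servers_needed(peak_load, per_server_capacity, tolerated_failures):
    """Servers required to carry peak load with some servers out of service."""
    base = math.ceil(peak_load / per_server_capacity)
    return base + tolerated_failures
```

As a usage example with invented numbers: for a peak of 900 requests per second and servers that each handle 100, `servers_needed(900, 100, 2)` gives 11 servers to survive two simultaneous failures, whereas pairing each of the 9 live servers with its own backup would take 18.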
Load balancers use a variety of scheduling algorithms to determine which backend server should receive a request. Simple algorithms include random choice and round robin. More sophisticated load balancers may take into account additional factors, such as a server's reported load, recent response times, up/down status (determined by a monitoring poll of some kind), number of active connections, geographic location, capabilities, or how much traffic the load balancer has recently assigned to the server. High-performance systems may use multiple layers of load balancing.
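To make the difference between these algorithms concrete, the following Python sketch implements random choice, round robin, and a least-connections rule that uses the number of active connections. The `Backend` record and all function and class names are assumptions introduced for illustration; real load balancers track this state in their own ways.

```python
import random
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical per-server state kept by the balancer."""
    name: str
    active_connections: int = 0
    up: bool = True


def random_choice(backends, rng=random):
    """Pick uniformly at random among the backends that are up."""
    candidates = [b for b in backends if b.up]
    if not candidates:
        raise RuntimeError("no backend is up")
    return rng.choice(candidates)


class RoundRobin:
    """Hand out backends in a fixed cyclic order, skipping any that are down."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._next = 0  # index where the next search starts

    def pick(self):
        n = len(self._backends)
        for offset in range(n):
            candidate = self._backends[(self._next + offset) % n]
            if candidate.up:
                self._next = (self._next + offset + 1) % n
                return candidate
        raise RuntimeError("no backend is up")


def least_connections(backends):
    """Pick the up backend with the fewest active connections."""
    candidates = [b for b in backends if b.up]
    if not candidates:
        raise RuntimeError("no backend is up")
    return min(candidates, key=lambda b: b.active_connections)
```

Random choice and round robin need no information from the servers beyond up/down status, which keeps them cheap; least connections requires the balancer to maintain a count per server but adapts when some requests take much longer than others.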
Apart from random choice and plain round robin, these load-balancing techniques consider the past or current health or status of the destination servers to determine where to route client requests. While this works well in some situations, unexpected loads may make the load-balancing decision a poor one in light of more complete knowledge about the situation. For example, a server that has been idle for a while may seem like a good target for future requests. However, because it has been idle, that server may also decide to perform cleanup tasks, such as disk defragmentation, garbage collection (e.g., cleanup of runtime objects), server backup, and so forth. Decisions based on past and current information may therefore be too reactive, and in some cases come too late, because conditions can change by the time the load balancer has routed requests to a particular destination server.
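The idle-server example can be illustrated with a toy Python sketch. The server names and load figures are invented, and `pick_least_loaded` is a hypothetical helper; the point is only that a choice made from the last reported snapshot can differ from the choice that current conditions would justify.

```python
def pick_least_loaded(load_report):
    """Return the server name with the lowest reported load (0.0 to 1.0)."""
    return min(load_report, key=load_report.get)


# Loads as reported at the last monitoring poll (invented figures).
snapshot = {"web1": 0.70, "web2": 0.05, "web3": 0.60}

# Actual loads a moment later: web2, having been idle, started a backup.
actual = dict(snapshot)
actual["web2"] = 0.95

choice = pick_least_loaded(snapshot)   # what the balancer decides
best_now = pick_least_loaded(actual)   # what up-to-date knowledge would pick
print(choice, best_now)
```

Here the balancer routes to `web2` on the strength of stale information, while `web3` is the least loaded server at the time the request actually arrives.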