Global communications networks such as the Internet are now ubiquitous, with an increasing number of private and corporate users dependent on such networks for communications and data transfer operations. As communications security improves, more data can be expected to traverse the global communications data backbone between sources and destinations (typically, server hosts), placing increasing demands on those entities that handle and store data. Such increased demands are typically addressed at the destination by adding more switching devices and servers to handle the load. However, this can be an expensive proposition in terms of hardware, software, setup, and administration.
Network load-balancers provide client access to services hosted by a collection of servers (herein known as “hosts”). Clients connect to a load-balancer, which transparently (to the clients) forwards them to a host according to a set of rules. This general load balancing context includes the following: packets form sequences, called sessions; sessions should be allocated among the available hosts in a “balanced” manner; and every packet of each session should always be directed to the same host, so long as the host is alive (this last property is known as “session affinity”).
This problem is most often solved through the use of a single monolithic load-balancer that monitors the status (liveness/load) of the hosts and maintains state in the form of a table of all active sessions. When a new session arrives, the load-balancer selects the least-loaded host that is available and assigns the session to that host. In order to provide session affinity, the load-balancer must “remember” this assignment (routing) decision by adding an entry to its session table. When subsequent packets for this session arrive at the load-balancer, a single table lookup determines the correct host. However, an individual load-balancer can be both a single point of failure and a bottleneck: the size of its session table (and thereby the amount of state maintained) grows with increased throughput, and routing decisions for existing session traffic require a state lookup (one per packet). Circumventing these limitations requires multiple load-balancers working in tandem (scale-out), and/or larger, more powerful load-balancers (scale-up). However, scaling out these load balancing devices is complicated, due most notably to the need to maintain consistent state among the load-balancers. Likewise, scaling them up is expensive, since cost versus throughput in fixed hardware is non-linear (e.g., a load-balancer capable of twice the throughput costs significantly more than twice as much).
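The monolithic approach described above can be illustrated with a minimal sketch. It is an illustration under stated assumptions rather than any particular product's implementation: the names `Host`, `MonolithicLoadBalancer`, `route`, and `lookups` are hypothetical, load is approximated by the number of sessions assigned to a host, and a session whose host has failed is simply reassigned to the least-loaded live host.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Host:
    """Hypothetical host record; `load` counts sessions assigned to it."""
    name: str
    alive: bool = True
    load: int = 0


@dataclass
class MonolithicLoadBalancer:
    """Single load-balancer holding one table entry per active session."""
    hosts: List[Host]
    session_table: Dict[str, Host] = field(default_factory=dict)
    lookups: int = 0  # counts state lookups (one per routed packet)

    def route(self, session_id: str) -> Host:
        """Return the host for one packet belonging to `session_id`."""
        self.lookups += 1
        host = self.session_table.get(session_id)
        if host is not None and host.alive:
            # Session affinity: reuse the remembered assignment.
            return host
        if host is not None:
            # Assumption: a dead host's session is released and reassigned.
            host.load -= 1
        candidates = [h for h in self.hosts if h.alive]
        if not candidates:
            raise RuntimeError("no live hosts available")
        # New session: pick the least-loaded live host and remember it.
        chosen = min(candidates, key=lambda h: h.load)
        chosen.load += 1
        self.session_table[session_id] = chosen
        return chosen


if __name__ == "__main__":
    lb = MonolithicLoadBalancer([Host("host-a"), Host("host-b")])
    for sid in ["s1", "s2", "s1", "s3", "s1"]:
        print(sid, "->", lb.route(sid).name)
    print("table entries:", len(lb.session_table), "lookups:", lb.lookups)
```

Even in this small sketch the two limitations are visible: `session_table` gains one entry per active session, and every packet, including those of established sessions, costs one lookup against that shared state.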