Load balancing is a method for distributing workloads across multiple devices. Successful load balancing optimizes resource use, for example by maximizing throughput, minimizing response time, reducing the risk of overloading any one device, and improving reliability through redundancy.
When implementing load balancing, one important issue is how to handle information that must be kept across the multiple requests in a user's session. If this information is stored locally on one backend server, subsequent requests directed to a different backend server would not be able to find it. The system may therefore send all requests in a user session consistently to the same backend server. This is known as persistence or stickiness. To provide persistence, the IP addresses and state of the services being load balanced need to be known.
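The persistence behavior described above can be sketched as a minimal source-IP sticky balancer. This is an illustrative sketch under stated assumptions, not the disclosed method: the `StickyBalancer` class, its `mark_down` and `pick` methods, and the choice to hash the client IP onto the healthy backends are all hypothetical. It shows why both the addresses and the state (here, simple up/down health) of the backends must be known: a client stays pinned to its backend only while that backend is healthy.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StickyBalancer:
    """Hypothetical balancer that pins each client IP to one backend."""
    backends: list[str]                                  # backend IP addresses
    healthy: set[str] = field(default_factory=set)       # known service state
    sessions: dict[str, str] = field(default_factory=dict)  # client IP -> backend

    def __post_init__(self) -> None:
        # Assumption: every backend starts healthy unless told otherwise.
        if not self.healthy:
            self.healthy = set(self.backends)

    def mark_down(self, backend: str) -> None:
        # Record that a backend's service is no longer available.
        self.healthy.discard(backend)

    def pick(self, client_ip: str) -> str:
        # Persistence: reuse the existing mapping while its backend is healthy.
        current = self.sessions.get(client_ip)
        if current in self.healthy:
            return current
        candidates = [b for b in self.backends if b in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy backends")
        # Deterministic placement: hash the client IP onto the healthy list.
        digest = hashlib.sha256(client_ip.encode()).digest()
        chosen = candidates[int.from_bytes(digest[:8], "big") % len(candidates)]
        self.sessions[client_ip] = chosen
        return chosen
```

Repeated calls to `pick` with the same client IP return the same backend; after `mark_down` is called on that backend, the client is remapped to a different healthy one, and its session state on the old backend is lost.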
In the recent past, attempts have been made to load balance layer 2 devices. Providing persistence to layer 2 devices has been a challenge because layer 2 devices do not have IP addresses. Firewalls connected to an intermediary device are examples of such layer 2 devices. In one approach to load balancing the firewalls, the intermediary device places the firewalls between two load balancing units. When the intermediary receives a request from a client destined for an origin server, a first load balancing unit of the intermediary directs the request to one of the layer 2 firewall devices. The firewall device processes the request and provides the packet to a second load balancing unit of the intermediary, which then directs the request to one of the servers corresponding to the request. To implement this functionality, the intermediary device uses two virtual machines, which consume significant resources and can adversely affect network manageability. In particular, the failure of either virtual machine can adversely affect the entire deployment.
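The two-stage approach described above can be modeled roughly as follows. This is a simplified sketch, not a description of any particular product: the `RoundRobinUnit` class, the `forward` and `firewall_inspect` helpers, the port-style firewall labels, and the use of round-robin selection at each stage are all hypothetical choices made for illustration. Each unit stands in for one of the two virtual machines, so marking either one as down breaks every request path.

```python
import itertools


class RoundRobinUnit:
    """Hypothetical load balancing unit; each instance stands in for one VM."""

    def __init__(self, name: str, targets: list[str]) -> None:
        self.name = name
        self.up = True  # set to False to simulate this VM failing
        self._cycle = itertools.cycle(targets)

    def choose(self) -> str:
        # A failed unit cannot forward anything.
        if not self.up:
            raise RuntimeError(f"{self.name} is down")
        return next(self._cycle)


def firewall_inspect(firewall: str, payload: str) -> str:
    # Placeholder for transparent layer 2 processing: the payload passes
    # through unchanged, and no IP address is assigned to the firewall.
    return payload


def forward(first: RoundRobinUnit, second: RoundRobinUnit, payload: str) -> tuple[str, str]:
    # Stage 1: the first unit selects a firewall. Lacking an IP address,
    # the firewall is identified only by a hypothetical port label.
    firewall = first.choose()
    firewall_inspect(firewall, payload)
    # Stage 2: the second unit selects the origin server for the request.
    server = second.choose()
    return firewall, server
```

Because every request traverses both units in sequence, setting `up = False` on either one makes `forward` fail for all clients, which mirrors the single-point-of-failure concern noted above.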