Servers are arguably the most valuable resource in a data center, but because of the complexities of connectivity and overbooking in data center networks, servers are often either significantly overused or underused. Effective utilization of servers is a dynamic resource allocation problem that a good load balancer could remedy. Conventional wisdom has it that, to operate effectively, a load balancer requires significant state information from servers and/or virtual machines (VMs). This issue is considered particularly acute if the proposed load balancer is meant to have global oversight over all the servers and VMs in a data center. Additionally, many tasks in the data center involve a sequence of packets (a flow) whose packets are preferably all assigned to the same server, which makes the resource allocation problem more complex at larger scales. Scalability while preserving flow integrity is therefore a challenge. The assignment of flows to servers and VMs is quite often done poorly, even in small data centers such as the proposed VINE (Virtually Integrated Network Edge)-Scale data centers, resulting in poor utilization of servers.
The reason for the above is that load balancers (LBs) are typically localized to one rack or, at best, a small cluster of racks in a data center. This limitation comes from the need to avoid the large amounts of state information required to keep track of jobs in a data center. State-of-the-art LBs that handle self-contained jobs (meaning the job does not consist of a sequence of packets with the same 5-tuple) typically work with round robin scheduling or a more efficient randomized round robin version implemented via ‘hash functions’. When balancing load for flow-based jobs, randomized round robin typically works as follows: all the relevant information for a flow (i.e., the 5-tuple consisting of source IP address, destination IP address, protocol, source port, and destination port) is read from the packet header and then deterministically mapped via a hash function to a single VM address. The hash function ensures that each VM receives, on average, an equal fraction of the total number of flows entering the data center, and that this mapping is performed efficiently without any exchange of state from the servers or VMs to the load balancer. Such a mechanism for flow-based load balancing can work relatively well if all VMs are homogeneous in performance and flow processing requirements have small variations. In reality, neither of these conditions is generally true. Many factors contribute, but the main ones are that servers/VMs perform significantly differently from each other even when they are initially equally sized, and that flow processing requirements typically have heavy-tailed characteristics with large variability. These two factors can result in ineffective and often massive under- or over-utilization of servers/VMs in a data center with traditional and even state-of-the-art load balancers. In the networking and queuing literature, alternative dynamic load balancers have been proposed that can alleviate these issues to some extent.
However, these load balancers typically lack a notion of flow awareness for packetized traffic, and the communication overhead they incur in obtaining state information is prohibitive in practice, resulting in poor scalability. An example of such schemes is the family of randomized round robin schedulers (sometimes referred to as ‘power of k’ schedulers), where the state of k randomly selected servers/VMs is checked and the allocation is made to the least-loaded of those servers/VMs. Further examples are schemes in which the servers request jobs from a centralized queue when they are idle.
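To make the contrast between the two approaches described above concrete, the following is a minimal Python sketch, not an implementation taken from any particular load balancer. The function names (hash_assign, power_of_k_assign), the choice of SHA-256 with a modulo reduction, and the representation of VM load as a list of integers are all assumptions made for illustration only.

```python
import hashlib
import random
from typing import Sequence, Tuple

# 5-tuple in the order given in the text:
# (source IP, destination IP, protocol, source port, destination port)
FiveTuple = Tuple[str, str, int, int, int]


def hash_assign(flow: FiveTuple, num_vms: int) -> int:
    """Stateless hash-based assignment (hypothetical helper).

    Every packet of a flow carries the same 5-tuple, so it always maps to
    the same VM index, and no state is exchanged with the servers/VMs.
    SHA-256 plus modulo is an assumed, illustrative choice of hash.
    """
    if num_vms < 1:
        raise ValueError("num_vms must be at least 1")
    key = "|".join(str(field) for field in flow).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_vms


def power_of_k_assign(loads: Sequence[int], k: int, rng: random.Random) -> int:
    """'Power of k' assignment (hypothetical helper).

    Samples k distinct VMs at random, inspects their load, and returns the
    index of the least-loaded one. Reading loads[vm] stands in for the state
    query that costs communication overhead in a real deployment.
    """
    if not loads or k < 1:
        raise ValueError("need at least one VM and k >= 1")
    k = min(k, len(loads))  # cannot sample more VMs than exist
    candidates = rng.sample(range(len(loads)), k)
    return min(candidates, key=lambda vm: loads[vm])


if __name__ == "__main__":
    flow = ("10.0.0.1", "10.0.0.2", 6, 40000, 443)  # made-up example flow
    print("hash-based VM:", hash_assign(flow, 8))

    loads = [12, 3, 7, 9, 1, 15, 4, 8]  # made-up per-VM load figures
    print("power-of-k VM:", power_of_k_assign(loads, 2, random.Random()))
```

In this sketch, hash_assign keeps a flow on one VM without consulting any load information, whereas power_of_k_assign consults load but, called per packet, gives no guarantee that two packets of the same flow reach the same VM; preserving flow integrity would require additional per-flow state that is not shown here.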
Therefore, improvements to load balancers in general, and to centralized load balancers in particular, would be highly desirable.