As is known in the art, a computer cluster is a set of loosely or tightly connected computers that work together so that, in many respects, they can be viewed as a single system. Clusters with hundreds of computers (or “nodes”) may be used to perform complex distributed processing tasks such as deep neural network (DNN) machine learning. Clusters may be deployed to improve performance and availability while typically being much more cost-effective than single computers of comparable speed or availability. Cloud-based computing environments make it possible to allocate large clusters programmatically using Application Programming Interfaces (APIs) through which an administrator can instantiate and configure virtual machines (or “instances”) as desired or necessary.
As is also known in the art, cloud-based clusters and other large server deployments may utilize load balancers to distribute network traffic across physical and/or virtual servers. A load balancer may be provided as a software program that listens on a network port to which external clients connect. The load balancer may forward each client request to one of the “backend” servers, which processes the request and sends a response back to the load balancer. Some load balancers may include routing capabilities. For example, existing load balancers may be configured to route certain types of requests to specific backend servers.
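For illustration only, the forwarding behavior described above can be sketched with round-robin selection, one common distribution strategy that the text above does not specifically name. The class `RoundRobinBalancer`, the `Request` type, and the backend addresses are hypothetical, and the sketch selects a backend without performing any actual network I/O.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical client request; only the path is modeled."""
    path: str


class RoundRobinBalancer:
    """Hypothetical sketch: hands each request to the next backend in a fixed pool."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        # The balancer keeps its own copy of the pool; this is state it must maintain.
        self._cycle = itertools.cycle(list(backends))

    def pick_backend(self, request):
        # Round robin ignores the request content and returns the next backend in turn.
        return next(self._cycle)


if __name__ == "__main__":
    lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2"])
    for path in ["/a", "/b", "/c"]:
        print(path, "->", lb.pick_backend(Request(path)))
```

Even this minimal form requires the balancer to hold a list of backends, which must be rebuilt whenever the pool changes.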
Traditionally, load balancers have had to maintain state information about the backend servers. For example, some existing load balancers maintain a lookup table or a prioritized list of backend servers. Processing a single request may involve iterating through a long list of rules to determine where to route the request. Moreover, before client requests can be routed to a particular backend server, that server must be registered with the load balancer. In cloud-based systems, where the allocation of backend servers can change frequently, the load balancer must be updated often and may require complex rules to ensure that traffic is routed properly. These problems are compounded when multiple load balancers are employed for redundancy or scalability.
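The stateful approach described above can be sketched as follows, assuming rules that match on a request path prefix and a convention that a lower `priority` number wins. All names (`Rule`, `RuleBasedBalancer`, `register`, `route`) are hypothetical and do not correspond to any particular product. The sketch shows the three costs the text identifies: a registry of backends held by the balancer, registration as a precondition for routing, and a linear scan over the rule list for every request.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical routing rule: lower priority number is checked first."""
    priority: int
    path_prefix: str
    backend: str


class RuleBasedBalancer:
    def __init__(self):
        self._registered = set()  # state: backends this balancer knows about
        self._rules = []          # prioritized rule list, kept sorted

    def register(self, backend):
        self._registered.add(backend)

    def deregister(self, backend):
        # Must be called whenever a backend is deallocated, or routing goes stale.
        self._registered.discard(backend)

    def add_rule(self, rule):
        # A backend must be registered before traffic can be routed to it.
        if rule.backend not in self._registered:
            raise ValueError(f"backend {rule.backend!r} is not registered")
        self._rules.append(rule)
        self._rules.sort(key=lambda r: r.priority)

    def route(self, path):
        # Linear scan: the per-request cost grows with the number of rules.
        for rule in self._rules:
            if path.startswith(rule.path_prefix) and rule.backend in self._registered:
                return rule.backend
        return None


if __name__ == "__main__":
    lb = RuleBasedBalancer()
    lb.register("gpu-1")
    lb.register("web-1")
    lb.add_rule(Rule(10, "/train", "gpu-1"))
    lb.add_rule(Rule(20, "/", "web-1"))
    print(lb.route("/train/job"))  # gpu-1
    lb.deregister("gpu-1")         # e.g. the instance was deallocated
    print(lb.route("/train/job"))  # falls through to web-1
```

Every balancer instance holds its own copy of this registry and rule list, so with several load balancers each change to the backend pool has to be propagated to all of them.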