Load balancing is a technique used in computer networks to distribute workload evenly across two or more computers, network links, central processing units (“CPUs”), hard drives, and the like. Load balancing attempts to avoid overloading any particular resource while also improving resource utilization and throughput and minimizing response times. Typically, load balancing services are provided by a software program or a hardware device such as a multilayer switch or a Domain Name System (“DNS”) server or the like. Load balancing is commonly used to mediate internal communications in computer clusters (high-availability clusters) or across servers in a network. For example, in a typical server farm environment, each server will report its loading to the load balancer, which will in turn consider each server's load and other parameters when assigning new traffic to a server.
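The basic distribution technique described above can be illustrated with a minimal sketch. The round-robin policy and the server names below are assumptions chosen for illustration only; they are not part of the system described herein.

```python
from itertools import cycle

# Hypothetical server identifiers (not from the original text).
servers = ["server-a", "server-b", "server-c"]

def round_robin(servers):
    """Yield servers in a repeating cycle so new traffic is spread evenly."""
    return cycle(servers)

rr = round_robin(servers)
# Assign six incoming requests; each server receives exactly two.
assignments = [next(rr) for _ in range(6)]
```

Round-robin is only one of many possible policies; a production balancer would typically also weigh each server's reported load, as described below.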
FIG. 1 illustrates a block diagram of a conventional load balancing system 10. As shown, the system includes a number of servers 11A, 11B and 11C, which generally can be considered nodes that host services (e.g., end-user resources such as applications, desktops, files, and the like). Moreover, the servers 11A-11C are communicatively coupled, by a network, for example, to a load balancer 12, which is a node that determines the most suitable server 11A-11C for a particular service as described above. Furthermore, the load balancer 12 is communicatively coupled to one or more client devices 13A, 13B and 13C that are also capable of connecting to the servers 11A-11C in order to access the hosted services. It should be appreciated that each of these nodes can be a conventional computing device, such as a computer, mobile device, virtual machine or the like.
According to the conventional system 10, server agents 14A, 14B and 14C can be provided on servers 11A-11C, respectively, and are provided as software modules, for example, by which each respective server can collect and send performance counters, such as CPU and memory usage, to the load balancer 12. Moreover, when one or more client devices 13A-13C needs to access a client service, the particular client device sends a query to the load balancer 12. Based on the performance counters received from the servers 11A-11C, the load balancer 12 has a complete understanding of the load on all servers 11A-11C. Thus, when a client device (e.g., client 13A) needs to access a particular service (e.g., services 15A, 15B, 15C, etc.), the load balancer 12 is configured to determine the most suitable server that is hosting the requested service. In this regard, each client device (e.g., clients 13A-13C) learns what services 15A-15C are available on which servers 11A-11C by sending a query to the load balancer 12 and receiving a list of known services, i.e., a service listing. In one example, a client device may receive a simple listing of services: Service 1, Service 2, Service 3, and so forth.
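The selection step performed by the load balancer 12 can be sketched as follows. The data layout, the load-scoring formula, and the counter values are assumptions for illustration; the original text specifies only that agents report counters such as CPU and memory usage and that the balancer picks the most suitable hosting server.

```python
# Assumed shape of the counters that server agents (14A-14C) might report.
counters = {
    "11A": {"cpu": 0.80, "mem": 0.60, "services": {"Service 1", "Service 2"}},
    "11B": {"cpu": 0.30, "mem": 0.40, "services": {"Service 2", "Service 3"}},
    "11C": {"cpu": 0.50, "mem": 0.20, "services": {"Service 1", "Service 3"}},
}

def load_score(c):
    # Illustrative score: the average of CPU and memory usage.
    return (c["cpu"] + c["mem"]) / 2.0

def most_suitable_server(counters, service):
    """Return the least-loaded server hosting `service`, or None."""
    hosts = {s: c for s, c in counters.items() if service in c["services"]}
    if not hosts:
        return None
    return min(hosts, key=lambda s: load_score(hosts[s]))

def service_listing(counters):
    """The service listing a client receives when it queries the balancer."""
    return sorted({svc for c in counters.values() for svc in c["services"]})
```

With these example counters, a query for "Service 2" would be directed to server 11B, whose average load (0.35) is lower than that of 11A (0.70).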
The load balancing system 10 illustrated in FIG. 1 has certain technical limitations. For example, the system 10 is limited by the number of requests the load balancer 12 can accept at the same time, since all connection requests received from the respective client devices 13A-13C must pass through the load balancer 12. Moreover, resources on the load balancer 12 must be consumed in order to run load balancing algorithms that compute the most suitable server (e.g., servers 11A-11C) for the desired service (e.g., services 15A-15C) being requested by a client device. Accordingly, this consumption of resources of the load balancer 12 limits the number of requests that can be accepted at the same time by the load balancer 12. Moreover, the latency for client devices (e.g., 13A and 13C) to communicate directly with servers (e.g., 11A and 11C, respectively) to obtain the requested service is also increased, since each client device has to query the load balancer 12 before it can connect to the given server hosting the service. For example, a client device (e.g., client 13A) must wait for a response from the load balancer 12 indicating which server (e.g., “Server 11A”) is hosting the requested service and can provide the service to the client 13A.
FIG. 2 illustrates a block diagram of another conventional load balancing system 20. This configuration is similar to the design described above with respect to FIG. 1, but is designed to alleviate scalability issues by distributing load across two or more load balancers 22A and 22B. As shown, an exemplary client device 23 must go through a gateway 27 in order to submit requests to one of the load balancers 22A and 22B. In this configuration, the gateway 27 is an intermediary node between the client 23 and the load balancers 22A and 22B, and optionally also serves as an intermediary node between the client 23 and the server 21. The gateway 27 is provided to add redundancy to the load balancers 22A and 22B, as the gateway 27 is configured to automatically establish a communication connection with one load balancer 22B, for example, if another load balancer 22A, for example, stops responding. Moreover, to prevent overloading, the load balancers 22A and 22B may also be configured to coordinate between themselves to instruct the gateway 27 as to which load balancer it should forward queries from the client devices (e.g., client 23).
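The gateway's failover behavior described above can be sketched as follows. The function names, the `BalancerUnavailable` exception, and the `send` transport callback are assumptions introduced for illustration; the original text specifies only that the gateway switches to another load balancer when one stops responding.

```python
class BalancerUnavailable(Exception):
    """Raised (in this sketch) when a load balancer does not respond."""

def forward_query(query, balancers, send):
    """Forward a client query, trying each load balancer in order.

    `send(balancer, query)` is an assumed transport function that returns
    the balancer's response or raises BalancerUnavailable on timeout.
    """
    last_error = None
    for balancer in balancers:
        try:
            return send(balancer, query)
        except BalancerUnavailable as exc:
            last_error = exc  # this balancer stopped responding; try the next
    raise last_error or BalancerUnavailable("no load balancers configured")
```

For example, if load balancer 22A stops responding, a query is transparently retried against 22B, so the client device 23 never sees the failure.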
The system 20 illustrated in FIG. 2 improves load balancing performance by allowing a greater number of clients to connect and request client services at the same time. However, the number of queries from clients that can be accepted is still limited by the resources available on the load balancers 22A and 22B. Once a suitable server (e.g., server 21) is identified, the client 23 can connect directly with the server 21 or through the gateway 27. However, the latency between the client 23 and the server 21 will inevitably be increased when compared to the system 10 of FIG. 1, since queries, and possibly server connections, must pass through the gateway 27.
Accordingly, a system and method for load balancing is needed that reduces the use of available resources on the load balancer while processing client queries.