In a client-server computer system, clients rely on servers to provide needed services. In the simplest form of these systems, a single server serves multiple clients. If this is the case, then any degradation in the quality of service (QoS) provided by the server, or failure of the server, will result in poor or failed service at each of its clients.
In many cases, however, this single point of failure is unacceptable. Therefore, systems are often built such that multiple servers are available to service clients, and clients are able to change (“failover”) from one server to another. For example, if a client detects that a server fails to respond, then the client can failover to another server providing the same service.
One approach for detecting the need for failover is to use a timeout mechanism configured on the client. In this timeout approach, given a particular request, the client waits time T for a response from the server and retries the request R times, again waiting time T for each retry. If the server cannot respond to the request within time T, either because the server is down (has failed) or because the server's QoS has degraded, then the client waits a total time of R*T without a response to the request and then fails over to another server.
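The timeout approach above can be sketched as follows. This is a minimal illustration in Python, not the method of any particular system; the network call is abstracted as a callable that returns None on timeout, and all names (`request_with_failover`, `send_request`) are hypothetical.

```python
def request_with_failover(servers, send_request, timeout_t, retries_r):
    """Try each server in turn, failing over after R retries of T seconds each.

    servers:      ordered list of server identifiers (the preconfigured list)
    send_request: callable(server, timeout) -> response, or None on timeout
    timeout_t:    seconds to wait for a response (T)
    retries_r:    number of attempts per server (R)
    """
    for server in servers:
        for _ in range(retries_r):
            response = send_request(server, timeout_t)
            if response is not None:
                # Response arrived within T; no failover needed.
                return server, response
        # R attempts of up to T seconds each have elapsed without a
        # response (a worst-case wait of R*T): fail over to the next server.
    raise RuntimeError("all configured servers failed to respond")
```

Note that the worst-case delay before the first failover is R*T regardless of why the server is slow, which is exactly the drawback discussed next.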
A problem with the timeout approach is that the client wastes the entire failover time of R*T; for example, with R = 3 retries and T = 5 seconds, a client waits 15 seconds before failing over. Another problem with the timeout approach is that the failover time is constant for a particular client. In many cases, a server's speed of response is dictated by the server's operating conditions, including network conditions. In the timeout approach, the client's timeout value does not adapt to those conditions, and therefore the client's QoS suffers under changing conditions.
A further problem with the timeout approach is that it increases network traffic: depending on the implementation, O(R) messages per client are passed when failover is needed.
Once a server has “timed out” a predefined number of times for a particular client, the client fails over to a second server. This second server is typically chosen from a preconfigured list of alternative servers on the client. A problem with this configured failover approach is that the choice of server to which to failover is based on a fixed list and not on network conditions or the operating conditions of the original server or the servers to which the client could failover.
Another approach is to use a load balancer to handle failover. A load balancer routes messages between clients and servers, acting as a single point of contact for multiple clients and allowing those clients to be served by multiple servers. In many cases, a client must be served by the same server for all related messages. In such cases, the load balancer must make client-server relationships "sticky" even when using a stateless protocol such as hypertext transfer protocol (HTTP) that does not inherently support maintaining long-duration connections of clients to servers. A load balancer makes a client-server session sticky either by keeping state for each client session, thereby keeping track of the routing of messages between clients and servers, or by otherwise determining, for each message, the client-server relationship to which that message corresponds.
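The state-keeping variant of stickiness described above can be sketched as a per-session routing table. This is an illustrative Python sketch only (the class and method names are invented for this example), showing why the approach is memory intensive: the balancer must hold one entry per active client session.

```python
class StickyBalancer:
    """Minimal sketch of session-table stickiness (illustrative, not a
    production load balancer)."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.sessions = {}  # client_id -> server: one entry per live session
        self._next = 0      # cursor for round-robin assignment of new clients

    def route(self, client_id):
        # Existing session: stickiness requires returning the same server.
        if client_id in self.sessions:
            return self.sessions[client_id]
        # New session: pick a server round-robin and remember the binding.
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        self.sessions[client_id] = server
        return server
```

Every message from every client must pass through `route`, and the `sessions` table grows with the number of concurrent clients, which motivates the cost and deployment concerns discussed next.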
A problem with the load balancer approach is that implementations of stickiness algorithms are computationally expensive, memory intensive, and difficult to deploy. A related problem with the load balancer approach is that it requires at least one separate process, the load balancer. If a client could failover correctly on its own, then there would be no need for a load balancer, and load-balanced client-server systems as a whole could be simpler.
Another problem with the load balancer approach is that determining the server to which to failover is based on a preconfigured list on the load balancer and not on network conditions or the operating conditions of the original server or the servers to which the client could failover.
From the above Background, and as will be clear from the upcoming description, there is a need for a system for adaptive load balancing that overcomes the problems of clients failing over to alternative servers without considering the original server's operating conditions, including network conditions, or the operating conditions of the servers to which the client could failover, and of requiring a separate process for load balancing.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.