Advancements in computing and network technologies now allow users to access different types of online content and services from almost any geographic location through a web browser or other client application installed at their respective computing devices. For example, a web service may be provided to user devices over the Internet by multiple computing devices operating within a data center or distributed computing system. Such computing devices may include, but are not limited to, servers, storage devices, routers, gateways, and other types of networked computing devices, which may be distributed across a local or wide area network associated with a particular service provider.
A distributed computing system often encounters performance bottlenecks and scalability problems that may prevent the system from effectively controlling resource usage by managing workloads distributed across multiple computing devices within the system. To address such problems, the system may employ a hardware or software load balancer to monitor system resources and manage workloads distributed across the various computing devices within the system. For example, such a load balancer may be used to receive incoming requests from different clients or end users of a web service and distribute multiple data packets related to the received requests for processing by different back-end servers within the system. Further, the load balancer may be used to determine the load state of each remote back-end server device used to process data packets based on measurements of resource usage at that particular server. Such measurements may include, but are not limited to, central processing unit (CPU) usage, memory usage, bandwidth, input/output (I/O), or other back-end resource limits. Starvation of any one of these resources, or of any combination of them, may cause a server within the system to reach an “unhealthy” load state, in which the server is unable to effectively handle its current workload due to one or more resource constraints. Thus, the load balancer may use various load balancing techniques to identify the resource-constrained server within the system and transfer at least a part of the workload from that server to a different server that has sufficient resources to handle the transferred workload.
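By way of illustration only, the following Python sketch shows one way such a load-state determination might look. The names ServerUsage, UNHEALTHY_THRESHOLD, is_unhealthy, and plan_transfer, as well as the 90% threshold and the rule of treating a server as unhealthy when any single resource is starved, are assumptions made for this example and are not taken from any particular load balancer.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: a server is treated as "unhealthy" once any
# single measured resource reaches 90% utilization.
UNHEALTHY_THRESHOLD = 0.90


@dataclass
class ServerUsage:
    """Resource usage at one back-end server, each value a fraction 0.0-1.0."""
    name: str
    cpu: float
    memory: float
    bandwidth: float
    io: float


def peak_usage(server: ServerUsage) -> float:
    """Return the utilization of the most constrained resource."""
    return max(server.cpu, server.memory, server.bandwidth, server.io)


def is_unhealthy(server: ServerUsage) -> bool:
    """Starvation of any one resource is enough to mark the server unhealthy."""
    return peak_usage(server) >= UNHEALTHY_THRESHOLD


def plan_transfer(servers: list[ServerUsage]) -> Optional[tuple[str, str]]:
    """Return (source, target) server names for a workload transfer, or None.

    The source is the most constrained unhealthy server; the target is the
    healthy server with the most headroom.
    """
    unhealthy = [s for s in servers if is_unhealthy(s)]
    healthy = [s for s in servers if not is_unhealthy(s)]
    if not unhealthy or not healthy:
        return None
    source = max(unhealthy, key=peak_usage)
    target = min(healthy, key=peak_usage)
    return source.name, target.name
```

In this sketch a server with high CPU usage but ample memory is still flagged, which reflects the point above that starvation of a single resource may be sufficient to produce an unhealthy load state.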
However, conventional load balancing techniques may lead to inaccurate measurements and scalability problems, particularly when multiple load balancers are employed within the system. Such conventional techniques generally use either in-band or out-of-band health checks of specific devices within the system to identify potential resource constraints. In-band health checks generally involve checking the load state or health of each device in the system on a periodic basis. To prevent false positives, multiple health checks, seconds apart, must fail before the load state of a device within the system is determined to be unhealthy. However, this may prevent the load balancer from identifying certain devices that have an actual resource constraint, particularly those having resources that are constrained by only a marginal amount. Since such marginally constrained devices may alternately pass and fail successive health checks, the failed health checks may not be consistent enough to indicate a problem to the load balancer. Moreover, when an actual problem does exist, significant system delays induced by the need to fail multiple checks may cause connection problems and a large number of data packets to be dropped.
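The limitation described above can be sketched in a few lines of Python. This is an illustrative model only: the function name first_unhealthy_check and the requirement of three consecutive failures are assumptions chosen for the example, not parameters of any specific product.

```python
from typing import Optional, Sequence


def first_unhealthy_check(results: Sequence[bool],
                          required_failures: int = 3) -> Optional[int]:
    """Return the 0-based index of the health check at which a device is
    declared unhealthy, or None if it is never declared unhealthy.

    `results` holds one entry per periodic health check (True = passed).
    The device is declared unhealthy only after `required_failures`
    consecutive failed checks, which is the false-positive guard.
    """
    consecutive = 0
    for index, passed in enumerate(results):
        consecutive = 0 if passed else consecutive + 1
        if consecutive >= required_failures:
            return index
    return None


# A marginally constrained device that alternately passes and fails.
marginal_device = [True, False] * 5

# A device that develops a persistent problem at the third check.
failing_device = [True, True, False, False, False, False]

print(first_unhealthy_check(marginal_device))
print(first_unhealthy_check(failing_device))
```

Under these assumptions the marginal device is never flagged, because each passed check resets the failure count, while the persistently failing device is flagged only at its third failed check, so traffic continues to reach it for the two check intervals in between.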
Out-of-band checks generally involve polling each device at some predetermined time interval. Consequently, such out-of-band polling techniques typically provide only a snapshot of a device's load state or an average of the resource usage at the device between polling intervals. However, the data produced by such out-of-band checks may not reveal when the device's resource usage is in an overloaded or idle state of operation, since significant spikes or sags in resource usage may be missed or excluded as the resulting data is averaged out over time. Further, such out-of-band checks may require additional overhead in the form of custom thresholds and measurements in order to correctly identify and measure resource constraints for each type of device and traffic pattern.
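As a hedged illustration of how polling may hide a spike, the following Python sketch compares per-second usage samples with what a poller would report. The sample values, the five-second polling interval, and the helper names poll_snapshots and poll_averages are hypothetical.

```python
from statistics import mean
from typing import Sequence


def poll_snapshots(samples: Sequence[float], interval: int) -> list[float]:
    """Return the instantaneous value seen at the end of each polling interval."""
    return list(samples[interval - 1::interval])


def poll_averages(samples: Sequence[float], interval: int) -> list[float]:
    """Return the average usage over each polling interval."""
    return [mean(samples[i:i + interval])
            for i in range(0, len(samples), interval)]


# Hypothetical CPU usage in percent, sampled once per second.
# The device is fully saturated for one second (the value 100).
cpu_usage = [20, 20, 100, 20, 20, 20, 20, 20, 20, 20]

print("actual peak:", max(cpu_usage))
print("snapshots:  ", poll_snapshots(cpu_usage, interval=5))
print("averages:   ", poll_averages(cpu_usage, interval=5))
```

With these assumed values, neither the snapshots nor the interval averages come near the saturation seen in the raw samples, so a threshold tuned against the polled data would not register the overload.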
The problems associated with the aforementioned in-band and out-of-band load balancing techniques may be exacerbated by the use of multiple load balancers that share the same set of back-end servers. Each load balancer in such a system may be aware of only the amount of data traffic it sends to a given back-end server, without having any knowledge of the amount of data being sent by other load balancers within the system. This may result in a “bell ringing” effect, in which traffic from multiple load balancers converges on the same back-end server, thereby overwhelming the server's resources and causing it to fail the health checks performed by each load balancer. The load balancers may then shift traffic away from the failed server and overwhelm another back-end server. These traffic shifts may continue until a sufficient number of back-end servers become available or the system's workload drops to a level low enough to alleviate the resource constraints and prevent the back-end servers from becoming overloaded any further.
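The “bell ringing” effect can be illustrated with a small deterministic simulation in Python. The model is deliberately simplified and rests on stated assumptions: every load balancer sends a fixed amount of traffic, each initially prefers the same server, and each moves to the next server in order when its current target fails a health check. The function simulate_bell_ringing and all numeric values are hypothetical.

```python
def simulate_bell_ringing(num_balancers: int = 3,
                          num_servers: int = 3,
                          load_per_balancer: int = 40,
                          server_capacity: int = 100,
                          rounds: int = 4) -> list[list[int]]:
    """Return the per-server load observed in each round.

    Each load balancer knows only its own traffic, so it cannot tell that
    the other balancers have chosen the same target server.
    """
    # Assumption: all balancers independently start with server 0.
    targets = [0] * num_balancers
    history = []
    for _ in range(rounds):
        # Traffic from every balancer lands on that balancer's current target.
        load = [0] * num_servers
        for target in targets:
            load[target] += load_per_balancer
        history.append(load)

        # Each balancer health-checks its own target. An overloaded server
        # fails the check, and the balancer shifts to the next server.
        targets = [(t + 1) % num_servers if load[t] > server_capacity else t
                   for t in targets]
    return history


for round_number, load in enumerate(simulate_bell_ringing()):
    print(f"round {round_number}: load per server = {load}")
```

In this sketch the total workload of 120 units would fit comfortably if spread across three servers with a capacity of 100 units each, yet one server is overloaded in every round because the load balancers move in lockstep without knowledge of one another's traffic.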