Load-balancing systems typically expose multiple direct endpoints of a service as a single virtual endpoint to the consumers of the service. The incoming traffic from consumers of the service is distributed in a rotation, or “load-balanced,” amongst a set of machines that are ready to provide the service at any particular moment in time. To determine whether a particular machine in the set is ready to serve, the load balancer typically probes the machine over a predetermined HTTP URL and expects to see a positive response. If the machine fails to respond accordingly, it is removed from the rotation for serving the incoming traffic, and service requests cease to be communicated thereto. If the machine subsequently begins responding to the probes again, it is placed back into the rotation to serve the incoming traffic. The time taken for a load balancer to remove a machine from the rotation or to add a machine to it is referred to as exclusion latency or inclusion latency, respectively.
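
The probe-and-rotate behavior described above can be illustrated with a minimal Python sketch. It is not taken from any particular load balancer: the names `LoadBalancer`, `http_probe`, `run_probes`, and `pick` are hypothetical, a "positive response" is assumed to mean any HTTP 2xx status, and round-robin is assumed as the rotation policy. The probe function is injectable so the logic can be exercised without a network.

```python
from typing import Callable, Dict, List
from urllib.error import URLError
from urllib.request import urlopen


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the probe URL answers with a 2xx status (assumed meaning of 'positive')."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (URLError, OSError):
        # Connection failures, timeouts, and HTTP error statuses all count as a failed probe.
        return False


class LoadBalancer:
    """Hypothetical single virtual endpoint in front of several machines."""

    def __init__(self, probe_urls: Dict[str, str],
                 probe: Callable[[str], bool] = http_probe) -> None:
        self.probe_urls = dict(probe_urls)  # machine name -> predetermined probe URL
        self.probe = probe
        self.in_rotation: List[str] = []    # machines currently serving traffic
        self._next = 0                      # round-robin counter

    def run_probes(self) -> None:
        """Probe every machine once; failing machines are excluded, recovered ones included."""
        self.in_rotation = [
            machine for machine, url in self.probe_urls.items() if self.probe(url)
        ]

    def pick(self) -> str:
        """Return the machine that should serve the next request (round-robin)."""
        if not self.in_rotation:
            raise RuntimeError("no machine is in rotation")
        machine = self.in_rotation[self._next % len(self.in_rotation)]
        self._next += 1
        return machine
```

In this sketch a machine leaves or rejoins the rotation only when `run_probes` is called, so the exclusion and inclusion latencies it exhibits depend on how often the probes are run and how long each probe takes to time out.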