In a large-scale data communications network with millions of users, controlling the load on origin servers and network infrastructure is crucial. As the number of clients in a network increases, a server will receive an increasing number of requests to process. The number of requests may exceed the capacity of the server or of network infrastructure such as switches, gateways, and proxies, particularly during peak traffic or maintenance periods. Consequently, there is a need for a mechanism to gracefully reduce the number of incoming requests to a network server.
A basic solution currently employed is to increase the capacity of servers or infrastructure before an overload situation occurs. However, this approach is impractical in most cases, as it is usually difficult to accurately estimate the growth of the client population or to predict peak levels. Further, this approach may be prohibitively expensive, as it requires designing for the maximum possible traffic, which may be substantially greater than the normal traffic.
A better solution is to employ some form of flow control mechanism. In current technology, such mechanisms are defined between an origin server and individual clients, i.e., end-to-end. For example, an origin server may refuse a connection to a particular client, or it may return an incoming request to the issuing client, asking it to wait for a certain time before issuing any more requests. Similarly, infrastructure such as proxies may refuse connections or use end-to-end flow control. An example of current technology is the hypertext transfer protocol (HTTP) flow control mechanism, which allows origin servers to return an error response to a client, indicating that the service is temporarily unavailable and that requests should be suspended for a certain period of time. Another example is the flow control mechanism provided by the transmission control protocol (TCP), which allows end-to-end flow control on individual TCP connections. This type of solution is better than relying on capacity planning alone, but it can only be used once a client has actually made a request. It is not possible with these schemes to take preventive measures, holding off requests from other clients before they have actually reached the origin server or some congested infrastructure element. In a large network with millions of users, this can be a serious problem.
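The HTTP mechanism described above can be sketched as follows: an overloaded origin server answers a request with a 503 Service Unavailable status and a Retry-After header telling the client how long to suspend further requests. This is a minimal illustration, not a production server; the overload indicator (a simple boolean flag) and the 30-second back-off period are assumptions chosen for the sketch.

```python
import threading
import urllib.request
import urllib.error
from http.server import BaseHTTPRequestHandler, HTTPServer

RETRY_AFTER_SECONDS = 30  # assumed back-off period asked of clients


class ThrottlingHandler(BaseHTTPRequestHandler):
    # In a real deployment this flag would reflect measured server load;
    # here it is fixed so the sketch always exercises the refusal path.
    overloaded = True

    def do_GET(self):
        if ThrottlingHandler.overloaded:
            # Refuse the request and ask the client to wait before retrying.
            self.send_response(503)  # Service Unavailable
            self.send_header("Retry-After", str(RETRY_AFTER_SECONDS))
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # suppress per-request logging in the sketch


# Start the server on an ephemeral port and issue one request against it.
server = HTTPServer(("127.0.0.1", 0), ThrottlingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/"
try:
    urllib.request.urlopen(url)
    status, retry_after = 200, None
except urllib.error.HTTPError as e:
    status, retry_after = e.code, e.headers["Retry-After"]
server.shutdown()

print(status, retry_after)  # prints: 503 30
```

Note that the client only learns of the overload after its request has already traversed the network and reached the server, which is exactly the limitation of end-to-end schemes discussed above.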