Some networks, such as the Internet, use origin servers to service large numbers of requests, typically client or user requests generated by client or user devices. Each such request is a request for an object stored on an origin server. Requested objects typically include hypermedia information such as text, graphics, video, and sound files. Because the number of requests made to an origin server at any given instant may be quite large, there is a need to control the number of requests to prevent origin server overload.

One approach to controlling the number of requests made to an origin server is to proxy all requests for a single object through a single connection between a cache server and the origin server. With this approach, the cache server maintains a selection of objects from the origin server and, where possible, serves object requests from that selection. If a requested object is not stored in the selection, the cache server uses the single connection with the origin server to retrieve the object while queuing all subsequent requests for it. When the object has been received, all queued requests for it are then served.

One disadvantage of this approach is that it is error prone, since servicing the queued requests depends on the proper operation of a single connection. Further, because many requests appear to be cacheable until a response from the origin server indicates otherwise, this approach also causes non-cacheable requests to be queued even though there is no benefit in doing so.
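The single-connection approach described above can be sketched as a request-coalescing cache: the first miss for a given object performs the one fetch from the origin, while subsequent requests for the same object queue behind it and are all served when the fetch completes. The following is a minimal illustrative sketch, not the implementation described in the text; the `fetch` callable standing in for the single origin connection is a hypothetical placeholder.

```python
import threading

class CoalescingCache:
    """Illustrative sketch of proxying all requests for one object
    through a single origin connection, queuing duplicates."""

    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical: retrieves an object from the origin
        self._cache = {}         # objects already retrieved and cached
        self._inflight = {}      # key -> (event, holder) for the in-progress fetch
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]            # served directly from the cache
            if key in self._inflight:
                event, holder = self._inflight[key]
                is_waiter = True                   # queue behind the single fetch
            else:
                event, holder = threading.Event(), {}
                self._inflight[key] = (event, holder)
                is_waiter = False                  # this request performs the fetch
        if is_waiter:
            event.wait()
            if "error" in holder:
                raise holder["error"]              # one failed connection fails all waiters
            return holder["value"]
        try:
            value = self._fetch(key)               # the single origin connection
            holder["value"] = value
            with self._lock:
                self._cache[key] = value
                del self._inflight[key]
            return value
        except Exception as exc:
            holder["error"] = exc
            with self._lock:
                del self._inflight[key]
            raise
        finally:
            event.set()                            # release all queued requests
```

Note how the sketch exhibits both drawbacks named in the text: if the single fetch fails, every queued request fails with it, and requests are queued by key before any origin response can reveal that the object is non-cacheable.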
An advancement over the above approach is to allow all requests that are not serviceable by the cache server to reach the origin server in parallel. This approach suffers from the drawback that it can lead to very high load at the origin server, resulting in failures and service disruptions.
Accordingly, there is a need to handle requests to an origin server in a manner that prevents overload of the origin server while remaining reliable.