A caching server or a proxy server distributes content on behalf of different content providers to different end users. The caching server receives an initial request from an end user for content of a particular content provider. This initial request will typically result in a cache miss because the caching server will not have a locally cached copy of the requested content. Accordingly, the caching server retrieves a copy of the requested content from the particular content provider's origin server, wherein the origin server is the originating distribution point for content of the particular content provider. The caching server caches the retrieved copy in local memory or storage and distributes the retrieved content in response to the initial request. The caching server also distributes the locally cached copy of the content in response to subsequent end user requests for the same content. Thus, once a copy of the content provider's content is cached at the caching server, the caching server can serve that same copy to different end users without a subsequent access to the content provider's origin server. A time-to-live parameter or another validation technique can be used to determine when the caching server is to refresh the cached copy of the content from the origin server.
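The cache miss, cache fill, and time-to-live behavior described above can be illustrated with a minimal sketch. The class name `CachingServer`, the `fetch_from_origin` callable, and the `origin_requests` counter are hypothetical names introduced only for illustration, and expiry by a fixed time-to-live is just one of the validation techniques the text allows for.

```python
import time
from typing import Callable, Dict, Tuple


class CachingServer:
    """Illustrative caching server: serves from a local copy until its TTL expires."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes],
                 ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin      # hypothetical stand-in for an origin request
        self._ttl = ttl_seconds
        self._clock = clock                  # injectable so expiry can be exercised in tests
        self._cache: Dict[str, Tuple[bytes, float]] = {}  # url -> (body, expiry time)
        self.origin_requests = 0             # how many times the origin was contacted

    def handle_request(self, url: str) -> bytes:
        entry = self._cache.get(url)
        if entry is not None and self._clock() < entry[1]:
            return entry[0]                  # cache hit: no origin access needed

        # Cache miss (or expired copy): retrieve from the origin and cache locally.
        self.origin_requests += 1
        body = self._fetch(url)
        self._cache[url] = (body, self._clock() + self._ttl)
        return body
```

In this sketch, repeated calls to `handle_request` for the same URL contact the origin only once per time-to-live window, which is the shielding effect the paragraph describes.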
One or more caching servers can shield and reduce the load on an origin server by absorbing large numbers of end user requests and by fanning out cached copies of the origin server content in response to those requests. In other words, the caching servers expose the origin server to a mere fraction of the end user requests, which in turn substantially reduces load on the origin server.
Yet, there are certain circumstances in which the caching servers can impose onerous loads on the origin server. Such circumstances arise during an initial flood of requests for newly available content or newly updated content. For instance, when a popular download or streaming event (e.g., movie, episode from a show, sporting event, concert, or other live event) becomes available, a large number of users contemporaneously submit hundreds or thousands of requests for the same content to the same caching server.
In such circumstances, the caching server receives a first request for content that triggers the caching server to retrieve the content from an origin server. In the time it takes the caching server to request and retrieve the content from the origin server, the caching server may receive hundreds more requests for the same content. This is especially problematic when the content being retrieved is large (e.g., several hundred megabytes or gigabytes), because the retrieval then takes longer and more requests arrive before it completes. Since the caching server does not have the requested content cached at the time the subsequent requests arrive, and the caching server has no knowledge of when the content might be cached, each subsequent request results in a cache miss, with the caching server forwarding an additional request for the same content to the origin server. The origin server will then send a separate copy of the same content to the same caching server in response to each of the requests. The initial flood of requests can effectively remove the request-absorbing shield provided by the caching servers. Consequently, the origin server can quickly be overrun by the requests from the caching servers and become unresponsive as a result. An unresponsive origin server would cause the requested content to be unavailable to the caching servers and the end users.
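The failure mode described above can be reproduced with a small simulation. This is a sketch under stated assumptions: `NaiveCachingServer`, `simulate_flood`, and `slow_origin` are hypothetical names, threads stand in for concurrent end user requests, and a barrier is used to model the case in which every request arrives before the first retrieval from the origin completes.

```python
import threading


class NaiveCachingServer:
    """Illustrative cache with no record of fills that are already in progress."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # hypothetical stand-in for an origin request
        self._cache = {}
        self._lock = threading.Lock()

    def handle_request(self, url):
        with self._lock:
            if url in self._cache:
                return self._cache[url]
        # Cache miss: nothing tells this request that another one is already
        # retrieving the same content, so it goes to the origin as well.
        body = self._fetch(url)
        with self._lock:
            self._cache[url] = body
        return body


def simulate_flood(num_requests=50):
    """Return how many origin fetches a burst of identical requests causes."""
    origin_hits = 0
    counter_lock = threading.Lock()
    # Holds each origin transfer open until all requests have arrived,
    # modeling a large object that is still being retrieved.
    all_arrived = threading.Barrier(num_requests)

    def slow_origin(url):
        nonlocal origin_hits
        with counter_lock:
            origin_hits += 1
        all_arrived.wait(timeout=10)
        return b"payload for " + url.encode()

    server = NaiveCachingServer(slow_origin)
    threads = [threading.Thread(target=server.handle_request, args=("/movie.mp4",))
               for _ in range(num_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return origin_hits
```

Under these assumptions, `simulate_flood(n)` returns `n`: every request in the burst is forwarded to the origin, so the caching server provides no shielding until the first fill completes.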
Accordingly, there is a need to preserve the request shield provided by the caching servers for an origin server during an initial flood of content requests. There is a need to prevent a caching server from submitting duplicative requests for the same content to the origin server, even when the requests originate from different end users and arrive before the caching server has retrieved and cached a copy of the requested content. There is therefore a need to modify the cache miss operation of a caching server so that the caching server does not issue duplicative requests to an origin server when a cache fill operation for the same content is already in progress.
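One generic way to meet this need is to track cache fills that are in progress and have later requests wait on the pending fill rather than contact the origin. The following is only a sketch of that general idea, not the specific method of this document: `CoalescingCachingServer`, `flood_coalesced`, and `slow_origin` are hypothetical names, and holding waiting requests on a `Future` is an assumption made for illustration.

```python
import threading
import time
from concurrent.futures import Future


class CoalescingCachingServer:
    """Illustrative cache that issues at most one origin request per in-progress fill."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # hypothetical stand-in for an origin request
        self._cache = {}
        self._in_flight = {}             # url -> Future for the fill in progress
        self._lock = threading.Lock()

    def handle_request(self, url):
        with self._lock:
            if url in self._cache:
                return self._cache[url]
            pending = self._in_flight.get(url)
            starts_fill = pending is None
            if starts_fill:
                pending = Future()
                self._in_flight[url] = pending
        if not starts_fill:
            # A fill for this content is already in progress: wait for it
            # instead of sending a duplicative request to the origin.
            return pending.result()
        try:
            body = self._fetch(url)
        except BaseException as exc:
            with self._lock:
                del self._in_flight[url]
            pending.set_exception(exc)   # waiting requests see the same failure
            raise
        with self._lock:
            self._cache[url] = body
            del self._in_flight[url]
        pending.set_result(body)
        return body


def flood_coalesced(num_requests=50):
    """Return (origin fetch count, responses) for a burst of identical requests."""
    origin_hits = 0
    counter_lock = threading.Lock()
    responses = []

    def slow_origin(url):
        nonlocal origin_hits
        with counter_lock:
            origin_hits += 1
        time.sleep(0.05)                 # simulated transfer time
        return b"payload for " + url.encode()

    server = CoalescingCachingServer(slow_origin)

    def worker():
        responses.append(server.handle_request("/movie.mp4"))

    threads = [threading.Thread(target=worker) for _ in range(num_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return origin_hits, responses
```

In this sketch the cache entry is written and the in-progress record is removed under the same lock, so a request always sees either the cached copy or the pending fill. The origin therefore receives one request for the burst regardless of how the threads are scheduled, while every end user request still receives the content.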