A Content Distribution Network (CDN) distributes web content from an origin server to caching servers at various locations on the Internet, and then enables the content to be accessed from those caching servers. The caching servers are also called content engines or content servers. Content routers in the CDN route user requests to an appropriate caching server. For example, when a user requests a web page that is part of the CDN, the CDN typically redirects the request from the origin server to the caching server that is closest to the user. The caching server delivers the cached content to the user. The CDN also communicates with the origin server to deliver any content that has not been previously cached. This service speeds the delivery of content for web sites with high traffic and for web sites with geographically distributed customers. Additionally, CDNs can be configured to protect origin servers from large surges in traffic. Distributing content reduces the load on the origin servers where the content originates. Furthermore, users gain improved access to content because they can obtain it from a caching server that is closer (in terms of network distance and congestion) and less heavily loaded than the origin server.
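The request flow described above can be illustrated with a minimal sketch. It is not taken from any particular CDN: the `CachingServer` class, the `distance` and `load` metrics, the `max_load` threshold, and the selection policy are all assumptions made for illustration, and the origin server is modeled as a plain dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class CachingServer:
    name: str
    distance: float   # hypothetical network-distance metric to the user
    load: float       # hypothetical load fraction, 0.0 to 1.0
    cache: dict = field(default_factory=dict)


def pick_server(servers, max_load=0.9):
    """Choose the nearest server that is not overloaded (assumed policy)."""
    # Fall back to all servers if every one of them exceeds the threshold.
    candidates = [s for s in servers if s.load < max_load] or list(servers)
    return min(candidates, key=lambda s: (s.distance, s.load))


def serve(server, url, origin):
    """Return cached content, fetching from the origin on a cache miss."""
    if url not in server.cache:
        server.cache[url] = origin[url]  # origin modeled as a dict
    return server.cache[url]


origin = {"/index.html": "<html>home</html>"}
servers = [
    CachingServer("edge-far", distance=80.0, load=0.2),
    CachingServer("edge-near", distance=5.0, load=0.4),
    CachingServer("edge-busy", distance=1.0, load=0.95),
]
chosen = pick_server(servers)
body = serve(chosen, "/index.html", origin)
```

In this sketch the nearest server is skipped because it is overloaded, so the request goes to the next-nearest one, which fetches the page from the origin once and serves later requests from its cache.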
To make content available through the CDN, a content provider defines one or more “channels”. Each channel contains a set of files that is typically expected to be accessed by the same set of users. For example, a channel may contain training videos intended for salespeople, or advertisements directed at an organization's customers. A subset of the content servers in the CDN is assigned to each channel. The content servers in the assigned subset are typically located conveniently with respect to the channel's intended set of users.
Files in a channel are pre-positioned at the content servers assigned to that channel. Pre-positioning gives greater certainty that the content will be available to the users who access it. The content servers assigned to the channel may be behind a slow link relative to the origin server, and the files themselves may be very large. Therefore, moving the content from the origin server to a content server assigned to the channel can be time-consuming. Pre-positioning the content generally reduces the delay users experience when they attempt to access files.
Pre-positioning depends on an indication of what content is carried in the channel. One technique for describing current channel content is a “manifest file.” The manifest file describes the content through a set of rules. A rule can be as simple as the name of a single file in the channel. A rule can also be more complicated, e.g., indicating all files in a directory that have a particular suffix.
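The two kinds of rules mentioned above can be sketched as follows. The rule representation (dictionaries with `type`, `path`, `dir`, and `suffix` keys) and the function names are hypothetical; the text does not specify a manifest file syntax, so this is only one possible encoding.

```python
import posixpath


def matches(rule, path):
    """Return True if a file path satisfies a single manifest rule."""
    if rule["type"] == "file":
        # Simple rule: names exactly one file.
        return path == rule["path"]
    if rule["type"] == "dir_suffix":
        # More complicated rule: every file directly inside a directory
        # whose name ends with the given suffix.
        directory = rule["dir"].rstrip("/") or "/"
        return (posixpath.dirname(path) == directory
                and path.endswith(rule["suffix"]))
    raise ValueError(f"unknown rule type: {rule['type']!r}")


def evaluate(manifest, available_paths):
    """Return the sorted paths that satisfy at least one rule."""
    return sorted(p for p in available_paths
                  if any(matches(rule, p) for rule in manifest))


manifest = [
    {"type": "file", "path": "/ads/banner.png"},
    {"type": "dir_suffix", "dir": "/videos", "suffix": ".mp4"},
]
available = [
    "/videos/intro.mp4",
    "/videos/notes.txt",
    "/videos/old/demo.mp4",
    "/ads/banner.png",
]
channel_files = evaluate(manifest, available)
```

Here the directory rule is assumed to be non-recursive, so `/videos/old/demo.mp4` is not selected; a real manifest format might define this differently.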
Servers in the channel periodically evaluate the rules in the manifest file to determine the current content of the channel. The result of evaluating the rules is a “catalog”, also referred to as a dataset, that identifies the files currently in the channel together with their associated properties. Such properties include, e.g., an indication of where the file comes from, or how often the file should be checked to determine whether it has been modified and therefore needs to be re-fetched.
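One way to picture a catalog entry and its properties is sketched below. The field names, the default check interval, and the `origin.example` host are assumptions introduced for illustration; the text names only the kinds of properties a catalog may carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    path: str
    source_url: str        # where the file comes from
    check_interval_s: int  # how often to check for modification, in seconds


def build_catalog(files, origin_base, default_interval_s=3600):
    """Build a catalog from (path, interval-or-None) pairs."""
    catalog = {}
    for path, interval in files:
        if interval is None:
            interval = default_interval_s  # assumed default, not from the text
        catalog[path] = CatalogEntry(path, origin_base + path, interval)
    return catalog


def needs_check(entry, last_checked_s, now_s):
    """Return True when the entry is due to be checked for modification."""
    return now_s - last_checked_s >= entry.check_interval_s


catalog = build_catalog(
    [("/videos/intro.mp4", 600), ("/ads/banner.png", None)],
    origin_base="http://origin.example",  # hypothetical origin host
)
intro = catalog["/videos/intro.mp4"]
```

A server holding this catalog would call something like `needs_check` for each entry and re-fetch the file only if the check shows that it has been modified.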
The set of caching servers providing content in a channel can be very large. It is not desirable for each caching server to evaluate the rules in the manifest file individually, since this can lead to excessive communication with, and load on, the site that stores the manifest file. Optimally, only one of the caching servers in the subset assigned to a channel carries out this computation. That one caching server, generally referred to as the master server, then propagates the results to the other caching servers. The master server preferably does not propagate changes by re-sending the entire catalog that results from the manifest file evaluation. The catalog can be extremely large, and it is likely to have changed very little since the last time the rules were evaluated. Instead, the master server preferably computes a set of changes and propagates only that set. In this way, the communication needed to keep catalogs up to date at all the caching servers assigned to the channel is kept to a minimum.
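The idea of propagating a set of changes rather than the whole catalog can be sketched as a simple dictionary diff. This is an illustrative assumption about how such a change set might be represented, not a description of a specific protocol; the catalog is modeled as a mapping from file path to a dictionary of properties, and the function names are hypothetical.

```python
def compute_delta(old, new):
    """Compute the changes that turn the old catalog into the new one."""
    added = {p: new[p] for p in new if p not in old}
    removed = sorted(p for p in old if p not in new)
    changed = {p: new[p] for p in new if p in old and old[p] != new[p]}
    return {"added": added, "removed": removed, "changed": changed}


def apply_delta(catalog, delta):
    """Apply a delta to a copy of a catalog, as a non-master server would."""
    updated = dict(catalog)
    for path in delta["removed"]:
        updated.pop(path, None)
    updated.update(delta["added"])
    updated.update(delta["changed"])
    return updated


# Catalog from the previous evaluation, held by every server in the channel.
old_catalog = {
    "/a.mp4": {"interval": 60},
    "/b.mp4": {"interval": 60},
    "/d.mp4": {"interval": 60},
}
# Catalog the master server obtains from the latest evaluation.
new_catalog = {
    "/a.mp4": {"interval": 60},
    "/b.mp4": {"interval": 30},
    "/c.mp4": {"interval": 60},
}
delta = compute_delta(old_catalog, new_catalog)
replica = apply_delta(old_catalog, delta)
```

Only the three entries that differ appear in `delta`; the unchanged entry `/a.mp4` is never re-sent, which is the saving the paragraph above describes.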