A Content Distribution Network (CDN) distributes web content from an origin server to caching servers at various locations in a large network, such as the Internet, and enables client systems to access the web content from those caching servers. The caching servers are also called content engines or content servers. Content routers in the CDN route client requests to an appropriate content engine. When a client requests a web page that is part of the CDN, for example, the CDN typically redirects the request away from the origin server to the content engine that is closest to the client. The content engine delivers the cached content to the client, and the CDN communicates with the origin server to obtain any content that has not previously been cached. This service speeds the delivery of content for web sites with high traffic and for web sites with geographically distributed customers. CDNs can also be configured to protect origin servers from large surges in traffic. Distributing content reduces the load on the origin servers where the content originates. Clients, in turn, get improved access because they can obtain content from a caching server that is closer (in terms of network distance and congestion) and less heavily loaded than the origin server.
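The routing decision described above can be illustrated with a minimal sketch. It is not taken from any particular CDN: the ContentEngine type, the distance_ms and load fields, and the 0.9 load threshold are all hypothetical names and values chosen for illustration. The sketch picks the nearest content engine that is not overloaded and returns None when the request would have to fall back to the origin server.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ContentEngine:
    """Hypothetical view of a caching server as seen by a content router."""
    name: str
    distance_ms: float  # assumed measure of network distance to the client
    load: float         # assumed utilization in the range [0, 1]


def pick_engine(engines: Iterable[ContentEngine],
                max_load: float = 0.9) -> Optional[ContentEngine]:
    """Return the nearest engine whose load is below max_load.

    Returns None when every engine is overloaded, in which case the
    request would be served by the origin server instead.
    """
    candidates = [e for e in engines if e.load < max_load]
    if not candidates:
        return None
    # Prefer the smallest distance; break ties by the lighter load.
    return min(candidates, key=lambda e: (e.distance_ms, e.load))


engines = [
    ContentEngine("ce-east", 12.0, 0.95),     # closest, but overloaded
    ContentEngine("ce-central", 30.0, 0.40),
    ContentEngine("ce-west", 80.0, 0.10),
]
chosen = pick_engine(engines)
print(chosen.name)  # ce-central
```

A real content router would weigh more signals (congestion, content availability, policy); the point here is only that both distance and load enter the choice.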
To make content efficiently available through the CDN, a content provider defines one or more “channels”. Each channel contains a set of files that is typically expected to be accessed by the same set of users. For example, a channel may contain training videos intended to be used by sales people, or advertisements directed at an organization's customers. A subset of the content engines in the CDN is assigned to each channel. The content engines in the assigned subset are typically servers located conveniently with respect to the intended set of users of the channel.
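One way to picture a channel is as a named set of files paired with the subset of content engines assigned to it. The following sketch is an assumption made for illustration only; the Channel type, its fields, and the engine and file names are hypothetical. It shows how a content engine could work out which files it is responsible for across all of its channels.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Channel:
    """Hypothetical channel: a set of files plus the engines assigned to it."""
    name: str
    files: Set[str] = field(default_factory=set)
    engines: Set[str] = field(default_factory=set)


def files_for_engine(channels: Iterable[Channel], engine: str) -> List[str]:
    """Return every file the given engine should hold, across its channels."""
    wanted: Set[str] = set()
    for channel in channels:
        if engine in channel.engines:
            wanted |= channel.files
    return sorted(wanted)


channels = [
    Channel("sales-training",
            files={"/training/intro.mpg", "/training/advanced.mpg"},
            engines={"ce-nyc", "ce-sfo"}),
    Channel("customer-ads",
            files={"/ads/spring.html"},
            engines={"ce-nyc", "ce-lon"}),
]

print(files_for_engine(channels, "ce-nyc"))
print(files_for_engine(channels, "ce-lon"))
```

Here ce-nyc is assigned to both channels and therefore holds all three files, while ce-lon holds only the advertisement.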
Files in a channel are pre-positioned at the content engines assigned to that channel. Pre-positioning enables greater certainty of availability of the content to accessing client systems. The content engines assigned to the channel may be behind a slow link relative to the origin server, and the files themselves may be very large. Therefore, moving the content from the origin server to a content engine can be time-consuming. Pre-positioning the content generally reduces delay when users attempt to access files.
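A rough calculation shows why fetching on demand over a slow link is costly. The figures below (a 700 MB video and a 1.5 Mbit/s link) are illustrative assumptions, not values from any deployment, and the estimate ignores protocol overhead and competing traffic.

```python
def transfer_seconds(size_bytes: int, link_bits_per_s: float) -> float:
    """Idealized transfer time: payload size in bits divided by link rate."""
    if link_bits_per_s <= 0:
        raise ValueError("link rate must be positive")
    return size_bytes * 8 / link_bits_per_s


video_bytes = 700_000_000  # assumed file size: 700 MB (decimal megabytes)
link_bps = 1_500_000       # assumed link rate: 1.5 Mbit/s

seconds = transfer_seconds(video_bytes, link_bps)
print(f"{seconds / 60:.1f} minutes")  # 62.2 minutes
```

Under these assumptions a user who requested the file before it was cached would wait about an hour, which is the delay that pre-positioning avoids.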
Pre-positioning requires an indication of what content is carried in the channel. One technique for describing the current channel content is a “manifest file,” which describes the content through a set of rules. A rule can be as simple as the file name of a single file in the channel. A rule can also be more complicated, e.g., selecting all files in a directory that have a particular suffix.
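The two kinds of rule mentioned above can be sketched as follows. No manifest syntax is given here, so the FileRule and DirSuffixRule types and the evaluate function are hypothetical stand-ins that show only the matching logic, not an actual manifest format.

```python
import posixpath
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileRule:
    """Hypothetical rule that names exactly one file."""
    path: str

    def matches(self, path: str) -> bool:
        return path == self.path


@dataclass(frozen=True)
class DirSuffixRule:
    """Hypothetical rule: all files directly in a directory with a suffix."""
    directory: str
    suffix: str

    def matches(self, path: str) -> bool:
        in_directory = posixpath.dirname(path) == self.directory
        return in_directory and path.endswith(self.suffix)


def evaluate(rules: Iterable, available: Iterable[str]) -> List[str]:
    """Return the available paths selected by at least one rule."""
    rules = list(rules)
    return sorted(p for p in available if any(r.matches(p) for r in rules))


available = [
    "/training/intro.mpg",
    "/training/advanced.mpg",
    "/training/notes.txt",
    "/ads/spring.html",
    "/ads/old/winter.html",
]
rules = [FileRule("/ads/spring.html"), DirSuffixRule("/training", ".mpg")]

print(evaluate(rules, available))
# ['/ads/spring.html', '/training/advanced.mpg', '/training/intro.mpg']
```

In this sketch the directory rule does not descend into subdirectories; whether a real manifest rule would do so is a design choice not settled by the description.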
Content engines assigned to the channel periodically evaluate the rules in the manifest file to determine the current content of the channel. The result of evaluating the rules is a “catalog”, also referred to as a dataset, that identifies the files currently in the channel along with their associated properties. Such properties include, e.g., an indication of where the file comes from, or how often the file should be checked to see whether it has been modified and therefore needs to be re-fetched.
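A catalog of this kind can be pictured as a mapping from file path to a small record of properties. The sketch below is an assumption for illustration: the CatalogEntry fields, the origin.example.com URLs, and the second-based timestamps are hypothetical. It shows how the per-file check interval could determine which files are due to be checked against the origin, and how a newer modification time could trigger a re-fetch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for one file in the channel."""
    source_url: str             # where the file comes from
    check_interval_s: int       # how often to check for modification
    last_checked_s: float = 0.0
    cached_mtime_s: float = 0.0  # modification time of the cached copy


def due_for_check(catalog: Dict[str, CatalogEntry], now_s: float) -> List[str]:
    """Return paths whose check interval has elapsed since the last check."""
    return [path for path, entry in catalog.items()
            if now_s - entry.last_checked_s >= entry.check_interval_s]


def needs_refetch(entry: CatalogEntry, origin_mtime_s: float) -> bool:
    """A file is re-fetched when the origin copy is newer than the cache."""
    return origin_mtime_s > entry.cached_mtime_s


catalog = {
    "/training/intro.mpg": CatalogEntry(
        "http://origin.example.com/training/intro.mpg", check_interval_s=3600),
    "/ads/spring.html": CatalogEntry(
        "http://origin.example.com/ads/spring.html", check_interval_s=300),
}

print(due_for_check(catalog, now_s=600))  # ['/ads/spring.html']
```

Ten minutes after the last check, only the advertisement (checked every five minutes) is due; the training video (checked hourly) is not.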
Each content engine maintains a replication status to monitor its progress in obtaining content. The replication status typically includes the number of files copied to the content engine, the number of files that remain to be copied, the number of files that failed to copy, and the total number of files in the channel. Through the replication status, the content engine knows the status of the content it has cached as well as the content that remains to be acquired.
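The four counters in the replication status can be sketched as a small record. This is a minimal illustration under one stated assumption, namely that the total equals copied plus failed plus remaining, so that the remaining count can be derived rather than stored. The ReplicationStatus type and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReplicationStatus:
    """Hypothetical per-channel replication counters on one content engine."""
    total: int       # total number of files in the channel
    copied: int = 0  # files successfully copied to this engine
    failed: int = 0  # files whose copy attempt failed

    @property
    def remaining(self) -> int:
        # Assumption: every file is copied, failed, or still remaining.
        return self.total - self.copied - self.failed

    def record(self, ok: bool) -> None:
        """Record the outcome of one copy attempt."""
        if self.remaining <= 0:
            raise ValueError("no files remain to be copied")
        if ok:
            self.copied += 1
        else:
            self.failed += 1

    @property
    def complete(self) -> bool:
        """True when every file in the channel has been copied."""
        return self.copied == self.total


status = ReplicationStatus(total=5)
for outcome in (True, True, False):
    status.record(outcome)

print(status.copied, status.failed, status.remaining, status.total)  # 2 1 2 5
```

A status with failures never reports complete in this sketch; a real content engine would presumably retry failed files, which the description leaves open.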
Users logged on to client systems access the content of the CDN through a portal page (also referred to simply as a “portal”) displayed by the browser programs at the client systems. A portal is a starting point, or “gateway”, to the Web or to an intranet, and can likewise serve as a starting point for accessing the CDN. Portals generally provide consistent user interfaces for network access and typically include services such as a search engine and directories.