1. Technical Field
The present invention relates generally to content delivery in distributed networks.
2. Brief Description of the Related Art
A company's Web site represents its public face. It is often the initial point of contact for obtaining access to the company's information or doing business with the company. Public-facing Web sites are used for many purposes. They can be used to transact commerce, where end consumers evaluate and buy products and services, and they are often linked to revenue generation and satisfying customer requests. They can be used as news and information portals for supplying the latest content for consumers. A company's Web site can be used as a customer self-service venue, where customer satisfaction is critical to loyalty and to getting customers to return to the Web site. These are merely representative examples, of course. As companies place greater importance on the Internet, Web sites increasingly become a key component of a company's business and its external communications. As such, the capability and flexibility of the supporting Internet infrastructure for the Web site becomes mission-critical. In particular, the infrastructure must provide good performance for all end users, regardless of their location. The site must scale to handle high traffic load during peak usage periods. It must remain available 24×7, regardless of conditions on the Internet. When performance, reliability, or scalability problems do occur, Web site adoption and usage can be negatively impacted, resulting in greater costs, decreased revenue, and customer satisfaction issues.
It is known in the prior art to off-load Web site content for delivery by a third party distributed computer system. One such distributed computer system is a “content delivery network” or “CDN” that is operated and managed by a service provider. The service provider typically provides the service on behalf of third parties. A “distributed system” of this type typically refers to a collection of autonomous computers linked by a network or networks, together with the software, systems, protocols and techniques designed to facilitate various services, such as content delivery or the support of outsourced site infrastructure. Typically, “content delivery” means the storage, caching, or transmission of content, streaming media and applications on behalf of content providers, including ancillary technologies used therewith including, without limitation, DNS request handling, provisioning, data monitoring and reporting, content targeting, personalization, and business intelligence. The term “outsourced site infrastructure” means the distributed systems and associated technologies that enable an entity to operate and/or manage a third party's Web site infrastructure, in whole or in part, on the third party's behalf.
FIGS. 1-2 illustrate a known CDN infrastructure for managing content delivery on behalf of participating content providers. In this example, computer system 100 is configured as a CDN and is managed by a service provider. The CDN is assumed to have a set of machines 102a-n distributed around the Internet, and some or even all of these machines may be located in data centers owned or operated by third parties. Typically, most of the machines are servers located near the edge of the Internet, i.e., at or adjacent to end user access networks. A Network Operations Command Center (NOCC) 104 may be used to administer and manage operations of the various machines in the system. Third party content sites, such as Web site 106, offload delivery of content (e.g., HTML, embedded page objects, streaming media, software downloads, and the like) to the distributed computer system 100 and, in particular, to “edge” servers. Typically, this service is provided for a fee. In one common scenario, CDN content provider customers offload their content delivery by aliasing (e.g., by a DNS canonical name) given content provider domains or sub-domains to domains that are managed by the service provider's authoritative domain name service. End users that desire such content may be directed to the distributed computer system to obtain that content more reliably and efficiently.
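By way of illustration only, the aliasing described above may be sketched as a lookup that follows a DNS canonical name (CNAME) from a content provider domain to a domain managed by the service provider. The sketch below is a simplified model and not an actual name server; the record table, the domain names, the address (taken from a documentation-only range) and the function name resolve are all assumptions introduced for this example.

```python
# Hypothetical zone data; every name and address here is illustrative only.
RECORDS = {
    # Content provider domain aliased to a domain managed by the CDN.
    "www.customer.example": ("CNAME", "www.customer.example.cdn.example"),
    # CDN-managed domain answered with the address of an edge server.
    "www.customer.example.cdn.example": ("A", "192.0.2.10"),
}


def resolve(name, records=RECORDS, max_hops=8):
    """Follow CNAME aliases until an A record is found.

    Returns the address and the chain of names that were visited.
    """
    chain = [name]
    for _ in range(max_hops):
        record = records.get(name)
        if record is None:
            raise LookupError(f"no record for {name}")
        rtype, value = record
        if rtype == "A":
            return value, chain
        # CNAME: continue the lookup at the canonical name.
        name = value
        chain.append(name)
    raise LookupError("CNAME chain too long")


address, chain = resolve("www.customer.example")
print(address, chain)
```

In this model the end user asks only for the content provider's name; the alias is what hands the request to the service provider's domain, which in turn selects the edge server address.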
The distributed computer system typically also includes other infrastructure, such as a distributed data collection system 108 that collects usage and other data from the edge servers, aggregates that data across a region or set of regions, and passes that data to other back-end systems 110, 112, 114 and 116 to facilitate monitoring, logging, alerts, billing, management and other operational and administrative functions. Distributed network agents 118 monitor the network as well as the server loads and provide network, traffic and load data to a DNS query handling mechanism 115, which is authoritative for content domains being managed by the CDN. A distributed data transport mechanism 120 may be used to distribute control information (e.g., metadata to manage content, to facilitate load balancing, and the like) to the edge servers. As illustrated in FIG. 2, a given machine 200 comprises commodity hardware (e.g., an Intel Pentium processor) 202 running an operating system kernel (such as Linux or a variant) 204 that supports one or more applications 206a-n. To facilitate content delivery services, for example, given machines typically run a set of applications, such as an HTTP Web proxy 207, a name server 208, a local monitoring process 210, a distributed data collection process 212, and the like. For streaming media, the machine typically includes one or more media servers, such as a Windows Media Server (WMS) or Flash 2.0 server, as required by the supported media formats.
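By way of illustration only, the aggregation step performed by a data collection system of this kind may be sketched as follows. The sample format (a region identifier, a request count and a byte count per edge server report) and the function name aggregate_by_region are assumptions introduced for this example; the actual data collected and the manner of aggregation are not specified above.

```python
from collections import defaultdict


def aggregate_by_region(samples):
    """Sum per-server usage samples into per-region totals.

    Each sample is a dict with hypothetical keys "region",
    "requests" and "bytes".
    """
    totals = defaultdict(lambda: {"requests": 0, "bytes": 0})
    for sample in samples:
        region_total = totals[sample["region"]]
        region_total["requests"] += sample["requests"]
        region_total["bytes"] += sample["bytes"]
    return dict(totals)


# Illustrative reports from three edge servers in two regions.
samples = [
    {"region": "region-1", "requests": 120, "bytes": 48_000},
    {"region": "region-1", "requests": 80, "bytes": 32_000},
    {"region": "region-2", "requests": 50, "bytes": 10_000},
]
print(aggregate_by_region(samples))
```

The per-region totals produced in this way are the kind of summarized data that could then be passed to back-end systems for monitoring, logging or billing.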
The CDN may be configured to provide certain advanced content delivery functionality, for example, in the case where the edge server does not have the requested content (e.g., the content is not present, the content is present but is stale, the content is “dynamic” and must be created on the origin server, and the like). In such circumstances, the edge server must “go forward” to obtain the requested content. An enhanced CDN often provides the capability to facilitate this “go forward” process. Thus, it is known to provide a “tiered distribution” by which additional edge servers in the CDN provide a buffer mechanism to the Web site origin server. In a tiered distribution scheme, a subset of the edge servers in the CDN is organized as a cache hierarchy, so that a given edge server in an edge region has an associated “parent” region that may store an authoritative copy of certain requested content. A cache hierarchy of this type is then controlled at a fine-grain level using edge server and parent server configuration rules that are provided through the distributed data transport mechanism. U.S. Pat. No. 7,133,905, which is assigned to the assignee of the present application, describes this scheme.
Another advanced function that may be implemented is quite useful when an edge server has to go forward to an origin server for dynamic or non-cacheable content. According to this technique, the CDN is configured so that a given edge server has the option of going forward (to the origin) using intermediate CDN edge nodes instead of relying upon default BGP routing. In this function, the CDN performs tests to determine a set of alternative best paths between a given edge server and the origin server, and it makes those paths known to the edge server dynamically, typically in the form of a map. When the edge server needs to go forward, it examines the map to determine whether to go forward using default BGP or one of the alternate paths through an intermediate CDN node. This path optimization process is quite useful when the content in question must be generated dynamically, although the process can be used whenever it is necessary for a given edge server to obtain given content from a given source. This performance-based path optimization scheme is described in U.S. Publication No. 2002/0163882, which is also assigned to the assignee of the present application.
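By way of illustration only, the map examination performed by an edge server when it needs to go forward may be sketched as follows. The map layout (one entry per origin, listing a measured latency for the default route and for each intermediate node), the use of latency as the selection criterion, and the names DIRECT and choose_path are assumptions introduced for this example; they are not taken from the referenced publication.

```python
# Label for going forward over default BGP routing (hypothetical name).
DIRECT = "direct"


def choose_path(path_map, origin):
    """Pick the route with the lowest measured latency for an origin.

    path_map maps an origin name to a dict of {route: latency_ms},
    where a route is DIRECT or the name of an intermediate CDN node.
    Falls back to DIRECT when the map has no entry for the origin.
    """
    candidates = path_map.get(origin)
    if not candidates:
        return DIRECT
    # min over the route names, ordered by their measured latency.
    return min(candidates, key=candidates.get)


# Illustrative map as it might be distributed to one edge server.
path_map = {
    "origin.customer.example": {
        DIRECT: 180.0,
        "intermediate-node-a": 140.0,
        "intermediate-node-b": 95.0,
    },
}
print(choose_path(path_map, "origin.customer.example"))
print(choose_path(path_map, "origin.other.example"))
```

Because the map is supplied to the edge server dynamically, a sketch of this kind keeps the selection step local and inexpensive: the edge server consults precomputed measurements rather than probing the network at request time.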