Technical Field
This application relates generally to distributed data processing systems and to the delivery of content over computer networks.
Brief Description of the Related Art
Distributed computer systems are known in the art. One such distributed computer system is a “content delivery network” or “CDN” that is operated and managed by a service provider. The service provider typically provides the content delivery service on behalf of third parties. A “distributed system” of this type typically refers to a collection of autonomous computers linked by a network or networks, together with the software, systems, protocols and techniques designed to facilitate various services, such as content delivery or the support of outsourced site infrastructure. The CDN represents an infrastructure shared by multiple third parties, sometimes referred to as multi-tenant infrastructure. Typically, “content delivery” refers to the storage, caching, or transmission of content—such as web pages, streaming media and applications—on behalf of content providers, and ancillary technologies used therewith including, without limitation, DNS query handling, provisioning, data monitoring and reporting, content targeting, personalization, and business intelligence.
In a known system such as that shown in FIG. 1, a distributed computer system 100 is configured as a content delivery network (CDN) and is assumed to have a set of machines distributed around the Internet Typically, most of the machines are configured as CDN content servers 102. Such machines may be located near the edge of the Internet, i.e., at or adjacent end user access networks. A network operations command center (NOCC) 104 may be used to administer and manage operations of the various machines in the system. Third party sites affiliated with content providers, such as web site hosted at origin server 106, offload delivery of content (e.g., HTML, embedded page objects, streaming media, software downloads, and the like) to the distributed computer system 100 and, in particular, to the servers 102 (which are sometimes referred to as “edge” servers in light of the possibility that they are near an “edge” of the Internet). Such servers 102 may be grouped together into a point of presence (POP) 107.
Typically, content providers offload their content delivery by aliasing (e.g., by a DNS CNAME) given content provider domains or sub-domains to domains that are managed by the service provider's authoritative domain name service. End user client machines 122 that desire such content may be directed to the distributed computer system to obtain that content more reliably and efficiently. The CDN servers 102 respond to the client requests, for example by obtaining requested content from a local cache, from another CDN server 102, from the origin server 106, or other source.
Although not shown in detail in FIG. 1, the distributed computer system may also include other infrastructure, such as a distributed data collection system 108 that collects usage and other data from the content servers, aggregates that data across a region or set of regions, and passes that data to other back-end systems 110, 112, 114 and 116 to facilitate monitoring, logging, alerts, billing, management and other operational and administrative functions. Distributed network agents 118 monitor the network as well as the server loads and provide network, traffic and load data to a domain name system (DNS) query handling mechanism 115, which is authoritative for content domains being managed by the CDN. A distributed data transport mechanism 120 may be used to distribute control information (e.g., metadata to manage content, to facilitate load balancing, and the like) to the servers 102.
As illustrated in FIG. 2, a given machine 200 in the CDN (e.g., a given CDN server 102) comprises commodity hardware (e.g., an Intel processor) 202 running an operating system kernel (such as Linux® or variant) 204 that supports one or more applications 206. To facilitate content delivery services, for example, given machines typically run a set of applications, such as an HTTP proxy server 207, a name server 208, a local monitoring process 210, a distributed data collection process 212, and the like. The HTTP proxy 207 (sometimes referred to herein as a global host or “ghost”) typically includes a manager process for managing a cache and delivery of content from the machine. For streaming media, the machine typically includes one or more media servers, such as a Windows® Media Server (WMS) or Flash® server, as required by the supported media formats.
The machine shown in FIG. 2 may be configured to provide one or more extended content delivery features, preferably on a domain-specific, content-provider-specific basis, preferably using configuration files that are distributed to the CDN servers 102 using a configuration system. A given configuration file preferably is extensible markup language (XML)-based and includes a set of content handling rules and directives that facilitate one or more advanced content handling features. The configuration file may be delivered to a CDN server 102 via the data transport mechanism 120. U.S. Pat. No. 7,240,100, the contents of which are hereby incorporated by reference, illustrate a useful infrastructure for delivering and managing CDN server content control information and this and other content server control information (referred to as “metadata”) can be provisioned by the CDN service provider itself, or (via an extranet or the like) the content provider customer who operates the origin server 106.
The contents of U.S. Pat. No. 7,111,057, titled “Method and system for purging content from a content delivery network,” are hereby incorporated by reference.
In a typical operation, a content provider identifies a content provider domain or sub-domain that it desires to have served by the CDN. The CDN service provider associates (e.g., via a canonical name, or CNAME, or other aliasing technique) the content provider domain with a CDN hostname, and the CDN provider then provides that CDN hostname to the content provider. When a DNS query to the content provider domain or sub-domain is received at the content provider's domain name servers, those servers respond by returning the CDN hostname. That network hostname points to the CDN, and that hostname is then resolved through the CDN name service. To that end, the CDN name service returns one or more IP addresses. The requesting client browser then makes a content request (e.g., via HTTP or HTTPS) to a CDN server associated with the IP address. The request includes a host header that includes the original content provider domain or sub-domain. Upon receipt of the request with the host header, the server checks its configuration file to determine whether the content domain or sub-domain requested is actually being handled by the CDN. If so, the server applies its content handling rules and directives for that domain or sub-domain as specified in the configuration. As noted above, these content handling rules and directives may be located within an XML-based “metadata” configuration file.
As an overlay, the CDN resources may be used to facilitate wide area network (WAN) acceleration services between enterprise data centers (which may be privately managed) and third party software-as-a-service (SaaS) providers.
CDN customers may subscribe to a “behind the firewall” managed service product to accelerate Intranet web applications that are hosted behind the customer's enterprise firewall, as well as to accelerate web applications that bridge between their users behind the firewall to an application hosted in the Internet ‘cloud’ (e.g., from a SaaS provider). To accomplish these two use cases, CDN software may execute on machines (potentially virtual machines running on customer hardware) hosted in one or more customer data centers, and on machines hosted in remote “branch offices.” The CDN software executing in the customer data center typically provides service configuration, service management, service reporting, remote management access, customer SSL certificate management, as well as other functions for configured web applications. The software executing in the branch offices provides last mile web acceleration for users located there. The CDN itself typically provides CDN hardware hosted in CDN data centers to provide a gateway between the nodes running behind the customer firewall and the service provider's other infrastructure (e.g., network and operations facilities). This type of managed solution provides an enterprise with the opportunity to take advantage of CDN technologies with respect to their company's intranet. This kind of solution extends acceleration for the enterprise to applications served anywhere on the Internet, such as SaaS (Software-As-A-Service) applications. By bridging an enterprise's CDN-based private overlay network with the existing CDN public internet overlay network, an end user at a remote branch office obtains an accelerated application end-to-end.
The CDN may have a variety of other features and adjunct components. For example the CDN may include a network storage subsystem (sometimes referred to herein as “NetStorage”) which may be located in a network datacenter accessible to the CDN servers, such as described in U.S. Pat. No. 7,472,178, the disclosure of which is incorporated herein by reference. The CDN may operate a server cache hierarchy to provide intermediate caching of customer content; one such cache hierarchy subsystem is described in U.S. Pat. No. 7,376,716, the disclosure of which is incorporated herein by reference.
For live streaming delivery, the CDN may include a live delivery subsystem, such as described in U.S. Pat. No. 7,296,082, and U.S. Publication No. 2011/0173345, the disclosures of which are incorporated herein by reference.
As noted above, when a given content server in a CDN (or in another distributed computing platform) receives a content request, it typically needs to have information about the identity and characteristics of the requested objects, as well as information about the features of the CDN that should be invoked when delivering those objects. In short, the CDN (and by extension its constituent servers) needs to have information about how to handle a content request. Such information, referred to as “metadata,” can be distributed in configuration files to the content servers, as noted above with respect to U.S. Pat. No. 7,240,100.
However, as a CDN grows and its feature set diversifies, the volume and complexity of metadata rises significantly. Moreover, the continuing move to cloud providers, platform as a service (PaaS), infrastructure as a service (IaaS), and/or software as a service (SaaS), further complicates the management of metadata for a CDN. Improved approaches for delivering and managing metadata are necessary to meet such challenges.