Technical Field
This application relates generally to distributed data processing systems and to the delivery of content over computer networks.
Brief Description of the Related Art
Distributed computer systems are known in the art. One such distributed computer system is a “content delivery network” or “CDN” that is operated and managed by a service provider. The service provider typically provides the content delivery service on behalf of third parties. A “distributed system” of this type typically refers to a collection of autonomous computers linked by a network or networks, together with the software, systems, protocols and techniques designed to facilitate various services, such as content delivery or the support of outsourced site infrastructure. This infrastructure is shared by multiple tenants, typically content providers. The infrastructure is generally used for the storage, caching, or transmission of content—such as web pages, streaming media and applications—on behalf of such content providers or other tenants. The platform may also provide ancillary technologies used therewith including, without limitation, DNS query handling, provisioning, data monitoring and reporting, content targeting, personalization, and business intelligence.
In a known system such as that shown in FIG. 1, a distributed computer system 100 is configured as a content delivery network (CDN) and has a set of machines, including CDN servers 102, distributed around the Internet. Typically, most of the servers are located near the edge of the Internet, i.e., at or adjacent end user access networks. A network operations command center (NOCC) 104 may be used to administer and manage operations of the various machines in the system. Third party sites affiliated with content providers, such as web site hosted on origin server 106, offload delivery of content (e.g., HTML or other markup language files, embedded page objects, streaming media, software downloads, and the like) to the distributed computer system 100 and, in particular, to the CDN servers 102 (which are sometimes referred to as content servers, or sometimes as “edge” servers in light of the possibility that they are near an “edge” of the Internet). Such servers may be grouped together into a point of presence (POP) 107 at a particular geographic location.
The CDN servers 102 are typically located at nodes that are publicly-routable on the Internet, within or adjacent nodes that are located in mobile networks, in or adjacent enterprise-based private networks, or in any combination thereof.
Typically, content providers offload their content delivery by aliasing (e.g., by a DNS CNAME) given content provider domains or sub-domains to domains that are managed by the service provider's authoritative domain name service. The service provider's domain name service directs end user client machines 122 that desire content to the distributed computer system (or more particularly, to one of the CDN servers in the platform) to obtain the content more reliably and efficiently. The CDN servers respond to the client requests, for example by serving requested content from a local cache, or by fetching it from another CDN server, from the origin server 106 associated with the content provider, or other source, and sending it to the requesting client.
For cacheable content, CDN servers 102 typically employ on a caching model that relies on setting a time-to-live (TTL) for each cacheable object. After it is fetched, the object may be stored locally at a given CDN server until the TTL expires, at which time is typically re-validated or refreshed from the origin server 106. For non-cacheable objects (sometimes referred to as ‘dynamic’ content), the CDN server typically must return to the origin server 106 when the object is requested by a client. The CDN may operate a server cache hierarchy to provide intermediate caching of customer content in various CDN servers closer to the CDN server handling a client request than the origin server 106; one such cache hierarchy subsystem is described in U.S. Pat. No. 7,376,716, the disclosure of which is incorporated herein by reference.
Although not shown in detail in FIG. 1, the distributed computer system may also include other infrastructure, such as a distributed data collection system 108 that collects usage and other data from the CDN servers 102, aggregates that data across a region or set of regions, and passes that data to other back-end systems 110, 112, 114 and 116 to facilitate monitoring, logging, alerts, billing, management and other operational and administrative functions. Distributed network agents 118 monitor the network as well as the server loads and provide network, traffic and load data to a DNS query handling mechanism 115. A distributed data transport mechanism 120 may be used to distribute control information (e.g., metadata to manage content, to facilitate load balancing, and the like) to the CDN servers. The CDN may include a network storage subsystem (sometimes referred to herein as “NetStorage”) which may be located in a network datacenter accessible to the CDN servers and which may act as a source of content, such as described in U.S. Pat. No. 7,472,178, the disclosure of which is incorporated herein by reference.
As illustrated in FIG. 2, a given machine 200 in the CDN comprises commodity hardware (e.g., a microprocessor) 202 running an operating system kernel (such as Linux® or variant) 204 that supports one or more applications 206. To facilitate content delivery services, for example, given machines typically run a set of applications, such as an HTTP proxy 207, a name service 208, a local monitoring process 210, a distributed data collection process 212, and the like. The HTTP proxy 207 typically includes a manager process for managing a cache and delivery of content from the machine. For streaming media, the machine may include one or more media servers, such as a Windows® Media Server (WMS) or Flash server, as required by the supported media formats.
A given CDN server 102 shown in FIG. 1 may be configured to provide one or more extended content delivery features, preferably on a domain-specific, content-provider-specific basis, preferably using configuration files that are distributed to the CDN servers using a configuration system. A given configuration file preferably is XML-based and includes a set of content handling rules and directives that facilitate one or more advanced content handling features. The configuration file may be delivered to the CDN server via the data transport mechanism. U.S. Pat. No. 7,240,100, the contents of which are hereby incorporated by reference, describe a useful infrastructure for delivering and managing CDN server content control information and this and other control information (sometimes referred to as “metadata”) can be provisioned by the CDN service provider itself, or (via an extranet or the like) the content provider customer who operates the origin server. U.S. Pat. No. 7,111,057, incorporated herein by reference, describes an architecture for purging content from the CDN. More information about a CDN platform can be found in U.S. Pat. Nos. 6,108,703 and 7,596,619, the teachings of which are hereby incorporated by reference in their entirety.
In a typical operation, a content provider identifies a content provider domain or sub-domain that it desires to have served by the CDN. When a DNS query to the content provider domain or sub-domain is received at the content provider's domain name servers, those servers respond by returning the CDN hostname (e.g., via a canonical name, or CNAME, or other aliasing technique). That network hostname points to the CDN, and that hostname is then resolved through the CDN name service. To that end, the CDN name service returns one or more IP addresses. The requesting client application (e.g., browser) then makes a content request (e.g., via HTTP or HTTPS) to a CDN server machine associated with the IP address. The request includes a host header that includes the original content provider domain or sub-domain. Upon receipt of the request with the host header, the CDN server checks its configuration file to determine whether the content domain or sub-domain requested is actually being handled by the CDN. If so, the CDN server applies its content handling rules and directives for that domain or sub-domain as specified in the configuration. These content handling rules and directives may be located within an XML-based “metadata” configuration file, as mentioned previously.
The CDN platform may be considered an overlay across the Internet on which communication efficiency can be improved. Improved communications on the overlay can help when a CDN server needs to obtain content from an origin server, or otherwise when accelerating non-cacheable content for a content provider customer. Communications between CDN servers and/or across the overlay may be enhanced or improved using intelligent route selection, protocol optimizations including TCP enhancements, persistent connection reuse and pooling, content & header compression and de-duplication, and other techniques such as those described in U.S. Pat. Nos. 6,820,133, 7,274,658, 7,607,062, and 7,660,296, among others, the disclosures of which are incorporated herein by reference.
As an overlay offering communication enhancements and acceleration, the CDN resources may be used to facilitate wide area network (WAN) acceleration services between enterprise data centers and/or between branch-headquarter offices (which may be privately managed), as well as to/from third party software-as-a-service (SaaS) providers used by the enterprise users.
Along these lines, CDN customers may subscribe to a “behind the firewall” managed service to accelerate Intranet web applications that are hosted behind the customer's enterprise firewall, as well as to accelerate web applications that bridge between their users behind the firewall to an application hosted in the Internet cloud (e.g., from a SaaS provider).
To accomplish these two use cases, CDN software may execute on machines (potentially in virtual machines running on customer hardware) hosted in one or more customer data centers, and on machines hosted in remote “branch offices.” The CDN software executing in the customer data center typically provides service configuration, service management, service reporting, remote management access, customer SSL certificate management, as well as other functions for configured web applications. The software executing in the branch offices provides last mile web acceleration for users located there. The CDN itself typically provides CDN hardware hosted in CDN data centers to provide a gateway between the nodes running behind the customer firewall and the CDN service provider's other infrastructure (e.g., network and operations facilities). This type of managed solution provides an enterprise with the opportunity to take advantage of CDN technologies with respect to their company's intranet, providing a wide-area-network optimization solution. This kind of solution extends acceleration for the enterprise to applications served anywhere on the Internet. By bridging an enterprise's CDN-based private overlay network with the existing CDN public internet overlay network, an end user at a remote branch office obtains an accelerated application end-to-end. More information about a behind the firewall service offering can be found in U.S. Pat. No. 7,600,025, the teachings of which are hereby incorporated by reference.
For live streaming delivery, the CDN may include a live delivery subsystem, such as described in U.S. Pat. No. 7,296,082, and U.S. Publication Nos. 2011/0173345 and 2012/0265853, the disclosures of which are incorporated herein by reference.
A CDN service provider typically takes a variety of measures to secure the storage and delivery of customer content while it resides on the CDN platform. A CDN typically supports industry standard communication technologies such as IPSec and/or secure socket layer/transport security layer (SSL/TLS) for communication between a requesting client and a CDN content server, and between machines in the CDN itself. However, to handle a given content provider's traffic, the CDN usually needs to have access to the content provider's SSL/TLS private keys and certificates, which some content providers dislike. Moreover, sometimes a content provider simply has certain content it is not willing to send over the CDN platform because the CDN is a third party.
As a result, a content provider may want to serve certain content (e.g., private/sensitive content) from their own origin infrastructure, while having other less sensitive content served through a CDN service and its infrastructure, so as to obtain a delivery performance benefit at least for that content. By splitting the traffic in this way, the customer can potentially avoid the need for the CDN to have the customer's SSL keys in the CDN network, avoid the possibility of customer-owned personal identifying information (PII) flowing through the CDN, and mitigate the risk of data corruption in the event of CDN breach (e.g., defacement of content, security breach, or otherwise).
Current techniques for configuring the origin and CDN to split traffic in this way have risk of misconfiguration and are subject to human error. Improved techniques for serving content as described herein can reduce if not eliminate the possibility of sensitive or PII data flowing across the CDN, while also reducing if not eliminating the chance that a compromise of the CDN (either through hacking, defacement, or other security issue, etc. or otherwise) will leak PII or negatively impact the customer's content.
Presented herein are systems, methods and apparatus that address these issues and that provide other features and benefits that will become apparent in view of the teachings hereof.