Technical Field
This patent document relates generally to distributed data processing systems, to content delivery mechanisms and content caching, and to network security.
Brief Description of the Related Art
Content caching is well-known in the art. By caching content in an intermediary machine, such as a cache server located between a client and a content source such as an origin server, and directing the client to or through the intermediary, the delivery of content can be accelerated. Caching of web content is ubiquitous, and caching mechanisms have been well described in the field; see, for example, Requests for Comments (RFCs) 2616 and 7234 of the Internet Engineering Task Force (IETF).
In general, an intermediary machine that has an object cache and runs a hypertext transfer protocol (HTTP) proxy server application operates as follows: a client request is received for a particular web object, such as an HTML document or an image; the intermediary machine determines a cache key, typically based on the requested URL; the machine looks in its cache using the key to see whether it has the object (a cache hit) and, if so, checks whether the object is valid (e.g., that it has not expired); if the object is found and valid, it can be served to the client; if the object is not found in the cache, or is found but has expired (a cache miss), the machine must go back to an origin server, or in some cases another intermediary, to get the requested object or to revalidate the expired copy. Once the object is received or revalidated, it can be stored in the local cache for an amount of time (defined by a time to live or ‘TTL’ value) and served in response to subsequent client requests.
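The lookup sequence described above can be sketched as follows. This is a minimal illustration, assuming that the cache key is simply the requested URL and that a hypothetical `fetch_from_origin` callable returns the object body together with a TTL in seconds. Revalidation of an expired copy (e.g., by a conditional request) is simplified here to a full refetch.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CacheEntry:
    body: bytes
    expires_at: float  # clock value after which the entry is stale


class ObjectCache:
    """Minimal TTL-based object cache in front of an origin fetch function."""

    def __init__(self,
                 fetch_from_origin: Callable[[str], Tuple[bytes, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Hypothetical origin interface: URL -> (body, ttl_seconds).
        self._fetch = fetch_from_origin
        self._clock = clock
        self._store: Dict[str, CacheEntry] = {}

    @staticmethod
    def cache_key(url: str) -> str:
        # Simplest possible key: the requested URL itself.
        return url

    def get(self, url: str) -> Tuple[bytes, bool]:
        """Return (body, was_cache_hit) for the requested URL."""
        key = self.cache_key(url)
        entry = self._store.get(key)
        now = self._clock()
        if entry is not None and now < entry.expires_at:
            return entry.body, True  # cache hit on a valid (unexpired) object
        # Cache miss: the object is absent or expired, so go forward to the
        # origin (a real proxy might revalidate instead of refetching).
        body, ttl = self._fetch(url)
        self._store[key] = CacheEntry(body, now + ttl)
        return body, False
```

Injecting the clock lets the TTL behavior be exercised without waiting in real time.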
A variety of types of intermediary object caches are known in the art, such as transparent cache servers, and forward and reverse proxy cache servers.
Caching proxy servers are often used as the building blocks of distributed computer systems known as “content delivery networks” or “CDNs” that are operated and managed by a service provider. The service provider typically provides the content delivery service on behalf of third parties. The CDN infrastructure is shared by multiple tenants, the content providers. The infrastructure is generally used for the storage, caching, or transmission of content on behalf of content providers or other such tenants.
In a known system such as that shown in FIG. 1, a distributed computer system 100 is configured as a CDN and has a set of servers 102 distributed around the Internet. Typically, most of the servers are located near the edge of the Internet, i.e., at or adjacent end user access networks. A network operations command center (NOCC) 104 may be used to administer and manage operations of the various machines in the system. Third party sites affiliated with content providers, such as origin 106 hosting a web site, offload delivery of objects (e.g., HTML or other markup language files, embedded page objects, streaming media, software downloads, and the like) to the distributed computer system 100 and, in particular, to the CDN servers (which are, as noted, caching proxy servers). The CDN servers 102 may be grouped together into a point of presence (POP) 107 at a particular geographic location.
The CDN servers are typically located at nodes that are publicly-routable on the Internet, in end-user access networks, in peering points, within or adjacent nodes that are located in mobile networks, in or adjacent enterprise-based private networks, or in any combination thereof.
Content providers offload their content delivery by aliasing (e.g., by a DNS CNAME) given content provider domains or sub-domains to domains that are managed by the service provider's authoritative domain name service. The service provider's domain name service directs end user client machines 122 that desire content to the distributed computer system (or more particularly, to one of the CDN servers 102 in the platform) to obtain the content more reliably and efficiently. The CDN servers 102 respond to the client requests, for example by fetching requested content from a local cache, from another CDN server 102, from the origin server 106 associated with the content provider, or other source, and serving it to the requesting client.
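The aliasing step can be illustrated with a simplified resolver sketch. The record table, the `.cdn.example.net` suffix, and the `pick_edge` function are hypothetical. In practice, resolution is carried out by recursive resolvers querying authoritative name servers; this sketch collapses that into a single loop that follows CNAME records until it reaches either an address record or a name answered by the service provider's DNS.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical record table: name -> ("CNAME", alias_target) or ("A", address).
Record = Tuple[str, str]


def resolve(name: str,
            records: Dict[str, Record],
            cdn_suffix: str,
            pick_edge: Callable[[str], str],
            max_hops: int = 8) -> Tuple[str, List[str]]:
    """Follow CNAME aliases; names under cdn_suffix are answered by pick_edge."""
    chain = [name]
    for _ in range(max_hops):
        if name.endswith(cdn_suffix):
            # The service provider's authoritative DNS chooses a CDN server.
            return pick_edge(name), chain
        rtype, value = records[name]
        if rtype == "A":
            return value, chain  # ordinary address record, no CDN involved
        name = value  # CNAME: continue resolution with the alias target
        chain.append(name)
    raise RuntimeError("CNAME chain too long")
```

A content provider domain aliased to a name under the CDN suffix thus ends up at whatever server `pick_edge` selects, while unaliased names resolve normally.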
For cacheable content, CDN servers 102 typically employ a caching model that relies on setting a TTL for each cacheable object. After it is fetched, the object may be stored locally at a given CDN server 102 until the TTL expires, at which time it is typically revalidated or refreshed from the origin server 106. For non-cacheable objects (sometimes referred to as ‘dynamic’ content), the CDN server 102 typically returns to the origin server 106 each time the object is requested by a client. The CDN may operate a server cache hierarchy to provide intermediate caching of customer content in various CDN servers 102 that are between the CDN server 102 handling a client request and the origin server 106; one such cache hierarchy subsystem is described in U.S. Pat. No. 7,376,716, the disclosure of which is incorporated herein by reference.
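The TTL model and the cache hierarchy can be sketched together as below. This is a generic illustration, not the subsystem of U.S. Pat. No. 7,376,716; the `CacheTier` class, the fixed set of non-cacheable URLs, and the `(body, ttl)` origin interface are assumptions made for the example. A child tier stores only the TTL remaining at its parent, so an object does not outlive its original lifetime by passing through intermediate caches.

```python
import time
from typing import Callable, Dict, Optional, Set, Tuple


class CacheTier:
    """One level of a hypothetical cache hierarchy (edge or parent)."""

    def __init__(self, name: str, parent: Optional["CacheTier"],
                 origin: Callable[[str], Tuple[bytes, float]],
                 non_cacheable: Set[str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.name = name
        self.parent = parent
        self.origin = origin  # hypothetical: URL -> (body, ttl_seconds)
        self.non_cacheable = non_cacheable
        self.clock = clock
        self.store: Dict[str, Tuple[bytes, float]] = {}  # url -> (body, expiry)

    def _forward(self, url: str) -> Tuple[bytes, float]:
        # Ask the parent tier if there is one; otherwise go to the origin.
        if self.parent is not None:
            return self.parent.fetch(url)
        return self.origin(url)

    def fetch(self, url: str) -> Tuple[bytes, float]:
        """Return (body, remaining_ttl) for url."""
        if url in self.non_cacheable:
            return self._forward(url)  # dynamic content: forward every time
        now = self.clock()
        cached = self.store.get(url)
        if cached is not None and now < cached[1]:
            return cached[0], cached[1] - now  # hit: report remaining TTL
        body, ttl = self._forward(url)
        self.store[url] = (body, now + ttl)
        return body, ttl
```

With two edge tiers sharing one parent, the second edge's miss is satisfied by the parent without another origin request, whereas non-cacheable URLs reach the origin on every request.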
Although not shown in detail in FIG. 1, the distributed computer system may also include other infrastructure, such as a distributed data collection system 108 that collects usage and other data from the CDN servers 102, aggregates that data across a region or set of regions, and passes that data to other back-end systems 110, 112, 114 and 116 to facilitate monitoring, logging, alerts, billing, management and other operational and administrative functions. Distributed network agents 118 monitor the network as well as the server loads and provide network, traffic and load data to a domain name service (DNS) query handling mechanism 115. A distributed data transport mechanism 120 may be used to distribute control information (e.g., metadata to manage content, to facilitate load balancing, and the like) to the CDN servers 102.
As illustrated in FIG. 2, a given machine 200 in the CDN comprises commodity hardware (e.g., a microprocessor) 202 running an operating system kernel (such as Linux® or variant) 204 that supports one or more applications 206. To facilitate content delivery services, for example, given machines typically run a set of applications, such as an HTTP proxy 207, a name service 208, a local monitoring process 210, a distributed data collection process 212, and the like. The HTTP proxy 207 typically includes a manager process for managing an object cache and delivery of content from the machine. The object cache may reside in volatile or non-volatile memory, represented by hardware 202 in FIG. 2.
A given CDN server 102 shown in FIG. 1 may be configured to provide one or more extended content delivery features, preferably on a domain-specific, content-provider-specific basis, using configuration files that are distributed to the CDN servers 102 by a configuration system. A given configuration file preferably is XML-based and includes a set of content handling rules and directives that facilitate one or more advanced content handling features. The configuration file may be delivered to a particular CDN server 102 via the data transport mechanism. U.S. Pat. No. 7,240,100, the disclosure of which is hereby incorporated by reference, describes a useful infrastructure for delivering and managing such content control information. This and other control information (sometimes referred to as “metadata”) can be provisioned by the CDN service provider itself, or by the content provider customer who operates the origin server 106. More information about CDN platforms can be found in U.S. Pat. Nos. 6,108,703 and 7,596,619, the disclosures of which are hereby incorporated by reference in their entireties.
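The following sketch shows how per-hostname content handling rules might be read from an XML configuration file and applied to a request path. The element and attribute names (`config`, `rule`, `match`, `cache`, `ttl`, `no-store`) are invented for illustration and do not reflect any actual CDN metadata schema; only the standard `xml.etree.ElementTree` and `fnmatch` modules are used.

```python
import fnmatch
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical configuration; element and attribute names are illustrative only.
CONFIG_XML = """
<config hostname="www.example.com">
  <rule match="/images/*"><cache ttl="86400"/></rule>
  <rule match="/api/*"><no-store/></rule>
  <rule match="*"><cache ttl="300"/></rule>
</config>
"""

# (path pattern, TTL in seconds, or None when the rule forbids caching)
Rule = Tuple[str, Optional[int]]


def load_rules(xml_text: str) -> Tuple[str, List[Rule]]:
    """Parse the hostname and its ordered content handling rules."""
    root = ET.fromstring(xml_text.strip())
    rules: List[Rule] = []
    for rule in root.findall("rule"):
        cache = rule.find("cache")
        ttl = int(cache.get("ttl")) if cache is not None else None
        rules.append((rule.get("match"), ttl))
    return root.get("hostname"), rules


def ttl_for(path: str, rules: List[Rule]) -> Optional[int]:
    """Return the TTL for path, or None if no rule matches or caching is forbidden."""
    # First matching rule wins, so more specific patterns are listed first.
    for pattern, ttl in rules:
        if fnmatch.fnmatchcase(path, pattern):
            return ttl
    return None
```

Keeping the rules ordered and applying the first match is one simple design choice; a real configuration system could equally use explicit precedence or nested match conditions.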
It is an object of the teachings hereof to leverage object caches, such as those described above in CDNs, to record and compile information about web traffic, and to perform rate accounting on client request traffic. This information can be used to detect malicious or undesirable activity. It is a further object of the teachings hereof to enable monitoring of web traffic across a plurality of object caches. Further advantages, benefits, and uses of the teachings hereof will be apparent from the description below and the appended drawings.