1. Technical Field
This application generally relates to client-server communications and the delivery of content over computer networks, and more particularly to the identification and/or characterization of client devices that are requesting content over computer networks.
2. Brief Description of the Related Art
The client-server model for obtaining content over a computer network is well-known in the art. In a typical system, such as that shown in FIG. 1A, a content provider manages or otherwise arranges for a server that hosts particular content (e.g., web site content). A client device makes a request for a given piece of content (e.g., an HTML document defining a page on the web site) over a computer network. The server can respond to the client device by sending the requested content.
It is also known in the art to use distributed computer systems to deliver content to client devices. One such distributed computer system is a “content delivery network” or “CDN” that is operated and managed by a service provider. The service provider typically provides the content delivery service on behalf of third party content providers. A “distributed system” of this type typically refers to a collection of autonomous computers linked by a network or networks, together with the software, systems, protocols and techniques designed to facilitate various services, such as content delivery or the support of outsourced site infrastructure. Typically, “content delivery” refers to the storage, caching, or transmission of content (such as web pages, streaming media and applications) on behalf of content providers, and ancillary technologies used therewith including, without limitation, DNS query handling, provisioning, data monitoring and reporting, content targeting, personalization, and business intelligence.
In a known system such as that shown in FIG. 1B, a distributed computer system 100 is configured as a content delivery network (CDN) and has a set of machines 102 distributed around the Internet. Typically, most of the machines are servers located near the edge of the Internet, i.e., at or adjacent to end user access networks. A network operations command center (NOCC) 104 may be used to administer and manage operations of the various machines in the system. Third party sites affiliated with content providers, such as web site 106, offload delivery of content (e.g., HTML, embedded page objects, streaming media, software downloads, and the like) to the distributed computer system 100 and, in particular, to the servers (which are sometimes referred to as content servers, or sometimes as “edge” servers in light of the possibility that they may be near an “edge” of the Internet, or sometimes as proxy servers if running a proxy application, as described in more detail below; none of these terms are mutually exclusive). Such servers may be grouped together into a point of presence (POP) 107.
Typically, content providers offload their content delivery by aliasing (e.g., by a DNS CNAME or otherwise) given content provider domains or sub-domains to domains that are managed by the service provider's authoritative domain name service. End user client machines 122 that desire such content may be directed to the distributed computer system to obtain that content more reliably and efficiently. The servers 102 respond to the client requests by obtaining requested content from a local cache, from another content server, from the origin server 106, or other source, for example.
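By way of illustration only, the source-selection behavior described above (local cache, then another content server, then the origin server) might be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any actual CDN server implementation: the class name `ContentServer`, the `Fetcher` callables, and the fixed lookup order are hypothetical and are introduced here solely for the example.

```python
from typing import Callable, Dict, Optional

# Hypothetical fetcher type: returns the content body, or None if unavailable.
Fetcher = Callable[[str], Optional[bytes]]


class ContentServer:
    """Illustrative content server that tries several sources in order."""

    def __init__(self, peer_fetch: Fetcher, origin_fetch: Fetcher) -> None:
        self.cache: Dict[str, bytes] = {}   # local cache, keyed by URL
        self.peer_fetch = peer_fetch        # assumed: asks another content server
        self.origin_fetch = origin_fetch    # assumed: asks the origin server

    def get(self, url: str) -> Optional[bytes]:
        # 1. Serve from the local cache when possible.
        if url in self.cache:
            return self.cache[url]
        # 2. Otherwise try another content server, then the origin server.
        for fetch in (self.peer_fetch, self.origin_fetch):
            body = fetch(url)
            if body is not None:
                self.cache[url] = body      # keep a copy for later requests
                return body
        # 3. No source could supply the content.
        return None
```

In this sketch a successful fetch populates the local cache, so a repeated request for the same URL is answered without contacting the peer or the origin again. Cache expiry and revalidation are deliberately omitted.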
Although not shown in detail in FIG. 1B, the distributed computer system may also include other infrastructure, such as a distributed data collection system 108 that collects usage and other data from the content servers, aggregates that data across a region or set of regions, and passes that data to other back-end systems 110, 112, 114 and 116 to facilitate monitoring, logging, alerts, billing, management and other operational and administrative functions. Distributed network agents 118 monitor the network as well as the server loads and provide network, traffic and load data to a DNS query handling mechanism 115, which is authoritative for content domains being managed by the CDN. A distributed data transport mechanism 120 may be used to distribute control information (e.g., metadata to manage content, to facilitate load balancing, and the like) to the content servers.
As illustrated in FIG. 2, a given machine 200 in the CDN (sometimes referred to as an “edge machine”) comprises commodity hardware (e.g., a processor) 202 running an operating system kernel (such as Linux or a variant) 204 that supports one or more applications 206. To facilitate content delivery services, for example, given machines typically run a set of applications, such as an HTTP proxy 207, a name server 208, a local monitoring process 210, a distributed data collection process 212, and the like. The HTTP proxy 207 (sometimes referred to herein as a global host or “ghost” process) typically includes a manager process for managing a cache and delivery of content from the machine. For streaming media, the machine typically includes one or more media servers, such as a Windows Media Server (WMS) or Flash server, as required by the supported media formats.
The machine shown in FIG. 2 may be configured to provide one or more extended content delivery features, preferably on a domain-specific, customer-specific basis, preferably using configuration files that are distributed to the content servers using a configuration system. A given configuration file preferably is XML-based and includes a set of content handling rules and directives that facilitate one or more advanced content handling features. The configuration file may be delivered to the CDN server via the data transport mechanism. U.S. Pat. Nos. 7,240,100 and 7,111,057, the teachings of which are incorporated herein by reference, illustrate a useful infrastructure for delivering and managing CDN server content control information. This and other content server control information (sometimes referred to as “metadata”) can be provisioned by the CDN service provider itself, or (via an extranet or the like) by the content provider customer who operates the origin server.
The CDN may include a storage subsystem (sometimes referred to as “NetStorage”) which may be located in a network datacenter accessible to the content servers, such as described in U.S. Pat. No. 7,472,178, the disclosure of which is incorporated herein by reference. The CDN may operate a server cache hierarchy to provide intermediate caching of customer content; one such cache hierarchy subsystem is described in U.S. Pat. No. 7,376,716, the disclosure of which is incorporated herein by reference. For live streaming delivery, the CDN may include a live delivery subsystem, such as described in U.S. Pat. No. 7,296,082, and U.S. Publication No. 2011/0173345, the disclosures of which are incorporated herein by reference.
Whether content is delivered directly as in FIG. 1A or via the CDN as in FIG. 1B, servers are being called upon to deliver content to an increasingly diverse array of client devices and environments. Increasingly, end users consume online content using devices other than the conventional desktop PC, including smartphones, tablets and other mobile devices, as well as televisions, conferencing systems, gaming systems, and other connected devices.
The proliferation of client devices means that the display features, form factors, functional capabilities, and other characteristics thereof are becoming much more diverse. Online content providers want to be able to deliver content effectively and efficiently to this increasing array of clients in a way that is situationally aware. To optimize the end user experience, a given server (in the CDN or otherwise) preferably is able to understand the capabilities, limitations, and other attributes of the client device that is requesting content from it. The server can then act appropriately for the particular device, for example, by sending images appropriately sized for the client device's screen, or by filtering content sent to the client so that incompatible content is not delivered to the client. Hence, there is a need for a server to be able to discern information about a requesting client rapidly, accurately, at scale, and while accommodating a non-uniform and ever-expanding universe of new clients.
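By way of illustration only, the two server actions mentioned above (sizing images for the client's screen and withholding incompatible content) might be sketched as follows, assuming the server has already obtained the relevant client attributes by some means. The `DeviceProfile` type, its fields, and both function names are hypothetical and are not drawn from any particular system; how the attributes are discerned is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class DeviceProfile:
    """Hypothetical set of client attributes known to the server."""
    screen_width_px: int
    supported_types: FrozenSet[str]   # content types the client can handle


def pick_image_width(profile: DeviceProfile, available_widths: List[int]) -> int:
    """Choose the largest available rendition that fits the client's screen.

    Falls back to the smallest rendition when none fits.
    """
    fitting = [w for w in available_widths if w <= profile.screen_width_px]
    return max(fitting) if fitting else min(available_widths)


def filter_compatible(
    profile: DeviceProfile, items: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Keep only (name, content_type) items the client is able to handle."""
    return [(name, ctype) for name, ctype in items
            if ctype in profile.supported_types]
```

The fallback to the smallest rendition is one possible design choice; a server could equally resize on demand or decline to send the image.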
The teachings herein address these and other needs and offer other features and advantages that will become apparent in view of this disclosure.