Content delivery networks (CDNs) have greatly improved the way content is transferred across data networks such as the Internet. A CDN accelerates the delivery of content by reducing the distance that content travels in order to reach a destination. To do so, the CDN strategically locates surrogate origin servers, also referred to as caching servers or edge servers, at various points-of-presence (POPs) that are geographically proximate to large numbers of content consumers. The CDN then utilizes a traffic management system to route requests for content hosted by the CDN to the edge server that can optimally deliver the requested content to the content consumer. Determination of the optimal edge server may be based on geographic proximity to the content consumer as well as other factors such as load, capacity, and responsiveness of the edge servers. The optimal edge server delivers the requested content to the content consumer more efficiently than the origin servers of the content publisher could. For example, a CDN may locate edge servers in Los Angeles, Dallas, and New York. These edge servers may cache content that is published by a particular content publisher with an origin server in Miami. When a content consumer in San Francisco submits a request for the published content, the CDN will deliver the content from the Los Angeles edge server on behalf of the content publisher, rather than having the content travel the much greater distance from the origin server in Miami. In this manner, the CDN reduces the latency, jitter, and amount of buffering that is experienced by the content consumer. The CDN also allows the content publisher to offload infrastructure, configuration, and maintenance costs while still having the ability to rapidly scale resources as needed.
Content publishers can therefore devote more time to the creation of content and less time to the creation of an infrastructure that delivers the created content to the content consumers.
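The edge server selection in the Los Angeles, Dallas, and New York example can be illustrated with a minimal sketch. It assumes that proximity is measured as great-circle distance and that load is the only other factor considered; the city coordinates are approximate, and the function names, the `max_load` threshold, and the load values are hypothetical choices made for illustration rather than part of any particular CDN.

```python
import math

# Approximate (latitude, longitude) in degrees; illustrative values only.
EDGE_SERVERS = {
    "Los Angeles": (34.05, -118.24),
    "Dallas": (32.78, -96.80),
    "New York": (40.71, -74.01),
}
ORIGIN_MIAMI = (25.76, -80.19)
CONSUMER_SAN_FRANCISCO = (37.77, -122.42)


def great_circle_km(a, b):
    """Haversine distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pick_edge_server(consumer, servers, load=None, max_load=0.9):
    """Return the nearest server whose load is below max_load (assumed policy)."""
    load = load or {}
    candidates = [name for name in servers if load.get(name, 0.0) < max_load]
    if not candidates:
        # Every server is busy: fall back to pure proximity.
        candidates = list(servers)
    return min(candidates, key=lambda name: great_circle_km(consumer, servers[name]))


nearest = pick_edge_server(CONSUMER_SAN_FRANCISCO, EDGE_SERVERS)
fallback = pick_edge_server(
    CONSUMER_SAN_FRANCISCO, EDGE_SERVERS, load={"Los Angeles": 0.95}
)
```

Under these assumptions the San Francisco consumer is served from Los Angeles, and from Dallas when the Los Angeles server is marked as overloaded. A real traffic management system would also weigh capacity and responsiveness.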
As a result of these and other benefits, many different CDNs are in operation today. EdgeCast, Akamai, Limelight, and CDNetworks are some examples of operating CDNs that are responsible for the delivery of terabytes' worth of content. FIG. 1 illustrates a representative infrastructure for some such CDNs. As shown in FIG. 1, the infrastructure includes a distributed set of edge servers 110, traffic management servers 120, and an administrative server 130. The figure also illustrates the interactions that CDN customers, including content publishers, have with the CDN and the interactions that content consumers or end users have with the CDN.
Each edge server of the set of edge servers 110 may represent a single physical machine or a cluster of machines. The cluster of machines may include a server farm for a geographically proximate set of physically separate machines or a set of virtual machines that execute over partitioned sets of resources of one or more physically separate machines. The set of edge servers 110 is distributed across different edge regions of the Internet to facilitate the “last mile” delivery of content. The edge servers run various processes that (1) manage what content is cached, (2) manage how content is cached, (3) manage how content is retrieved from the origin server when the content is not present in cache, (4) monitor server capacity (e.g., available processor cycles, available memory, available storage, etc.), (5) monitor network performance (e.g., latency, downed links, etc.), and (6) report statistical data on the delivered content. The set of edge servers 110 may provide the monitoring information to the traffic management servers 120 to facilitate the routing of content consumers to the optimal edge servers. The set of edge servers 110 may also provide the statistical data to the administrative server 130, where the data is aggregated and processed to produce performance reports for the delivery of the customers' content.
The traffic management servers 120 route content consumers, and more specifically the requests for content issued by content consumers, to one or more of the edge servers. Different CDN implementations utilize different traffic management schemes to achieve such routing to the optimal edge servers. Consequently, the traffic management servers 120 can include different combinations of Domain Name System (DNS) servers, load balancers, and routers performing Anycast or Border Gateway Protocol (BGP) routing. For example, some CDNs utilize the traffic management servers 120 to provide a two-tiered DNS routing scheme, wherein the first DNS tier resolves a DNS request to the CDN region (or POP) that is closest to the requesting content consumer and the second DNS tier resolves the DNS request to the optimal edge server in the closest CDN region. As another example, some CDNs use Anycast routing to identify the optimal edge server.
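The two-tiered DNS routing scheme can be sketched as two successive lookups. This is a simplified illustration and not an actual DNS implementation: the POP names, server addresses, load figures, and region table are all hypothetical, and "optimal" is assumed here to mean the least-loaded server in the chosen POP.

```python
# Hypothetical topology: POP name -> {edge server address: current load, 0.0 to 1.0}.
POPS = {
    "los-angeles": {"10.0.1.1": 0.70, "10.0.1.2": 0.35},
    "new-york": {"10.0.2.1": 0.20, "10.0.2.2": 0.60},
}

# Hypothetical first-tier table: consumer region -> closest POP.
REGION_TO_POP = {"us-west": "los-angeles", "us-east": "new-york"}


def resolve_tier1(consumer_region):
    """First tier: resolve the request to the POP closest to the consumer."""
    return REGION_TO_POP[consumer_region]


def resolve_tier2(pop):
    """Second tier: resolve to the least-loaded edge server in that POP."""
    servers = POPS[pop]
    return min(servers, key=servers.get)


def resolve(consumer_region):
    """Chain both tiers to map a consumer region to one edge server address."""
    return resolve_tier2(resolve_tier1(consumer_region))


west_server = resolve("us-west")
east_server = resolve("us-east")
```

Splitting the decision this way keeps the first tier small and relatively static, while the second tier can react to per-server monitoring data within a single POP.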
The administrative server 130 may include a central server of the CDN or a distributed set of interoperating servers that perform the configuration control and reporting functionality of the CDN. The reporting functionality may include deriving performance reports, analytics, billing data, and raw data for customers of the CDN based on server logs that are aggregated from the set of edge servers 110 and that record the detailed transactions performed by each of the edge servers 110.
Existing log reporting functionality includes proprietary systems and methods that have been developed independently by the CDNs or that include third-party systems that have been customized for specific CDNs. Typically, the software and hardware for these systems and methods are incompatible with one another, thus creating a barrier to CDN federation.
CDN federation is advocated by EdgeCast Networks Inc. of Santa Monica, Calif. as a means for providing dynamic CDN scalability, providing a larger global CDN footprint, and increasing utilization of a CDN operator's capacity. Utilization increases because some or all of that capacity is made available to multiple CDN service providers, who in turn can realize the advantages of a CDN without needing to develop the optimized software or deploy the infrastructure necessary to operate a CDN. The Open CDN platform conceived by EdgeCast Networks Inc. is a federation of independently operated CDNs. The CDNs participating in the federation can exchange capacity with one another such that CDNs with excess capacity can avoid the sunk costs associated with capacity going unused by selling that capacity to CDNs that are in need of additional capacity. The capacity sold by a CDN seller can then be configured and used for purposes of a CDN buyer.
However, the incompatible proprietary or customized log reporting systems and methods of each independently operated CDN are a barrier to such a federation and to the advantages that can be realized from it. Specifically, the incompatibility prevents accurate and comprehensive reporting for a customer configuration that (1) is offloaded from a native CDN, or the CDN to which the customer belongs, to a foreign CDN or (2) is simultaneously deployed to capacity of two different CDNs. This is because the native CDN that reports to the customer does not have access to the logs produced by the servers of the foreign CDN to which the customer's configuration was deployed. Furthermore, even if the native CDN had access to the logs of the foreign CDN, those logs may include proprietary identifiers and may be formatted differently from the identifiers and formats used by the native CDN. The customer therefore has no easily available means of ascertaining how its configuration is performing in the federation. The simple solution of having the customer decipher logs from disparate reporting systems of different CDNs is unacceptable. Furthermore, there is no obvious solution for integrating logs from different CDNs, because the participants in the federation can change over time, the pair of CDNs that exchange logs can change with each customer configuration deployed across the federation, and each CDN may have its own proprietary or customized set of reporting software and hardware.
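The format and identifier mismatch described above can be made concrete with a small sketch. Both log formats, the field names, the customer identifiers, and the mapping table are invented for illustration and do not reflect the logs of any real CDN. The sketch only shows why a shared record type and an identifier mapping would be needed before logs from two CDNs can be combined.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Common record that both hypothetical formats are normalized into."""
    customer: str
    bytes_sent: int
    status: int
    source_cdn: str


# Hypothetical mapping from the foreign CDN's account identifiers
# to the native CDN's customer identifiers.
FOREIGN_TO_NATIVE_ID = {"acct-9931": "cust-42"}


def parse_native(line):
    """Assumed native format: '<timestamp> <customer> <bytes> <status>'."""
    _timestamp, customer, bytes_sent, status = line.split()
    return LogRecord(customer, int(bytes_sent), int(status), "native")


def parse_foreign(line):
    """Assumed foreign format: one JSON object per line with different keys."""
    entry = json.loads(line)
    customer = FOREIGN_TO_NATIVE_ID[entry["account"]]
    return LogRecord(customer, int(entry["size"]), int(entry["code"]), "foreign")


native_record = parse_native("2012-01-01T00:00:00Z cust-42 1024 200")
foreign_record = parse_foreign('{"account": "acct-9931", "size": 2048, "code": 200}')
```

After normalization, both records refer to the same customer, `cust-42`, even though the original lines share neither a layout nor an identifier.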
Accordingly, there is a need to provide enhanced log reporting systems and methods that are able to provide logs, as well as performance reports, statistical data, and billing information derived from the logs, for a customer that has its configuration deployed across a federation of CDNs. To do so, there is a need for the log reporting systems and methods to aggregate logs from the independently operated participants of the federation. There is further a need to process the aggregated logs in order to accurately and comprehensively convey the performance of any particular customer configuration, irrespective of which server resources of which federation participants that particular customer configuration is deployed to. There is also a need to provide log reporting systems and methods that are scalable to support the sizeable amount of raw data that is reported within the logs aggregated from each of multiple federation participants, as a reporting system for a single CDN participant may aggregate and process over a billion lines of logs every hour.
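As a rough illustration of the aggregation need stated above, the following sketch totals requests and bytes per customer across log streams from several federation participants in a single pass. The record layout `(customer, cdn, bytes_sent)`, the CDN names, and the sample values are hypothetical; the streaming approach is one possible design, not a description of any specific reporting system.

```python
from collections import defaultdict
from itertools import chain


def aggregate(*log_streams):
    """Sum requests and bytes per customer across any number of log streams.

    Each stream yields (customer, cdn, bytes_sent) tuples. Streams are consumed
    lazily, so memory grows with the number of customers, not with log lines.
    """
    totals = defaultdict(lambda: {"requests": 0, "bytes": 0})
    for customer, _cdn, bytes_sent in chain(*log_streams):
        totals[customer]["requests"] += 1
        totals[customer]["bytes"] += bytes_sent
    return dict(totals)


# Hypothetical, already-normalized records from two federation participants.
native_stream = [("cust-42", "cdn-a", 1024), ("cust-7", "cdn-a", 512)]
foreign_stream = [("cust-42", "cdn-b", 2048)]

report = aggregate(native_stream, foreign_stream)
```

The resulting totals for `cust-42` combine traffic served by both participants, which is the customer-level view that per-CDN reporting alone cannot provide. At the log volumes mentioned above, such a pass would in practice be partitioned across many machines.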