This disclosure relates in general to content delivery networks (CDNs) and, but not by way of limitation, to gathering analytics relating to the operation of the CDN across a number of points of presence (POPs).
Analytics are used in a CDN at the most basic level to account for use by customers so that they may be billed accurately. Richer information is gathered in log files that are aggregated from the various POPs and stored for query or provided back to customers. Queries against the log files are possible, but the sheer size of the files makes querying difficult and impractical in real time. The log files are delayed by hours as the information is gathered from all the POPs and aggregated centrally.
The amount of analytics data gathered by a CDN is skyrocketing. Many terabytes are produced every day. An ever-increasing share of the bandwidth and processing resources of the CDN is consumed by this process. Much of the analytics data is gathered without ever being used. Consuming resources to gather information that is underutilized is inefficient. If current trends continue, analytics data will account for a large part of the cost of delivering content through the Internet.