This disclosure relates in general to network analytics and, but not by way of limitation, to processing network analytics data.
Content and applications are delivered over the Internet. For example, web pages are delivered to end users with embedded scripts and/or applets. Content providers deliver their content using a variety of hosting services, content delivery networks (CDNs), and their own origin servers. Because the end-user experience is paramount, content providers wish to track analytics that reflect how the content and applications are being delivered.
Delivery patterns and performance are of interest so that content providers can monitor usage and quality of service (QoS). For example, a content provider might modify a web page and wish to see whether end users react favorably to the modified web page. Analytics are gathered by web performance services that maintain large databases. These services generate reports periodically by querying the large databases, but not in close to real time. Queries against the gathered analytics can be performed, but a report takes 4-6 hours to generate because the databases are so large.
Solid state drives (SSDs) and flash memory are becoming more common as spinning disk replacements for non-volatile storage. Seek times for SSDs, flash memory and RAM are considerably shorter than for spinning media. Conventional metric gathering has used spinning media, with algorithms and architectures chosen to best leverage spinning media and its large seek times. Some SSDs have interfaces similar to hard drives (e.g., SATA, PATA, SCSI or SAS) or computer bus interfaces (e.g., miniPCI or PCIe). Flash memory is sometimes integrated into a hardware server without using an expansion bus or interface, for example, through placement on the motherboard or in a dual in-line memory module (DIMM) connector. Random access memory (RAM) is common in hardware servers, but is often a limited resource.
Computer architectures interface with volatile memory in different ways. There are tradeoffs between the different data structures commonly used by operating systems and/or processors when allocating memory. A heap data structure is very flexible in accommodating different data types, but is relatively slow. A stack data structure adds and removes data in a last-in-first-out manner and typically accommodates only simple, fixed-size data types such as integers, but is typically far faster than heap allocation. Databases tend to be slow because they rely on heap data structures.
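The tradeoff described above can be illustrated with a minimal sketch in C. It is offered only as an illustration under stated assumptions, not as the method of this disclosure: the function names sum_with_stack and sum_with_heap and the bound MAX_SAMPLES are hypothetical. The first function keeps a fixed-size array of integers in automatic (stack) storage, which is reserved and released together with the call frame; the second requests the same storage from the heap, which allows the size to be chosen at run time but requires an allocator call and an explicit release.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed upper bound on the number of samples. */
#define MAX_SAMPLES 256

/* Sum the values 0..n-1 using an array held in automatic (stack) storage.
 * The array is reserved on function entry and released on return, in
 * last-in-first-out order with the call frames; its size is fixed at
 * compile time. */
int64_t sum_with_stack(size_t n)
{
    assert(n <= MAX_SAMPLES);
    int64_t samples[MAX_SAMPLES];
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        samples[i] = (int64_t)i;
        total += samples[i];
    }
    return total;
}

/* Same computation using heap storage. The size is chosen at run time,
 * at the cost of an allocator call and an explicit free.
 * Returns -1 if the allocation fails. */
int64_t sum_with_heap(size_t n)
{
    if (n == 0) {
        return 0; /* nothing to allocate */
    }
    int64_t *samples = malloc(n * sizeof *samples);
    if (samples == NULL) {
        return -1;
    }
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        samples[i] = (int64_t)i;
        total += samples[i];
    }
    free(samples);
    return total;
}
```

Both functions return the same sum for any n up to MAX_SAMPLES; they differ only in where the intermediate storage lives. How large the speed difference is in practice depends on the allocator, the compiler, and the workload, so it would need to be measured rather than assumed.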
As mentioned above, use of the heap is substantially slower than use of the stack. Databases use the heap extensively because they rely on reference types. For large datasets, database queries can take hours to fulfill. Additionally, distributed databases are difficult to reconcile in order to maintain coherency. Periodic updates to distributed databases require processing resources and bandwidth. Metric processing operates on huge datasets, and the databases are slow to query.