The disclosure relates to computer network traffic records. More particularly, the methods and systems described herein relate to distribution and retrieval of network traffic records.
In conventional systems, analyzing computer network traffic records, such as NetFlow or sFlow records, becomes increasingly difficult as traffic volumes grow and as the number of computing devices deployed to perform the analyses increases. Conventional approaches to managing large volumes of data, such as sampling network traffic data instead of collecting each individual network traffic record, do not typically provide sufficient information to perform analysis after collection of the data has completed. For example, if an administrator attempts to query network traffic data after data samples were collected, and the samples either do not include the particular type of data needed to respond to the query or do not include sufficient data to respond to it, conventional systems do not provide functionality for accessing the underlying network traffic data at that point.
Conventional systems that provide functionality for capturing an entire body of network traffic data typically require additional computing devices to capture and analyze the data. However, such conventional systems do not typically provide functionality for scalable, efficient distribution of the data or for performing analytical queries across multiple computing devices. For example, unique counts of network entities are particularly difficult to calculate in conventional systems, but are of particular utility to network operators. Examples of powerful queries that are challenging to answer for an arbitrary timeframe in a conventional system include: 1) ranking the top IP addresses on a network based on the number of other unique IP addresses contacted, indicating potential botnets and scans; 2) ranking the top Autonomous System destinations on a network based on the highest number of unique client IP addresses, to inform routing decisions; and 3) ranking the top IP addresses based on the highest number of unique destination ports that each IP address has used, indicating potential network reconnaissance.
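By way of illustration only, the unique-count rankings enumerated above may be sketched for a single computing device as follows. The sketch is not drawn from the disclosure: the function name top_by_unique, the record field names (src_ip, dst_ip, dst_port), and the sample records are hypothetical and are chosen solely for this example, and exact in-memory sets are assumed for simplicity.

```python
from collections import defaultdict


def top_by_unique(records, group_field, distinct_field, limit=10):
    """Rank values of group_field by how many distinct distinct_field values each has.

    Hypothetical helper for illustration; records are plain dicts.
    """
    distinct = defaultdict(set)
    for record in records:
        # A set keeps only one copy of each value, so repeated flows
        # between the same pair of entities are counted once.
        distinct[record[group_field]].add(record[distinct_field])

    counts = [(key, len(values)) for key, values in distinct.items()]
    # Highest unique count first; ties are broken by key for a stable order.
    counts.sort(key=lambda item: (-item[1], item[0]))
    return counts[:limit]


# Hypothetical sample records; the field names are assumptions.
flows = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "dst_port": 80},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.3", "dst_port": 443},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.3", "dst_port": 22},
    {"src_ip": "10.0.0.4", "dst_ip": "10.0.0.2", "dst_port": 80},
]

# Query 1: source addresses ranked by unique addresses contacted.
by_peers = top_by_unique(flows, "src_ip", "dst_ip")
# Query 3: source addresses ranked by unique destination ports used.
by_ports = top_by_unique(flows, "src_ip", "dst_port")
```

In this sketch, the memory consumed grows with the number of distinct values retained for each entity. In addition, where records are divided among multiple computing devices, per-device unique counts cannot simply be summed, because the same value may have been observed on more than one device; the underlying sets would have to be combined before counting. These characteristics are consistent with the difficulty described above.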