Packetized data networks are in widespread use, transporting data throughout the world. Such networks typically format data into packets for transmission from one computer to another. Each packet includes a header containing information about the packet data and its routing. The network sends packets from the originating computer to the destination computer through routers, each of which forwards the packet toward the destination using the routing information in the packet header. A flow of packets is a group of packets sent from a particular source network address and port to a particular destination network address and port. These source and destination network addresses and ports may, for example, correspond to different computers.
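By way of illustration only, the grouping of packets into flows described above may be sketched as follows. The `Packet` type, its field names, and the `group_into_flows` function are hypothetical names chosen for this example and are not taken from any particular netflow implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical, simplified view of the header fields relevant to a flow.
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    octets: int


def group_into_flows(packets):
    """Group packets by (source address, source port, destination address, destination port)."""
    flows = defaultdict(list)
    for p in packets:
        key = (p.src_addr, p.src_port, p.dst_addr, p.dst_port)
        flows[key].append(p)
    return dict(flows)


packets = [
    Packet("10.0.0.1", 40000, "10.0.0.2", 80, 120),
    Packet("10.0.0.1", 40000, "10.0.0.2", 80, 300),
    Packet("10.0.0.1", 40001, "10.0.0.2", 22, 64),
]
flows = group_into_flows(packets)
```

In this sketch the first two packets share all four key fields and therefore belong to one flow, while the third packet differs in its ports and forms a second flow.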
Netflow is a known network protocol which may be used for collecting and monitoring Internet Protocol (IP) traffic. Some netflow analyzing engines keep only a fixed number (normally up to 1000) of the top source and destination IP addresses, ranked solely by the volume of data associated with each IP address. The disadvantage of filtering the information in this manner is that the actual flow information is lost, in particular the context of the resulting information (e.g., the source and destination ports associated with the source and destination IP addresses). Hence, such engines are unable to identify attacks, as all distribution information is lost. Other netflow analyzing engines retain only a subset of the flows (normally around 10,000 flows in a time period of one hour). The subset is normally selected based on flow octet size. This technique reduces the storage required for flows while, in contrast to the technique outlined above, still retaining some distribution information.
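The difference between the two filtering techniques described above may be sketched, purely as an assumption-laden illustration, as follows. The flow records are modelled as hypothetical tuples of (source address, source port, destination address, destination port, octets), and the function names are invented for this example; actual netflow analyzing engines may differ in how they rank and retain data.

```python
from collections import Counter


def top_addresses_by_volume(flows, n):
    """Keep only the n IP addresses with the most octets; port information is discarded."""
    volume = Counter()
    for src_addr, _src_port, dst_addr, _dst_port, octets in flows:
        volume[src_addr] += octets
        volume[dst_addr] += octets
    return volume.most_common(n)


def top_flows_by_octets(flows, n):
    """Keep only the n largest flows; each retained record still carries its ports."""
    return sorted(flows, key=lambda f: f[4], reverse=True)[:n]


flows = [
    ("10.0.0.1", 40000, "10.0.0.2", 80, 500),
    ("10.0.0.1", 40001, "10.0.0.2", 22, 100),
    ("10.0.0.3", 40002, "10.0.0.2", 443, 50),
]
top_addrs = top_addresses_by_volume(flows, 1)
top_flows = top_flows_by_octets(flows, 2)
```

The first function returns only address and volume pairs, so the ports contacted by each address cannot be recovered from its output. The second function drops the smallest flows but keeps the full record, including ports, for each flow it retains.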
So-called “conversations” are the transmissions between particular source and destination IP addresses and ports, which are typically stored and sorted in order to establish the respective associations between them. A large conversation count demonstrates a high volume of traffic, which could indicate a port scanning attack. Previously, a single system for processing traffic flow data received from a network probe device would count the number of conversations only up to a limit, without providing an actual count of conversations beyond that limit. This approach breaks down in an integrated distributed environment comprising a plurality of netflow collectors, as the processing system runs the risk of double counting conversations that have been seen by multiple netflow collectors.
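The two shortcomings described above, namely a count that saturates at a limit and double counting across collectors, may be illustrated by the following minimal sketch. All names are hypothetical, conversations are modelled as simple tuples, and the exact set union is shown only as a baseline for comparison; it is not presented as the technique of any particular system.

```python
def capped_count(keys, limit):
    """Count distinct conversations up to limit; return (count, saturated)."""
    seen = set()
    for key in keys:
        if key not in seen:
            if len(seen) == limit:
                # A further distinct conversation exists, but the true
                # total beyond the limit is not tracked.
                return limit, True
            seen.add(key)
    return len(seen), False


def naive_total(collectors):
    """Sum per-collector distinct counts; shared conversations are counted repeatedly."""
    return sum(len(set(keys)) for keys in collectors)


def deduplicated_total(collectors):
    """Count distinct conversations across all collectors using an exact set union."""
    return len(set().union(*collectors))


conv_a = ("10.0.0.1", 40000, "10.0.0.2", 80)
conv_b = ("10.0.0.1", 40001, "10.0.0.2", 22)
conv_c = ("10.0.0.3", 40002, "10.0.0.2", 443)

collector_1 = [conv_a, conv_b]
collector_2 = [conv_b, conv_c]  # conv_b is also seen by collector_1

capped = capped_count([conv_a, conv_b, conv_c], limit=2)
naive = naive_total([collector_1, collector_2])
exact = deduplicated_total([collector_1, collector_2])
```

Here the capped count reports only that the limit of two was reached, the naive sum reports four conversations, and the exact union reports three, because one conversation was observed by both collectors. The exact union requires every conversation key to be held in one place, which is the kind of storage cost that motivates a more efficient approach.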
It is desired to implement methods and procedures that overcome these problems by providing a technique for efficient processing and analysis of netflow data in a distributed environment.