A network flow may be defined as a set of packets passing a monitoring point in a network during a certain period of time. The monitoring point may be a particular data source such as an interface of a network device, for example. All of the packets belonging to a particular flow share a set of common characteristics. The characteristics may be ascertained by examining the packet itself and may include a source IP address, a destination IP address, a source port, a destination port, a protocol, a service to be performed on the packet or any other packet characteristic. By choosing the characteristics defining a particular flow, the particular flow may be defined anywhere between all network traffic observed at the monitoring point and a single packet sent between network applications observed at the monitoring point.
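By way of illustration only, the effect of choosing different flow-defining characteristics may be sketched as follows. The `Packet` type, the `group_into_flows` helper and the sample addresses are hypothetical names and values introduced for this sketch and are not part of any particular device or library.

```python
from collections import Counter
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet record holding only header fields used as flow keys."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


def group_into_flows(packets, key_fields):
    """Count packets per flow, where a flow is every packet sharing key_fields."""
    flows = Counter()
    for pkt in packets:
        # The flow key is the tuple of the chosen characteristics.
        key = tuple(getattr(pkt, field) for field in key_fields)
        flows[key] += 1
    return flows


# Made-up packets observed at one monitoring point.
packets = [
    Packet("10.0.0.1", "10.0.0.9", 40000, 80, "tcp"),
    Packet("10.0.0.1", "10.0.0.9", 40000, 80, "tcp"),
    Packet("10.0.0.1", "10.0.0.9", 40001, 443, "tcp"),
    Packet("10.0.0.2", "10.0.0.9", 53000, 53, "udp"),
]

# Narrow definition: all five characteristics must match.
five_tuple = group_into_flows(packets, Packet._fields)
# Broader definition: only the destination address must match.
by_destination = group_into_flows(packets, ("dst_ip",))
# Broadest definition: no characteristics, so all traffic is one flow.
all_traffic = group_into_flows(packets, ())
```

In this sketch, the same four packets form three flows under the five-characteristic definition and a single flow when only the destination address, or no characteristic at all, is used.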
The collection of network flow data may provide valuable information about the overall network traffic. For example, a set of sample packets may be analyzed to estimate characteristics of the overall network traffic. This may be done without collecting network flow data for every packet observed at the monitoring point, which would be a difficult task due to the high volume of network traffic. The network flow data may provide information that may be used to make decisions regarding network traffic engineering, the provision of network services, billing based on network usage, etc.
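As a minimal sketch of how sampled packets may be used to estimate characteristics of the overall traffic, the counts from a 1-in-N sample may simply be scaled by N. The `estimate_totals` function and the sample sizes below are hypothetical, and simple scaling is only one possible estimator; it assumes the sample is representative of the traffic.

```python
def estimate_totals(sampled_sizes, n):
    """Scale counts from a 1-in-n sample up to estimates for all traffic.

    sampled_sizes: sizes in bytes of the packets that were sampled.
    n: the sampling ratio denominator (one packet sampled out of every n).
    """
    est_packets = len(sampled_sizes) * n
    est_bytes = sum(sampled_sizes) * n
    return est_packets, est_bytes


# Made-up sizes of four packets captured under 1-in-32 sampling.
sampled_sizes = [64, 1500, 576, 64]
est_packets, est_bytes = estimate_totals(sampled_sizes, 32)
```

The estimate is only as good as the assumed ratio: if more packets are actually sampled than the ratio implies, scaling by N overstates the totals.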
There are many network packet sampling schemes available. Packet sampling schemes are designed to be random and to prevent synchronization with any specific network traffic pattern. Two example network sampling schemes are packet-based sampling and time-based sampling. Packet-based sampling schemes are designed to sample 1 out of every N packets (i.e., 1-in-N sampling) observed at the monitoring point. Systematic, multi-stage and simple random sampling are example schemes that are well known in the art. Time-based sampling schemes are designed to sample a fixed number of packets in every predetermined time interval, and that number may be chosen to achieve a desired sample rate such as 1-in-N sampling at an expected packet rate. Time-based sampling schemes are simple and commonly implemented on the application-specific integrated circuit (“ASIC”) of network devices. For example, the maximum expected packet rate on a 1 Gigabit Ethernet port is 1.488 Mpps (assuming a 64-byte packet size), or about 1,488 packets per millisecond. Thus, to achieve 1-in-32 sampling at the maximum rate, approximately forty-seven packets (1,488/32 ≈ 46.5) must be sampled every millisecond. When the actual packet rate is less than the maximum, however, time-based sampling schemes result in extra packets being sampled (i.e., oversampling), because the number of packets sampled per interval is fixed and does not account for the observed rate of network traffic. Oversampling creates undesired consequences such as reducing the accuracy of the estimated characteristics of the overall network traffic and/or placing an unnecessary burden on the network resources.
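The arithmetic behind the 1 Gigabit Ethernet example, and the oversampling that results at lower packet rates, may be sketched as follows. The 20 bytes of per-frame overhead (an 8-byte preamble and start delimiter plus a 12-byte inter-frame gap) is standard Ethernet framing assumed here and not stated in the text above; the function names and the 500,000 packets-per-second figure are hypothetical.

```python
import math

LINE_RATE_BPS = 1_000_000_000  # 1 Gigabit Ethernet
FRAME_BYTES = 64               # minimum frame size used in the example
OVERHEAD_BYTES = 20            # assumed: 8-byte preamble/SFD + 12-byte inter-frame gap


def max_packet_rate(line_rate_bps, frame_bytes):
    """Maximum packets per second at line rate for a given frame size."""
    return line_rate_bps / ((frame_bytes + OVERHEAD_BYTES) * 8)


def samples_per_interval(packet_rate_pps, n, interval_s):
    """Fixed per-interval quota that yields 1-in-n sampling at packet_rate_pps."""
    return math.ceil(packet_rate_pps * interval_s / n)


def effective_ratio(actual_rate_pps, quota, interval_s):
    """Packets arriving per packet sampled when the quota is held fixed."""
    arriving = actual_rate_pps * interval_s
    taken = min(quota, arriving)
    return arriving / taken


max_rate = max_packet_rate(LINE_RATE_BPS, FRAME_BYTES)  # about 1.488 Mpps
quota = samples_per_interval(max_rate, 32, 0.001)       # packets per millisecond

# Hypothetical lighter load: 500,000 packets per second with the same quota.
ratio_at_light_load = effective_ratio(500_000, quota, 0.001)
```

With the quota fixed at the value computed for the maximum rate, the lighter load in this sketch is sampled at roughly 1-in-10.6 rather than the intended 1-in-32, which is the oversampling described above.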