1. Field of the Invention
The present invention relates to data transfer between nodes in a communications network, and, more particularly, to generating estimates of per-flow traffic through the network.
2. Description of the Related Art
Accurate measurement of traffic in a packet network is an important component of traffic management, accounting, denial of service (DoS) detection, and traffic engineering. The traffic in the network might be classified into network flows, with traffic measurements performed on a per-flow basis. The definition of a network flow varies depending on the application. For example, flows might be characterized by the 5-tuple of packet header fields (e.g., source and destination addresses and ports), by the specific destination (e.g., a node or network characterized by a destination address prefix), or by the source network. For virus or worm detection, a flow might also be defined as the set of packets containing a specific worm signature. For this expanded definition of a flow, checking whether a packet belongs to a particular flow is an expensive operation in terms of network resources. Thus, it is desirable to avoid performing this operation on every packet when measuring flow rates.
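By way of illustration only, the alternative flow definitions described above might be sketched as key functions that map a packet to a flow identifier. The Packet type, the function names, and the default prefix length of 24 bits are assumptions introduced for this sketch and do not appear in the description above.

```python
import ipaddress
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet record holding only the header fields used here."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: int


def five_tuple_key(pkt: Packet) -> tuple:
    """Flow defined by the full 5-tuple of header fields."""
    return tuple(pkt)


def dst_prefix_key(pkt: Packet, prefix_len: int = 24) -> ipaddress.IPv4Network:
    """Flow defined by the destination address prefix (assumed /24 by default)."""
    # strict=False masks the host bits instead of raising an error.
    return ipaddress.ip_network(f"{pkt.dst_addr}/{prefix_len}", strict=False)


pkt = Packet("10.0.0.1", "192.0.2.77", 1234, 80, 6)
flow_by_tuple = five_tuple_key(pkt)
flow_by_prefix = dst_prefix_key(pkt)
```

Under these assumptions, two packets sent to different hosts within the same destination prefix map to one flow under the prefix definition but to distinct flows under the 5-tuple definition.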
One prior art approach used for measuring traffic is to sample the traffic arriving at a node (e.g., at the node's router), maintain a count of traffic arrivals on a per-flow basis, and then estimate the per-flow traffic based on this traffic-arrival count. However, for a large number of flows, this prior art approach requires considerable memory and processing resources to maintain the per-flow traffic-arrival counts. In some cases, as many as 0.5-1.0 million flows might be present in a backbone packet network. Since measurement of per-flow traffic has several applications in real-time traffic management, billing, and network security, accurate per-flow rate information should be obtained efficiently without maintaining per-flow states for all flows traversing a router or a network link.
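The sampling approach described above might be sketched as follows, purely as an illustration. The sketch assumes independent per-packet sampling with a fixed probability and scales each sampled count by the inverse of that probability; the function name, its parameters, and this particular scaling rule are assumptions of the sketch rather than details given above.

```python
import random
from collections import Counter


def estimate_flow_counts(packets, sample_prob, rng=None):
    """Estimate per-flow packet counts from a random sample of arrivals.

    packets     -- iterable of flow identifiers, one per arriving packet
    sample_prob -- assumed per-packet sampling probability, 0 < p <= 1
    rng         -- optional random.Random instance for reproducibility
    """
    if not 0.0 < sample_prob <= 1.0:
        raise ValueError("sample_prob must be in (0, 1]")
    rng = rng or random.Random()

    # One counter per sampled flow: this per-flow state is the memory cost
    # that grows with the number of flows.
    sampled = Counter()
    for flow_id in packets:
        if rng.random() < sample_prob:
            sampled[flow_id] += 1

    # Scale each sampled count by 1/p to estimate the true arrival count.
    return {flow: count / sample_prob for flow, count in sampled.items()}
```

The dictionary of counters grows with the number of distinct sampled flows, which is the memory burden noted above for networks carrying hundreds of thousands of flows.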
Some particularly important measurement applications are DoS detection, active queue management, and virus/worm detection. For DoS applications, a sudden increase in traffic flow toward a given destination might signal the onset of a DoS attack. An estimation might be employed to determine that the traffic at a network node is anomalous, triggering an alarm and activating more-detailed monitoring of the suspect flow (traffic stream). For active queue management, per-flow measurements allow for queuing fairness in networks. Isolating large flows of misbehaving sources reduces their impact on the rest of the flows in the network, especially for uncontrolled flows of open-loop User Datagram Protocol (UDP) sources or Transmission Control Protocol (TCP) sources exhibiting wide disparity in round-trip times. However, identifying and tracking the relatively small number of flows from misbehaving sources is costly in practice, since doing so may require tracking a large number (tens to hundreds of thousands) of small sources as well.
For virus/worm detection applications, packets having a common payload might be considered as a flow in order to detect virus/worm attacks in the network. Several packets with the same payload might indicate the start of a virus/worm spreading through the network. Common payloads, such as those containing the addresses of popular web sites, should not trigger an alarm, while polymorphic worms, which have similar but not identical payloads, should be identified. Measuring packets having the same or similar payload might allow such discrimination between desirable and undesirable packet payloads.
Other applications include tracking of flows that consume excessive memory or processing resources (“heavy hitters”). One method of identifying and tracking these heavy hitters samples packets of the flows with an assumed probability density, and, if the flow to which a sampled packet belongs is not already in memory, then the flow is added to the memory. From that point on, all packets arriving at the node and belonging to this flow are counted. Since every packet of a sampled flow is counted, the sampled flows are kept in a hash table and, at every packet arrival, the packet's flow identifier is hashed into the hash table in order to increment the appropriate counter. Therefore, there is increased processing at each packet arrival compared to random sampling, but the method is relatively easy to implement since the size of the memory is reduced.
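The heavy-hitter method described above might be sketched as follows, for illustration only. The sketch assumes a fixed per-packet sampling probability and uses a Python dictionary as the hash table; the function name and parameters are hypothetical.

```python
import random


def sample_and_count(packets, sample_prob, rng=None):
    """Track flows once sampled, then count every later packet of those flows.

    packets     -- iterable of flow identifiers, one per arriving packet
    sample_prob -- assumed per-packet sampling probability, 0 <= p <= 1
    rng         -- optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    held = {}  # hash table: flow identifier -> packets counted since insertion

    for flow_id in packets:
        if flow_id in held:
            # Flow already in memory: every arriving packet is counted,
            # at the cost of one hash lookup per packet.
            held[flow_id] += 1
        elif rng.random() < sample_prob:
            # Flow not yet in memory: add it only if this packet is sampled.
            held[flow_id] = 1

    return held
```

Because a flow with many packets is very likely to be sampled early, large flows tend to enter the table and be counted nearly in full, while most small flows never occupy memory.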
However, most prior art methods for flow estimation still require large sample sizes, with correspondingly large memory requirements. In addition, such estimation might require considerable processing resources and considerable time to complete.