Accurate and fast flow measurement and characterization of a data network is an important component of network management, accounting and traffic engineering. For instance, network service providers may be interested in which flows from which customers consume most of their network resources during any given time period, and may adjust their provisioning and pricing accordingly. Network operators may need to continuously monitor the traffic patterns in their networks to detect any suspicious changes. A sudden increase in traffic to a particular destination, for example, may indicate a possible Denial of Service (DoS) attack.
Known methods of flow measurement and packet sampling have attempted to solve the above-mentioned network monitoring issues. The proposed mechanisms typically use an explicit definition of flow. A common definition is to characterize a flow by a 5-tuple in the IP packet header, including source IP (src IP) address, source port (src Port), destination IP (dst IP) address, destination port (dst Port), and protocol ID (prot).
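The 5-tuple flow definition described above can be illustrated with a minimal sketch. The names `FlowKey` and `count_bytes_per_flow`, the sample addresses (taken from documentation address ranges), and the packet sizes are all hypothetical and chosen only for illustration; the sketch does not represent any particular known measurement system.

```python
from collections import Counter
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple flow key taken from the IP packet header."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    prot: int  # protocol ID, e.g. 6 for TCP


def count_bytes_per_flow(packets):
    """Sum packet sizes per 5-tuple flow.

    `packets` is an iterable of (FlowKey, size_in_bytes) pairs.
    """
    totals = Counter()
    for key, size in packets:
        totals[key] += size
    return totals


# Illustrative packets only: two belong to the same 5-tuple flow,
# the third differs in source address and source port.
packets = [
    (FlowKey("10.0.0.1", 40000, "192.0.2.10", 80, 6), 1500),
    (FlowKey("10.0.0.1", 40000, "192.0.2.10", 80, 6), 500),
    (FlowKey("10.0.0.2", 40001, "192.0.2.10", 80, 6), 100),
]
flow_bytes = count_bytes_per_flow(packets)
```

Because the flow key is fixed in advance, packets that differ in any one of the five fields are always counted as separate flows.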
However, knowing what type of flow to capture or measure before actually conducting measurements is often very difficult, if not impossible. Any combination of fields in the 5-tuple of the IP packet may constitute a flow with an “interesting” traffic pattern, but this combination is not known a priori. In this sense, interesting traffic patterns are often hidden in traffic streams, and efficient algorithms to uncover them in real time are heretofore unknown.
As an example, an “interesting” flow for observation may not be the 5-tuple flow, but a flow defined by only certain sub-fields, such as destination address and port number. Network operators often do not know what flows to look for until they actually observe statistics on various kinds of flows. Furthermore, measuring one particular type of flow may either lose or hide important information that can be derived by measuring other types of flows.
For example, measuring only detailed 5-tuple flows may not reveal a possible ongoing DoS attack, because such an attack may consist of not one but many small 5-tuple flows. Similarly, measuring only aggregated flows based on sub-fields such as destination address and port number may not reveal which source network uses most of the network bandwidth.
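The effect of the chosen flow definition can be seen in a small sketch. The traffic below is fabricated purely for illustration (one hundred hypothetical sources each sending one 60-byte packet to the same destination, plus one unrelated 3000-byte transfer), and the helper `aggregate` is an assumed name, not part of any known system.

```python
from collections import Counter

# Field order of the 5-tuple used in this sketch.
FIELDS = ("src_ip", "src_port", "dst_ip", "dst_port", "prot")


def aggregate(packets, fields):
    """Sum packet sizes per flow, where a flow is defined by `fields` only."""
    idx = [FIELDS.index(f) for f in fields]
    totals = Counter()
    for header, size in packets:
        totals[tuple(header[i] for i in idx)] += size
    return totals


# Hypothetical attack: 100 distinct sources, one small packet each,
# all directed at the same destination address and port.
attack = [((f"198.51.100.{i}", 1024 + i, "192.0.2.10", 80, 6), 60)
          for i in range(1, 101)]
# Hypothetical unrelated bulk transfer.
bulk = [(("10.0.0.1", 40000, "203.0.113.5", 443, 6), 3000)]
packets = attack + bulk

by_5tuple = aggregate(packets, FIELDS)
by_dst = aggregate(packets, ("dst_ip", "dst_port"))

largest_5tuple = by_5tuple.most_common(1)[0]
largest_dst = by_dst.most_common(1)[0]
```

Under the 5-tuple definition the largest flow is the bulk transfer, and each attack flow is only 60 bytes. Under the destination-only definition the attacked destination becomes the largest flow, but the source information is no longer available.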
Another known system uses a traffic measurement algorithm that does not require a priori flow definition. Instead, this system sifts through traffic trace data and generates reports for multi-dimensional traffic clusters. The approach can capture any flow with a rate above a predefined threshold, regardless of flow dimensionality. Although this improves usability and convenience for network operators, the approach requires scanning the trace multiple times and is essentially designed for off-line processing. Its processing complexity and memory usage are not optimized for fast on-line measurement.
Thus, there is a need for a practical and real-time, on-line traffic or flow measurement approach that does not require a priori knowledge of flow definition.