1. Field of the Invention
The invention is related to the field of communications, and in particular, to methods and apparatuses for detecting traffic patterns in a data network.
2. Statement of the Problem
Monitoring and detecting significant traffic patterns in a data network, such as the presence of persistent large flows or a sudden increase in network traffic due to the emergence of new flows, is important for network provisioning, management and security. Significant behaviors often imply events of interest on the data network, such as denial of service (DoS) attacks. Two significant behaviors that are of interest to network operators are high traffic users (also known as heavy hitters) and significant traffic change users (also known as heavy changers). A high traffic user is a node whose traffic exceeds a pre-defined threshold. A significant traffic change user is a node whose change in traffic volume between two monitoring intervals exceeds a pre-defined threshold. A node may be herein referred to as a key, which is information that identifies a node or flow. A key may represent a source Internet Protocol (IP) address and/or port, a destination IP address and/or port, or combinations of source and destination IP addresses and/or ports, such as a five-tuple flow (source IP address, destination IP address, source port, destination port, and communication protocol).
For instance, a data flow that accounts for more than 10% of the total traffic of the data network, which is a high traffic user by data flow, may suggest a violation of a service agreement. On the other hand, a sudden increase of traffic volume flowing to a destination, which is a significant traffic change user by destination, may indicate, for example, a hot spot, the beginning of a DoS attack, or traffic rerouting due to link failures elsewhere. The goal of significant key detection is to identify all significant keys (e.g., keys which are high traffic users or significant traffic change users) and to estimate their associated values with a low error rate while minimizing both memory usage and computational overhead.
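The two definitions above can be illustrated with exact per-key counting. The following is a minimal sketch only, not a method described herein: the function names, the sample addresses, and the thresholds are hypothetical, and the change is assumed to be measured as an absolute difference in byte counts between two intervals.

```python
from collections import Counter

def count_interval(packets):
    """Sum bytes per key for one monitoring interval (exact, one counter per key)."""
    counts = Counter()
    for key, nbytes in packets:
        counts[key] += nbytes
    return counts

def heavy_hitters(counts, threshold):
    """Keys whose traffic volume in a single interval exceeds the threshold."""
    return {k: v for k, v in counts.items() if v > threshold}

def heavy_changers(prev, curr, threshold):
    """Keys whose absolute volume change between two intervals exceeds the threshold.

    Assumption: the change is taken as |current - previous|.
    """
    keys = set(prev) | set(curr)
    changes = {k: curr.get(k, 0) - prev.get(k, 0) for k in keys}
    return {k: d for k, d in changes.items() if abs(d) > threshold}

# Hypothetical traffic, keyed by source IP address: (key, bytes) per packet.
interval_1 = count_interval([("10.0.0.1", 500), ("10.0.0.2", 100), ("10.0.0.1", 700)])
interval_2 = count_interval([("10.0.0.1", 1100), ("10.0.0.2", 900), ("10.0.0.3", 50)])

hitters = heavy_hitters(interval_2, threshold=1000)
changers = heavy_changers(interval_1, interval_2, threshold=500)
```

In this sketch, memory grows with the number of distinct keys observed, which is the cost that a practical detector seeks to avoid.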
As the internet and other data networks continue to grow in size and complexity, increasing network bandwidth poses challenges for monitoring significant keys in real time due to computational and storage constraints. To identify any network flow that causes a significant amount of traffic or a significant traffic volume change, the system should scale up to at least 2^104 keys (i.e., the number of possible five-tuple flows: source IP address (32 bits), source port (16 bits), destination IP address (32 bits), destination port (16 bits) and communication protocol (8 bits)). Keeping track of per-key values is typically infeasible for large data networks due to the processing and memory requirements imposed by the number of keys and the associated data tracked.
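The size of the five-tuple key space follows directly from the field widths listed above. The short calculation below is a sketch; the 4-byte counter size is a hypothetical figure chosen only to show the order of magnitude of a full per-key table.

```python
# Field widths of a five-tuple flow key, in bits (as listed in the text).
FIELD_BITS = {
    "source_ip": 32,
    "source_port": 16,
    "destination_ip": 32,
    "destination_port": 16,
    "protocol": 8,
}

key_bits = sum(FIELD_BITS.values())   # total key width in bits
key_space = 2 ** key_bits             # number of possible five-tuple keys

# Hypothetical assumption: one 4-byte counter per possible key.
COUNTER_BYTES = 4
table_bytes = key_space * COUNTER_BYTES

print(f"key width: {key_bits} bits")
print(f"possible keys: {key_space:.3e}")
print(f"bytes for a full per-key table: {table_bytes:.3e}")
```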
There are several important requirements for monitoring and detecting significant patterns in real time for high bandwidth links. The per-packet monitoring update speed should keep up with the link bandwidth even in the worst case, when all packets are of the smallest possible size. Otherwise, monitoring is not performed in real time. The detection delay of significant patterns should be short enough that important events, such as network attacks and link failures, can be responded to before any serious damage to the network occurs. Further, the false positive rate and the false negative rate should be minimized. A false negative may miss an important event and thus delay a necessary reaction. On the other hand, a false positive may trigger unnecessary responses that waste resources.
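The worst-case update requirement can be made concrete by computing the time available per packet. The figures below are assumptions for illustration and do not come from the text: a 10 Gb/s Ethernet link and a minimum on-wire size of 84 bytes (a 64-byte minimum frame plus 20 bytes of preamble and inter-frame gap).

```python
def per_packet_budget_ns(link_bps, min_wire_bytes):
    """Time available to process one packet, in nanoseconds, when every
    packet on the link has the smallest possible on-wire size."""
    packets_per_second = link_bps / (min_wire_bytes * 8)
    return 1e9 / packets_per_second

# Assumed example values (not from the text): 10 Gb/s link, 84 bytes per packet on the wire.
LINK_BPS = 10e9
MIN_WIRE_BYTES = 84

budget_ns = per_packet_budget_ns(LINK_BPS, MIN_WIRE_BYTES)
print(f"per-packet update budget: {budget_ns:.1f} ns")
```

Under these assumptions, every per-packet update, including all hash computations and counter accesses, would need to complete within a few tens of nanoseconds.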
Data monitoring algorithms based on efficient data structures have been proposed for high traffic user detection and traffic-volume queries. These algorithms allow monitoring of data network traffic without tracking data individually for each separate key. One such data monitoring algorithm uses parallel hash tables to identify large flows using a memory that is only a small constant factor larger than the number of large flows. However, this technique only detects high traffic users, and does not detect users having significant changes in traffic. Other techniques have been proposed that detect both high traffic users and users having significant changes in traffic. However, these algorithms are not memory-efficient and/or computationally efficient for use in high traffic networks.
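The parallel hash table approach mentioned above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the cited algorithm itself: the class name, the number of stages, the table size, the threshold, and the use of salted BLAKE2b hashing are all hypothetical choices. Each stage is a table of counters indexed by an independent hash of the key, and a key is recorded as a large flow only when its counter in every stage has reached the threshold.

```python
import hashlib

class ParallelStageFilter:
    """Hypothetical sketch of large-flow detection with parallel hash tables."""

    def __init__(self, stages=4, buckets=1024, threshold=1000):
        self.stages = stages
        self.buckets = buckets
        self.threshold = threshold
        self.counters = [[0] * buckets for _ in range(stages)]
        self.large_flows = {}  # key -> bytes counted since the key was recorded

    def _bucket(self, stage, key):
        # A per-stage salt gives each stage its own hash function.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=stage.to_bytes(2, "big")
        ).digest()
        return int.from_bytes(digest, "big") % self.buckets

    def update(self, key, nbytes):
        # Keys already recorded are counted exactly from this point on.
        if key in self.large_flows:
            self.large_flows[key] += nbytes
            return
        passed = True
        for stage in range(self.stages):
            b = self._bucket(stage, key)
            self.counters[stage][b] += nbytes
            if self.counters[stage][b] < self.threshold:
                passed = False
        if passed:
            # Bytes seen before this point are not included, so the
            # recorded value is a lower bound on the key's traffic.
            self.large_flows[key] = nbytes

# Hypothetical usage: one large flow "A" interleaved with many small flows.
flt = ParallelStageFilter()
for i in range(20):
    flt.update("A", 100)
    flt.update(f"small-{i}", 10)
```

Because each counter holds at least the bytes of every key that hashes to it, a key whose own traffic reaches the threshold is always recorded. Small keys are recorded only if they collide with heavy counters in every stage, so such false positives become less likely as stages are added.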