The Internet has emerged as a critical communication infrastructure, carrying traffic for a wide range of important scientific, business and consumer applications. Network service providers and enterprise network operators need the ability to detect anomalous events in the network for network management and monitoring, reliability, security and performance reasons. While some traffic anomalies are relatively benign and tolerable, others can be symptomatic of potentially serious problems such as performance bottlenecks due to flash crowds, network element failures, and malicious activities such as denial-of-service (DoS) attacks and worm propagation. It is therefore very important to be able to detect traffic anomalies accurately and in near real-time, to enable timely initiation of appropriate mitigation steps.
One of the main challenges of detecting anomalies is the sheer volume of traffic and measured statistics. Given today's traffic volume and link speeds, the input data stream can easily contain millions of concurrent flows or more, so it is often impossible or too expensive to maintain per-flow state. The diversity of network types further compounds the problem. Thus, it is infeasible to keep track of all the traffic components and inspect each packet individually for anomalous behavior.
Another major challenge for anomaly detection is that traffic anomalies often have very complicated structures: they are often hierarchical (i.e., they may occur at arbitrary aggregation levels, such as ranges of IP addresses and port numbers) and multidimensional (i.e., they can only be exposed when traffic is examined with specific combinations of IP address ranges, port numbers, and protocols). In order to identify such multidimensional hierarchical traffic anomalies, a naive approach would require examining all possible combinations of aggregates, which can be prohibitively expensive even for just two dimensions. Another important challenge stems from the fact that existing change detection methods rely on usage measurements that are increasingly sampled.
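As a rough illustration of why the naive approach of examining all combinations of aggregates is costly, the following Python sketch counts the two-dimensional aggregates that a single flow belongs to. It assumes, purely for illustration, a bit-granularity prefix hierarchy over IPv4 source and destination addresses; the function names prefix_ancestors and aggregates_2d are hypothetical and do not appear in the text above.

```python
import ipaddress
from itertools import product


def prefix_ancestors(addr: str) -> list[str]:
    """Return the 33 IPv4 prefixes (/0 through /32) that contain addr.

    Assumption: the hierarchy is one level per prefix bit.
    """
    ip = int(ipaddress.IPv4Address(addr))
    ancestors = []
    for plen in range(33):
        # Keep the top `plen` bits and zero the rest.
        mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
        ancestors.append(f"{ipaddress.IPv4Address(ip & mask)}/{plen}")
    return ancestors


def aggregates_2d(src: str, dst: str) -> list[tuple[str, str]]:
    """Every (source prefix, destination prefix) pair covering one flow."""
    return list(product(prefix_ancestors(src), prefix_ancestors(dst)))


aggs = aggregates_2d("10.1.2.3", "192.168.7.9")
print(len(aggs))  # 33 * 33 = 1089 aggregates for a single flow
```

Under these assumptions, every flow would have to update 1089 counters for two address dimensions alone, and the count multiplies again for each further dimension (ports, protocol) that is added.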
Therefore, a need exists for a method and apparatus for near real-time detection of multidimensional hierarchical heavy hitters in packet-switched networks (e.g., Voice over Internet Protocol (VoIP) networks) that can also accommodate sampling variability.