Traffic anomalies such as failures and attacks are commonplace in today's networks, and identifying them rapidly and accurately is critical for large network operators. Detection typically treats the traffic as a collection of flows that need to be examined for significant changes in traffic pattern (e.g., volume, number of connections). However, as link speeds and the number of flows increase, keeping per-flow state is either too expensive or too slow.
Traffic anomalies are an integral part of daily life for today's network operators. Some traffic anomalies are expected, or unanticipated but tolerable. Others are often indications of performance bottlenecks due to flash crowds, network element failures, or malicious activities such as denial of service (DoS) attacks and worms. There is therefore strong motivation to process massive data streams (available from diverse sources) quickly in order to examine them for anomalous behavior. Two basic approaches to network anomaly detection are common.
The first approach is the “signature-based” approach, which detects traffic anomalies by looking for patterns that match signatures of known anomalies. For example, such techniques may infer DoS activities based on address uniformity, a property shared by several popular DoS toolkits. Signature-based methods have been extensively explored in the literature and implemented in many software systems and toolkits. One limitation of this approach is the requirement that the anomaly signatures be known in advance. Thus, it cannot be applied to identify new anomalies. Also, a malicious attacker can evade signature-based detection systems by altering the signature of the attack. One can see a parallel in the failure of filter-based spam-fighting systems, where spammers introduce random hashes in their spam messages.
A second approach is the “statistics-based” approach, which does not require prior knowledge about the nature and properties of anomalies and therefore can be effective even for new anomalies or variants of existing anomalies. A very important component of the statistics-based approach is change detection. It detects traffic anomalies by deriving a model of normal behavior based on the past traffic history and looking for significant changes in short-term behavior (on the order of minutes to hours) that are inconsistent with the model.
Change detection has been extensively studied in the context of time series forecasting and outlier analysis. The standard techniques include different smoothing techniques (such as exponential smoothing or sliding window averaging), Box-Jenkins AutoRegressive Integrated Moving Average (ARIMA) modeling, and finally the more recent wavelet-based techniques.
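As a rough illustration of the forecast-and-compare idea behind these techniques, the following minimal sketch applies exponential smoothing to a single time series and flags intervals whose forecast error exceeds a fixed threshold. The function name, the smoothing factor `alpha`, the threshold, and the sample data are all assumptions chosen for illustration; they are not taken from any particular prior system.

```python
def ewma_change_points(series, alpha=0.2, threshold=100.0):
    """Return indices where |observed - forecast| exceeds threshold.

    The forecast for interval t is an exponentially weighted average
    of the observations before t (illustrative parameters only).
    """
    if not series:
        return []
    forecast = series[0]  # seed the model with the first observation
    flagged = []
    for t in range(1, len(series)):
        error = series[t] - forecast  # forecast error for interval t
        if abs(error) > threshold:
            flagged.append(t)
        # Fold the new observation into the forecast for the next interval.
        forecast = alpha * series[t] + (1 - alpha) * forecast
    return flagged


# Hypothetical per-interval traffic volumes for one flow.
traffic = [100, 102, 98, 101, 400, 103, 99]
print(ewma_change_points(traffic))
```

Note that this sketch keeps one forecast value per time series, which is precisely the per-flow state that becomes costly when the number of series is very large.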
Prior works have applied these techniques to network fault detection and intrusion detection. Examples include: approaches that identify faults based on statistical deviations from normal traffic behavior; methods that identify aberrant behavior by applying thresholds in time series models of network traffic; intrusion detection methods, including neural networks, Markov models, and clustering; and work that characterizes different types of anomalies and proposes wavelet-based methods for change detection.
Unfortunately, existing change detection techniques typically handle only a relatively small number of time series. While this may suffice for detecting changes in highly aggregated network traffic data (e.g., Simple Network Management Protocol (SNMP) link counts with a 5 minute sample interval), such techniques cannot scale up to the needs at the network infrastructure (e.g., Internet Service Provider (ISP)) level. At an ISP level, traffic anomalies may be buried inside the aggregated traffic, mandating examination of the traffic at a much lower level of aggregation (e.g., Internet Protocol (IP) address level) in order to expose them.
Given today's traffic volume and link speeds, a suitable detection method has to be able to handle potentially several million or more concurrent network time series. Directly applying existing techniques on a per-flow basis cannot scale up to the needs of such massive data streams. Some recent research efforts have been directed towards developing scalable “heavy-hitter” detection techniques for accounting and anomaly detection purposes. However, heavy hitters do not necessarily correspond to flows experiencing significant changes, and thus it is not clear how these techniques can be adapted to support change detection.
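The distinction between heavy hitters and significantly changed flows can be seen in a small toy example. The sketch below uses hypothetical flow labels and byte counts for two consecutive intervals and keeps exact per-flow counts, which is feasible only at this tiny scale; it is meant to illustrate the mismatch, not to suggest a scalable method.

```python
def top_k(scores, k):
    """Return the k keys with the largest scores, largest first."""
    return sorted(scores, key=lambda flow: scores[flow], reverse=True)[:k]


# Hypothetical per-flow byte counts in two consecutive intervals.
prev = {"A": 10000, "B": 9500, "C": 50, "D": 40}
curr = {"A": 10100, "B": 9400, "C": 3000, "D": 45}

# Heavy hitters: the flows with the largest current volume.
heavy = top_k(curr, 2)

# Changed flows: the flows with the largest absolute difference.
change = {
    flow: abs(curr.get(flow, 0) - prev.get(flow, 0))
    for flow in set(prev) | set(curr)
}
changed = top_k(change, 1)

print("heavy hitters:", heavy)
print("largest change:", changed)
```

In this toy data, flows A and B dominate by volume yet are nearly unchanged, while flow C grows sharply without becoming a heavy hitter.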
Accordingly, there is a need for an efficient, accurate, and scalable change detection mechanism for detecting significant changes in massive data streams with a large number of flows.