1. Field of the Invention
The present invention relates to anomaly detection in computer networks, and more particularly to a specification-based anomaly detection method for network intrusion detection.
2. Discussion of Related Art
Intrusion detection techniques can be broadly classified into misuse detection, anomaly detection, and specification-based approaches. Misuse detection, which detects known misuses accurately, is not effective against unknown attacks. Anomaly detection copes better with unknown attacks, but can generate false positives. Specification-based approaches can detect novel attacks while maintaining a low rate of false alarms.
Misuse detection techniques detect attacks as instances of attack signatures. This approach can detect known attacks accurately. However, it is not effective against previously unseen attacks, as no signatures are available for such attacks.
Anomaly detection overcomes the limitation of misuse detection by focusing on normal system behaviors, rather than attack behaviors. In anomaly detection, machine learning techniques are used to learn normal behavior by observing system operation during a training phase that is free of attacks. Subsequently, this learnt behavior is compared against observed system behavior during the detection phase, and any deviations are deemed to indicate attacks. Unfortunately, systems often exhibit legitimate but previously unseen behavior, which leads anomaly detection techniques to produce a high rate of false alarms. Moreover, the effectiveness of anomaly detection is affected greatly by which aspects (also called “features”) of the system behavior are learnt. Selecting an appropriate set of features has proven to be a hard problem.
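The train-then-detect cycle described above can be illustrated with a minimal sketch. It is not drawn from any particular system: the single feature (packets per second), the mean and standard deviation profile, the threshold of three standard deviations, and the names `learn_profile` and `is_outside_profile` are all assumptions made for illustration.

```python
from statistics import mean, stdev

def learn_profile(normal_samples):
    """Training phase: summarize attack-free observations of one feature."""
    return mean(normal_samples), stdev(normal_samples)

def is_outside_profile(profile, value, threshold=3.0):
    """Detection phase: flag values far from the learnt mean.

    The threshold (in standard deviations) is an illustrative assumption.
    """
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical packets-per-second readings from an attack-free period.
training = [98, 102, 101, 99, 100, 103, 97]
profile = learn_profile(training)

print(is_outside_profile(profile, 101))  # typical load
print(is_outside_profile(profile, 500))  # flood-like load
print(is_outside_profile(profile, 110))  # legitimate but unseen burst
```

In this sketch the reading of 110 is flagged even though it may be legitimate, which illustrates how previously unseen normal behavior produces false alarms.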
Many network intrusion detection systems reconstruct higher level interactions between end hosts and remote users, and identify anomalous or attack behaviors. Other approaches operate on the basis of packet header contents. The reconstructive approaches provide a way to define signatures based on the content of data exchanged in a reconstructed TCP session, whereas the packet-header techniques define signatures in terms of individual packets. The former class of approaches is more effective in detecting application layer attacks, whereas the latter class of techniques can provide better detection of attacks that do not result in valid TCP sessions (e.g., probing attacks) or valid requests at the application level.
Within the area of anomaly detection based approaches, data mining is concerned with the extraction of useful information from large volumes of data. Data mining techniques for intrusion detection rely on expert identification of useful features for network intrusion detection. For example, W. Lee and S. Stolfo, Data Mining Approaches for Intrusion Detection, USENIX Security Symposium, 1998, suggest the selection of a long list of features that include, among many others, the following: successful TCP connection, connection rejection, failure to receive SYN-ACK, spurious SYN-ACKs, duplicate ACK rate, wrong size rate, bytes sent in each direction, normal connection termination, half-closed connections, and failure to send all data packets.
The NATE (Network Analysis of Anomalous Traffic Events) system uses statistical clustering techniques to learn normal behavior patterns in network data. Training data is used in the formation of clusters, or groups, of similar data. During detection, data points that do not fall into any cluster are seen as anomalous. Clustering uses a similarity measure and, for network data, sampling techniques are also needed. NATE can detect most network probes and denial-of-service (DoS) attacks in the MIT Lincoln Labs data. The technique used by NATE is sensitive to the sampling methodology and the distance measure used, so research continues toward more accurate methods. NATE uses sampling to select a small subset of packet data for training. Moreover, the information learnt by NATE requires checking by a human before it is used for detection.
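The general idea of cluster-based detection can be sketched as follows. This is not NATE's algorithm: the greedy "leader" clustering, the Euclidean distance measure, the fixed radius, the two-dimensional feature vectors, and the names `build_centres` and `falls_outside_clusters` are assumptions chosen to keep the example short.

```python
import math

def build_centres(training_points, radius):
    """Greedy clustering: a point joins the first cluster whose centre is
    within `radius`; otherwise it starts a new cluster."""
    clusters = []  # each cluster keeps a running coordinate sum and a count
    for point in training_points:
        for cluster in clusters:
            centre = [s / cluster["count"] for s in cluster["sum"]]
            if math.dist(centre, point) <= radius:
                cluster["sum"] = [s + x for s, x in zip(cluster["sum"], point)]
                cluster["count"] += 1
                break
        else:
            clusters.append({"sum": list(point), "count": 1})
    return [tuple(s / c["count"] for s in c["sum"]) for c in clusters]

def falls_outside_clusters(centres, point, radius):
    """A point is treated as anomalous if it is not near any learnt centre."""
    return all(math.dist(centre, point) > radius for centre in centres)

# Hypothetical two-feature training points forming two groups.
training_points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9)]
centres = build_centres(training_points, radius=1.0)

print(len(centres))
print(falls_outside_clusters(centres, (1.1, 1.0), radius=1.0))
print(falls_outside_clusters(centres, (3.0, 3.0), radius=1.0))
```

Changing the radius or the distance function changes which points are flagged, which reflects the sensitivity to the distance measure noted above.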
The EMERALD system contains a statistical component called eStat. This statistical component maintains short and long-term distribution information for several types of “measures”, using a decay mechanism to age out less recent events. While the techniques do not need prior knowledge of attack activity, such knowledge is used in the choice of attributes that constitute measures and time ranges used for intensity measures.
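One way a decay mechanism can maintain short-term and long-term distributions is sketched below. It is an illustration only and does not reproduce eStat: the exponential decay per observed event, the half-life values, the event categories, and the class name `DecayedDistribution` are all assumptions.

```python
class DecayedDistribution:
    """Categorical distribution in which older events lose weight."""

    def __init__(self, half_life_events):
        # After `half_life_events` further observations, an event's
        # weight has dropped to one half (illustrative assumption).
        self.decay = 0.5 ** (1.0 / half_life_events)
        self.weights = {}

    def observe(self, category):
        # Age every existing weight, then add the new event at full weight.
        for key in self.weights:
            self.weights[key] *= self.decay
        self.weights[category] = self.weights.get(category, 0.0) + 1.0

    def distribution(self):
        total = sum(self.weights.values())
        if total == 0:
            return {}
        return {key: w / total for key, w in self.weights.items()}

short_term = DecayedDistribution(half_life_events=5)
long_term = DecayedDistribution(half_life_events=200)

# Hypothetical event stream: a long run of http followed by a burst of ftp.
for event in ["http"] * 100 + ["ftp"] * 10:
    short_term.observe(event)
    long_term.observe(event)

print(short_term.distribution())
print(long_term.distribution())
```

With these assumed half-lives, the short-term profile is dominated by the recent ftp burst while the long-term profile still reflects mostly http traffic; a large gap between the two profiles is the kind of signal such a statistical component could score.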
EMERALD also has a component, called eBayes, that combines signature-based and anomaly-based approaches. eBayes uses a belief network to determine, from a number of features, whether the values of those features fit some normal behavior (http, ftp, etc.), some predefined bad behavior (mailbomb, ipsweep, etc.), or neither of these (other).
Unlike signature-based or misuse-based intrusion detection techniques, anomaly detection is capable of detecting novel attacks. However, the use of anomaly detection in practice is hampered by a high rate of false alarms. Specification-based techniques have been shown to produce a low rate of false alarms, but are not as effective as anomaly detection in detecting novel attacks, especially when it comes to network probing and denial-of-service attacks.
Therefore, a need exists for a system and method of specification-based anomaly detection for network intrusion detection.