1. Field of the Invention
The present invention relates generally to intrusion detection, and more particularly, to systems and methods for implementing real-time sequence-based anomaly detection.
2. Discussion of the Related Art
An organization's network security plan typically includes various lines of defense. Firewall systems represent a common first line of defense. A firewall generally represents a security enforcement point that separates a trusted network from an untrusted network. In operation, firewalls screen all connections by determining which traffic should be allowed and which traffic should be disallowed based on a predetermined security policy.
Inevitably, this first line of defense will fail. The second line of defense of the network security plan is intrusion detection. After an intruder has successfully penetrated the perimeter and gained access to systems within the protected network, a mechanism must exist to quickly detect the intruder and minimize the damage that the intruder can inflict.
Intrusion detection is based on the general assumption that the behavior of the intruder differs from that of a legitimate user in ways that can be quantified. The variance between the respective behaviors will presumably increase where the intruder is operating in a manner calculated to produce increasingly malicious results. In general, the characterizations of the typical behavior of authorized users as compared to the characterizations of the behavior of intruders will not be mutually exclusive. It is therefore a goal of intrusion detection system designers to minimize so-called "false positives," i.e., characterizations of legitimate behavior as that of an intruder.
Intrusion detection systems that monitor for intrusive behavior need to collect data on the dynamic state of the system. Various types of dynamic state information can be collected. For example, some intrusion detection systems collect profiles of user behavior that are generated from audit logs. Other systems look at network traffic or attempt to characterize the behavior of privileged processes.
Once a behavioral characteristic is selected, it is used to classify data. Classification techniques can be divided into two categories. A first category is represented by techniques that look for known intrusion signatures. This category of techniques, known as misuse intrusion detection techniques, encodes intrusion signatures or scenarios and scans for occurrences of those signatures. Accordingly, these techniques require prior knowledge of the nature of the intrusion. In one example, misuse intrusion detection systems use an expert system to fit data to known intrusion signatures.
A second category of classification techniques is represented by those techniques that look for anomalous behavior. In anomaly intrusion detection, it is assumed that the nature of the intrusion is unknown, but that the intrusion will result in behavior different from that normally seen in the system. Anomaly intrusion detection systems use models of normal or expected behavior to monitor systems. These models form the basis of the determination of whether observed behavior deviates substantially from what is expected.
One example of an anomaly intrusion detection system is described in S. A. Hofmeyr, S. Forrest, A. Somayaji, "Intrusion Detection Using Sequences of System Calls," Journal of Computer Security, 6:151-180, 1998, which is incorporated herein by reference in its entirety. In the Hofmeyr approach, a program is viewed as a black box, and therefore no specialized knowledge of the internal functioning of the program is required. The internal functioning of the program is inferred indirectly through the observation of the program's normal behavior.
The behavior of the program is observed through the monitoring of system calls that access system resources. In the Hofmeyr system, short sequences of system calls were assumed to represent a good simple discriminator for several types of intrusions. Accordingly, in a first stage, profiles of observed sequences of system calls that occur in the conventional and acceptable operation of the program are recorded. These stored profiles form the basis by which the intrusion detection system can determine whether the system calls of a monitored process conform to the profiles of expected behavior. Deviations from the stored profiles would indicate an anomalous operation of the program.
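The second stage described above, comparing a monitored process against stored profiles, can be sketched as follows. This is a minimal illustration only, not the Hofmeyr implementation: it assumes the profiles are held as a set of fixed-length tuples of system call names, and the function name `mismatched_windows`, the window length `k`, and the sample data are all hypothetical.

```python
def mismatched_windows(trace, profile, k):
    """Return (position, window) for every length-k window of `trace`
    that does not appear in `profile` (a set of length-k tuples)."""
    mismatches = []
    for i in range(len(trace) - k + 1):
        window = tuple(trace[i:i + k])
        if window not in profile:
            mismatches.append((i, window))
    return mismatches


# Hypothetical profile recorded during normal operation (assumed data).
profile = {
    ("open", "read", "mmap"),
    ("read", "mmap", "mmap"),
    ("mmap", "mmap", "open"),
    ("mmap", "open", "read"),
}

# Hypothetical monitored trace containing an unexpected execve call.
monitored = ["open", "read", "mmap", "mmap", "execve", "read"]
deviations = mismatched_windows(monitored, profile, 3)
for position, window in deviations:
    print(position, window)
```

In this sketch, an empty result means every observed window conforms to the stored profiles. How many mismatches should be tolerated before an anomaly is reported is a policy choice that the sketch does not address.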
To generate a database of profiles, the stream of system calls generated by a particular program is traced to identify all unique sequences of a given length, k, that occurred during the trace. Each program of interest has a different database that is specific to a particular architecture, software version and configuration, local administrative policies, and usage patterns. The following example illustrates the construction of a database.
Suppose the system observes the following trace of system calls (excluding parameters):
open, read, mmap, mmap, open, read, mmap
A window of size k is moved across the trace, recording each unique sequence of length k that is encountered. For example, if k=3, then the following unique sequences result:
open, read, mmap
read, mmap, mmap
mmap, mmap, open
mmap, open, read
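The windowing step above can be reproduced with a few lines of code. The following is a minimal sketch, assuming the trace is available as a list of system call names. The function name `unique_sequences` is hypothetical and is not taken from the Hofmeyr reference.

```python
def unique_sequences(trace, k):
    """Slide a window of size k across `trace` and return each unique
    length-k sequence once, in the order it is first encountered."""
    seen = set()
    ordered = []
    for i in range(len(trace) - k + 1):
        window = tuple(trace[i:i + k])
        if window not in seen:
            seen.add(window)
            ordered.append(window)
    return ordered


trace = ["open", "read", "mmap", "mmap", "open", "read", "mmap"]
sequences = unique_sequences(trace, 3)
for seq in sequences:
    print(", ".join(seq))
```

The trace contains five windows of length three. The last window repeats the first, so only four unique sequences are recorded.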
Hofmeyr stores these sequences as trees, with each tree rooted at a particular system call. The set of trees corresponding to the above example is illustrated in FIG. 1. As illustrated, each system call tree is rooted at the first system call in the sequence. For example, the open-read-mmap tree is rooted at the open system call. As further illustrated, the sequences mmap-mmap-open and mmap-open-read are rooted at the same system call and therefore share a single mmap root node. This combination of sequences reduces the total number of nodes in the forest of trees.
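The node savings from sharing roots can be illustrated with a prefix tree built from nested dictionaries. This is a sketch under assumptions: the nested-dictionary layout and the names `build_forest` and `count_nodes` are hypothetical, and the exact tree layout of FIG. 1 is not reproduced here.

```python
def build_forest(sequences):
    """Store sequences as a forest of trees. Each top-level key is a root
    system call, and sequences with a common prefix share nodes."""
    forest = {}
    for seq in sequences:
        node = forest
        for call in seq:
            node = node.setdefault(call, {})
    return forest


def count_nodes(forest):
    """Count every node in the forest, one node per stored system call."""
    return sum(1 + count_nodes(children) for children in forest.values())


sequences = [
    ("open", "read", "mmap"),
    ("read", "mmap", "mmap"),
    ("mmap", "mmap", "open"),
    ("mmap", "open", "read"),
]
forest = build_forest(sequences)
independent_nodes = sum(len(seq) for seq in sequences)
shared_nodes = count_nodes(forest)
print(independent_nodes, shared_nodes)
```

For the four example sequences, independent storage requires 12 nodes, while the forest requires 11, because the two sequences that begin with mmap share one root node.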
Notwithstanding the savings gained through the reduction in nodes, this conventional storage mechanism still places a significant burden on the intrusion detection system. For example, consider a database that contains 1318 unique system call sequences of length ten. If each of these 1318 sequences is stored independently, 13180 nodes are required, wherein each node corresponds to a system call. Storing the 1318 sequences as trees, as per an example in the Hofmeyr reference, reduces the number of nodes to 7578, a reduction of more than 40 percent. Even so, the storage of 7578 nodes still represents a significant storage burden. This inefficiency has a great impact on the ability of the intrusion detection system to operate effectively in real time. Accordingly, what is needed is a mechanism for increasing the operational efficiency of a sequence-based anomaly intrusion detection system.
The present invention meets the aforementioned needs by providing an intrusion detection system that operates efficiently in real time. Computational efficiency is achieved through the representation of known sequences of system calls in a distance matrix. The distance matrix indirectly specifies known sequences by specifying allowable separation distances between pairs of system calls. The distance matrix is used to determine whether a sequence of system calls in an event window represents an anomaly. Anomalies that are detected are further analyzed through Levenshtein distance calculations that also rely on the contents of the distance matrix. In a preferred embodiment, the intrusion detection system is incorporated as part of a system call software wrapper. It is a further feature of the present invention that event abstraction enables the intrusion detection system to apply generically across various computing platforms.
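One way to picture a distance matrix of this kind is sketched below. The sketch rests on assumptions that are not stated in this summary: each matrix entry is modeled as the set of separation distances at which one system call has been seen to follow another within a known sequence, and a window is flagged when any pair of its calls appears at a separation not recorded for that pair. The names `build_distance_matrix`, `is_anomalous`, and `levenshtein` are hypothetical. The edit distance shown is the textbook Levenshtein calculation on sequences and does not reproduce the matrix-based calculation referred to above.

```python
from collections import defaultdict
from itertools import combinations


def build_distance_matrix(sequences):
    """Map each ordered pair of calls (a, b) to the set of distances at
    which b follows a in any known sequence (assumed representation)."""
    allowed = defaultdict(set)
    for seq in sequences:
        for i, j in combinations(range(len(seq)), 2):
            allowed[(seq[i], seq[j])].add(j - i)
    return allowed


def is_anomalous(window, allowed):
    """Flag the window if any pair of calls occurs at a separation
    distance that was never recorded for that pair."""
    return any(
        (j - i) not in allowed.get((window[i], window[j]), ())
        for i, j in combinations(range(len(window)), 2)
    )


def levenshtein(a, b):
    """Textbook edit distance between two sequences of system calls."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


known = [
    ("open", "read", "mmap"),
    ("read", "mmap", "mmap"),
    ("mmap", "mmap", "open"),
    ("mmap", "open", "read"),
]
matrix = build_distance_matrix(known)

window = ("open", "mmap", "read")
if is_anomalous(window, matrix):
    nearest = min(levenshtein(window, seq) for seq in known)
    print("anomaly; edit distance to nearest known sequence:", nearest)
```

Under these assumptions the pairwise check is an approximation: a window can pass every pairwise test without being one of the recorded sequences, so the matrix trades some precision for a compact, fixed-size lookup structure.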