1. Field of the Invention
This invention relates to systems and methods for monitoring system calls in a computer process, and more particularly to the use of data mining techniques to detect intrusions in such computer processes.
2. Background
Intrusion Detection Systems (IDS) are becoming an important part of computer security systems. A major advantage of IDS is the ability to detect new and unknown attacks by examining audit data collected from a system. Typically this detection is performed through a data mining technique called anomaly detection. Anomaly detection builds models of “normal” audit data (or data containing no intrusions) and detects anomalies based on detecting deviations from this normal model. The performance of these models depends greatly on the robustness of the modeling method and the quantity and quality of the available training data. Much of this data is sequential in nature. The basic units of the modeling technique are short contiguous subsequences obtained with a sliding window.
System call traces are a common type of audit data collected for performing intrusion detection. A system call trace is the ordered sequence of system calls that a process performs during its execution. The trace for a given process can be collected using system utilities such as strace. System call traces are useful for detecting a user-to-root (“U2R”) exploit or attack. In this type of exploit, a user exploits a bug in a privileged process (a process running as root) using a buffer overflow to create a root shell. Typically, the system call trace for a program process which is being exploited is drastically different from the trace of the same program process under normal conditions. This is because the buffer overflow and the execution of a root shell typically invoke a very different set of system calls than the normal execution of the program. Because of these differences, it is possible to detect when a process is being exploited by examining the system calls. Other types of audit data that can be analyzed include any sequence of symbols or operations, such as application call traces or machine code instructions.
Typically, prior art methods build models over short contiguous subsequences of the system call trace. These short contiguous subsequences are extracted with a sliding window, the size of which is the number of system calls analyzed at a time. Traditionally, system call modeling methods employ a fixed window size, i.e., a fixed number of system calls is analyzed. Many different methods have been proposed for building models over these short contiguous subsequences. Approaches for modeling normal sequences using look ahead pairs (S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff, “A Sense of Self for Unix Processes,” Proceedings of the 1996 IEEE Symposium on Security and Privacy, pp. 120–128, IEEE Computer Society, 1996) and contiguous sequences (S. A. Hofmeyr, S. Forrest, and A. Somayaji, “Intrusion Detection Using Sequences of System Calls,” Journal of Computer Security, 6:151–180, 1998) are described in the prior art. A statistical method to determine sequences which occur more frequently in intrusion data as opposed to normal data is described in P. Helman and J. Bhangoo, “A Statistically Based System for Prioritizing Information Exploration Under Uncertainty,” IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 27:449–466, 1997. A prediction model trained by a decision tree applied over the normal data is described in W. Lee, S. J. Stolfo, and P. K. Chan, “Learning Patterns from Unix Processes Execution Traces for Intrusion Detection,” Proceedings of the AAAI-97 Workshop on AI Approaches to Fraud Detection and Risk Management, pp. 50–56, Menlo Park, Calif.: AAAI Press, 1997, and W. Lee and S. J. Stolfo, “Data Mining Approaches for Intrusion Detection,” In Proceedings of the Seventh USENIX Security Symposium, 1998. Ghosh and Schwartzbard describe neural networks to model normal data (A. Ghosh and A. Schwartzbard, “A Study in Using Neural Networks for Anomaly and Misuse Detection,” In Proceedings of the Eighth USENIX Security Symposium, 1999). Ye describes a Markov chain-based method to model the normal data (N. Ye, “A Markov Chain Model of Temporal Behavior for Anomaly Detection,” In Proceedings of the 2000 IEEE Systems, Man, and Cybernetics Information Assurance and Security Workshop, 2000).
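By way of illustration only, the extraction of short contiguous subsequences with a fixed-size sliding window may be sketched as follows. The function name sliding_windows, the example trace, and the window size of 3 are hypothetical and are not taken from any of the cited methods.

```python
from collections import Counter


def sliding_windows(trace, window_size):
    """Return every contiguous subsequence of length window_size, in trace order."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    # A trace shorter than the window yields no subsequences.
    return [tuple(trace[i:i + window_size])
            for i in range(len(trace) - window_size + 1)]


# Hypothetical system call trace, for illustration only.
trace = ["open", "read", "mmap", "mmap", "open", "read", "close"]

# Fixed window size chosen a priori, as in the prior art methods.
windows = sliding_windows(trace, 3)

# Counting the subsequences gives the raw material for a model of normal data.
counts = Counter(windows)
```

A trace of seven system calls and a window size of 3 yields five overlapping subsequences, each of which is treated as a basic unit by the modeling method.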
Each of these methods attempts to predict whether a subsequence is likely to have been generated by a normal process. Typically, the only data that is available is normal data, so this corresponds to predicting how likely it is that an observed sequence is normal, i.e., consistent with the normal data. One way to do this is to use a “prediction” model. For a sequence of length n, such a model computes how well the first n−1 system calls predict the nth system call. The more consistent the subsequence is with the normal data, the more accurate the prediction.
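A minimal sketch of such a prediction model is given below, assuming a simple frequency count over normal data rather than any of the specific techniques cited above. The function names, the example traces, and the choice of n = 3 are hypothetical.

```python
from collections import Counter, defaultdict


def train_prediction_model(normal_traces, n):
    """For each (n-1)-call context in the normal data, count the calls that follow it."""
    model = defaultdict(Counter)
    for trace in normal_traces:
        for i in range(len(trace) - n + 1):
            context = tuple(trace[i:i + n - 1])
            model[context][trace[i + n - 1]] += 1
    return model


def prediction_probability(model, window):
    """Estimate P(nth call | first n-1 calls); 0.0 if the context was never seen."""
    context, nth_call = tuple(window[:-1]), window[-1]
    followers = model.get(context)
    if not followers:
        return 0.0
    return followers[nth_call] / sum(followers.values())


# Hypothetical normal traces, for illustration only.
normal_traces = [
    ["open", "read", "close"],
    ["open", "read", "read", "close"],
    ["open", "read", "close"],
]
model = train_prediction_model(normal_traces, 3)

# A window consistent with normal data receives a high estimate ...
p_normal = prediction_probability(model, ["open", "read", "close"])
# ... while a window ending in a call never seen in this context receives zero.
p_unusual = prediction_probability(model, ["open", "read", "execve"])
```

In this sketch, a low estimate for an observed window indicates that the window is inconsistent with the normal data. Note that the window length n is fixed in advance.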
A disadvantage of all of the above methods is that they use a fixed window size for building the system call trace models, or models of other sequences of operations. The size of the window is picked a priori, presumably based upon a determination of what size works best for the modeling. There is a tradeoff between using shorter or longer sequences. To analyze this tradeoff, let Σ denote the set of all distinct symbols, in which each symbol represents a distinct operation in a sequence of operations. For example, if the sequence of operations is a sequence of operating system calls made by a program, then the name of the operating system call may serve as a distinct symbol. Assuming all sequences occur with equal probability and that there are |Σ| different operations, a specific sequence of length n will occur with probability
$1/|\Sigma|^{n}$. In general, if longer sequences are used, there are significantly fewer instances of each subsequence in the data. However, these instances are more accurate than short sequences. Shorter sequences occur much more frequently, but often are not as accurate as longer sequences. This tradeoff suggests that there is some optimal sequence length for the models. In related work, Marceau (as described in C. Marceau, “Characterizing the Behavior of a Program Using Multiple-Length n-Grams,” In Proceedings of the New Security Paradigms Workshop 2000) identifies the problems of determining a fixed window size and avoids them by presenting a model that uses multiple sequence lengths for building these kinds of models.
However, this approach lacks the ability to define optimal sequence lengths that are determined by a data analysis of the available training data.
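The tradeoff between shorter and longer sequences discussed above may be illustrated numerically under the stated assumption that all sequences occur with equal probability. The following sketch is illustrative only; the function names, the alphabet size of 10, and the trace length of one million system calls are hypothetical.

```python
def uniform_sequence_probability(alphabet_size, n):
    """Probability of one specific length-n sequence when all are equally likely."""
    return 1.0 / (alphabet_size ** n)


def expected_occurrences(alphabet_size, n, trace_length):
    """Expected count of one specific length-n sequence among all windows of a trace."""
    num_windows = max(trace_length - n + 1, 0)
    return num_windows * uniform_sequence_probability(alphabet_size, n)


# Hypothetical values: 10 distinct system calls, a trace of 1,000,000 calls.
alphabet_size = 10
trace_length = 1_000_000

# Expected number of instances of a specific sequence for several window sizes.
expected_by_length = {
    n: expected_occurrences(alphabet_size, n, trace_length)
    for n in (2, 4, 6, 8)
}
```

With these values, a specific sequence of length 2 is expected roughly ten thousand times, whereas a specific sequence of length 6 is expected about once and one of length 8 far less than once, so that longer windows leave few instances of each subsequence from which to build a model.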
Accordingly, there exists a need in the art for a technique which is not limited to a fixed window size for analyzing sequential behavior and which provides the ability to detect intrusions in the operation of the computer system.