1. Field of the Invention
The present invention relates to the field of computer software, and, in particular, the invention provides a system and method by which malicious or failing software may be detected before such software can cause significant damage.
2. Background of the Invention
Host-based intrusion detection systems monitor and analyze user and software audit data captured on a host machine. Captured data can include user data such as keystrokes, login/logout times, operational profiles, or programs run during a session. Captured data can also include program behavior data such as system calls or internal program states of monitored programs.
There are two general types of intrusion detection algorithms used in host-based intrusion detection: misuse detection and anomaly detection. Misuse detection systems are "signature-based" systems: they work by comparing data captured online against signatures of known attacks stored in a database.
Advantages of misuse detection include a high certainty rate in detecting well-known attacks, and a low false positive rate. A major drawback, however, is that misuse detection systems cannot detect novel attacks, or, in some cases, even slight variants of well-known attacks. As a result, misuse detection systems are completely reactive to computer misuse, and must be updated often to be able to detect the latest well-known attacks against systems.
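The signature-matching scheme described above can be sketched as follows. This is a minimal illustration, not any particular product's implementation; the audit trace is modeled as a list of event strings, and the attack signature shown is hypothetical.

```python
# Minimal sketch of signature-based misuse detection. The trace is a list of
# audit events; the signature database maps attack names to event subsequences.

def contains_signature(trace, signature):
    """Return True if the signature occurs as a contiguous run in the trace."""
    n = len(signature)
    return any(trace[i:i + n] == signature for i in range(len(trace) - n + 1))

def misuse_detect(trace, signature_db):
    """Report every known-attack signature found in the captured trace."""
    return [name for name, sig in signature_db.items()
            if contains_signature(trace, sig)]

# Hypothetical signature: repeated failed attempts to escalate privileges.
signatures = {
    "su-bruteforce": ["exec su", "auth fail", "exec su", "auth fail", "exec su"],
}
trace = ["login", "exec su", "auth fail", "exec su", "auth fail",
         "exec su", "auth ok"]
print(misuse_detect(trace, signatures))  # → ['su-bruteforce']
```

Note that a trace containing only events absent from the database produces no alarm at all, which illustrates the drawback discussed above: a novel attack with no stored signature passes undetected.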
Unlike misuse detection, anomaly detection is designed to directly address the problem of detecting new or novel system attacks. To accomplish this, anomaly detection techniques do not scan for specific patterns, but rather compare current activities to statistical models of past behavior, often using what is termed an equality matching approach. The equality matching approach flags as anomalous any activity that sufficiently deviates from the statistical model, thereby allowing such activity to be evaluated as a possible threat.
Although anomaly detection is powerful in that it can detect novel attacks, it also has its drawbacks. One of the most significant disadvantages of anomaly detection systems is the comparatively high rate of false alarms. Any significant deviation from the baseline can be flagged as an intrusion, so any non-intrusive behavior that falls outside the normal range may also be labeled as an intrusion, resulting in a false positive. In spite of these potential drawbacks, the ability to detect novel attacks makes anomaly detection a prerequisite if future, unknown computer system attacks are to be detected.
Due to these advantages, anomaly detection has been the subject of increasing research. This research has primarily focused on program-based intrusion detection. The premise of program-based intrusion detection is that there are certain behaviors that are characteristic (or uncharacteristic) of individual programs. Such behaviors can include file input/output (I/O), requests for more memory, or any other action that requires computer resources. For example, the behavior of the program tar—which normally is non-interactive and affects files—is markedly different from that of the program lynx—which normally is very interactive but does not affect files. When viewed from this perspective, almost every program has some externally visible behavior.
In addition to differences in user-level functionality, programs also exhibit distinguishing characteristics and traits at the system level. For example, the interaction between an operating system and a program that performs complex computations will be different from those between the same operating system and a program that performs extensive disk I/O. System-level monitoring also tends to capture user-level interactions, as all interactions between a program and a user go through the operating system at some level. Thus system-level monitoring can capture user interactions in addition to actions that are normally transparent to the user (such as requesting more memory), thereby creating an accurate process behavior profile.
System-level monitoring can be achieved through various techniques, including adding instrumentation to programs that monitors and records internal program states, or adding instrumentation to an operating system to monitor and record external system calls made by a program. In general, it is more attractive to use auditing capabilities provided by the operating system, because this does not require instrumentation of every application installed on a computer. In addition, profiles can be directly created for all software, including commercial off-the-shelf (COTS) software. Examples of such auditing systems include Strace, for various UNIX-based operating systems, which allows monitoring of both system calls made by a given process and values returned by such system calls, and the Basic Security Module (BSM) implemented in Sun Microsystems' Solaris operating system, which can recognize and log the use of 243 built-in system calls that can be made by an individual process. By logging the use of these system calls, Unix and Solaris programmers can profile observable process behavior.
A normal process behavior profile is built by first implementing a “training phase.” During a training phase, the intrusion detection system (IDS) records audit information generated by a program under “normal usage.” Once an IDS has created this profile, subsequent program behavior can be compared to the profile; if a deviation is noted, then an intrusion flag is raised.
Some research has been conducted which indicates that system-level, program-based anomaly detection utilizing an equality matching approach is a viable option for recognizing malicious behavior as anomalous under Unix. This approach consists of condensing an audit trace into a series of discrete events, and then memorizing every sequence of n events (for some n) that is seen in the training data. Later, a process is flagged as being potentially intrusive if its execution trace contains one or more sequences of length n that do not appear in the database of memorized sequences. Despite the simplicity of the approach and the high detection levels, it does not record the context in which a sequence of system calls occurred, and this can be a disadvantage. A given sequence of system calls may indicate an intrusion in one context, and yet be perfectly normal in another context. When an equality matcher encounters such a context-dependent sequence in its training data, it thereafter treats the sequence as normal in every context, and its detection performance drops. Overall, past research indicates that an equality matcher converges only slowly to a good solution as increasing amounts of training data are supplied.
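The equality matching approach described above—memorizing every length-n window of events seen during training, then flagging any trace containing an unseen window—can be sketched as follows. The system-call names and the value n = 3 are illustrative only.

```python
# Sketch of equality-matching anomaly detection over system-call traces:
# a training phase memorizes every contiguous length-n event window, and a
# detection phase flags any trace containing a window absent from the database.

def windows(trace, n):
    """Return each contiguous length-n subsequence of the event trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

def train(traces, n):
    """Training phase: build the normal-behavior profile from audit traces."""
    db = set()
    for trace in traces:
        db.update(windows(trace, n))
    return db

def is_anomalous(trace, db, n):
    """Detection phase: flag the trace if any length-n window is unseen."""
    return any(w not in db for w in windows(trace, n))

# Illustrative "normal usage" traces recorded during training.
normal = [["open", "read", "read", "close"],
          ["open", "read", "write", "close"]]
db = train(normal, n=3)

print(is_anomalous(["open", "read", "read", "close"], db, n=3))   # → False
print(is_anomalous(["open", "exec", "write", "close"], db, n=3))  # → True
```

The sketch also makes the context problem concrete: membership in the set is the only criterion, so once a window enters the database during training it is accepted wherever it later appears, regardless of the surrounding activity.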