E-mail-based worms and viruses, sometimes referred to as malware, may infect large numbers of hosts rapidly. E-mail malware can propagate as executable attachments that users are tricked into opening, thus potentially causing the malignant code to run and propagate. One way the propagation can occur, for example, is by the attacking code sending copies of itself to the entries in a user's e-mail address book. While e-mail attachments are not the only vector by which malware propagates, they pose a substantial threat that merits special treatment, especially since attachments can be intercepted before they reach a user's machine. There are various approaches to defending against malicious software, for example, employing virus scanners to detect viruses.
Virus scanners are largely signature-based and typically identify security threats by scanning files for certain byte sequences that match already-known patterns of malicious code. The scanners therefore require an up-to-date signature database to be maintained. Maintaining such a database can be a difficult and resource-intensive problem. This problem can be exacerbated by the lag between the detection of a new attack and the deployment of a corresponding signature, especially when humans are involved in the process. Further complicating the situation is that many e-mail-borne viruses do not rely on software bugs. Instead, they rely on humans to click on the attachments, thus activating them. Thus, the need for frequent updates, together with the inherent delay between the creation of malicious software and the detection of that software and deployment of signatures or patches, relegates signature-based techniques to a secondary role in the active security of systems.
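By way of illustration only, signature-based scanning of the kind described above can be sketched as a search for known byte sequences within the contents of a file. The following Python sketch is not taken from any particular scanner; the signature names, the byte patterns, and the function name `scan_bytes` are hypothetical and are chosen solely for the example.

```python
from typing import Dict, List

# Hypothetical signature database mapping a threat name to a byte pattern.
# The names and patterns are invented for illustration only.
SIGNATURES: Dict[str, bytes] = {
    "example-worm-a": b"\xde\xad\xbe\xef\x13\x37",
    "example-macro-b": b"AutoOpen_payload",
}


def scan_bytes(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the sorted names of all signatures whose pattern occurs in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)


if __name__ == "__main__":
    attachment = b"header" + b"\xde\xad\xbe\xef\x13\x37" + b"trailer"
    print(scan_bytes(attachment))        # a known pattern is matched
    print(scan_bytes(b"novel payload"))  # an unknown pattern goes undetected
```

The sketch also makes the limitation discussed above concrete: content whose byte pattern is absent from the database produces no match until a corresponding signature has been added.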
Another approach, the use of behavior-based mechanisms, characterizes software based on the perceived effects that the software has on an examined system instead of relying on distinct signatures of that software. A benefit of this approach is that it can detect previously unseen attacks, that is, attacks for which the system has no prior knowledge or signatures. These attacks can be detected as long as there is some differentiation between the behavior of the attacking software and that of normal software. Many of these behavior-based systems rely on anomaly detection algorithms for their classification, and thus detection, of malignant code.
Anomaly detection algorithms work by constructing models of normal behavior and subsequently checking observed behavior against these models for statistically significant variations that may hint at malicious behavior. The success of an anomaly detection algorithm can depend on the choice of an accurate behavior model. Host-based intrusion detection systems typically employ anomaly detection algorithms that are based on the monitoring of network activity, system calls, and the file system.
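As a non-limiting illustration of the two steps described above, a model of normal behavior can be as simple as the mean and standard deviation of a single monitored quantity, with an observation flagged when it deviates from the mean by more than a chosen number of standard deviations. The following Python sketch assumes such a z-score model; the class name `BehaviorModel`, the monitored quantity (for example, a count of events per interval), and the threshold of 3.0 are assumptions made for the example and do not describe any particular intrusion detection system.

```python
import statistics
from typing import Sequence


class BehaviorModel:
    """Hypothetical model of normal behavior for one numeric feature."""

    def __init__(self, normal_samples: Sequence[float]) -> None:
        if len(normal_samples) < 2:
            raise ValueError("at least two training samples are required")
        # Step 1: construct the model from observations of normal behavior.
        self.mean = statistics.fmean(normal_samples)
        self.stdev = statistics.stdev(normal_samples)

    def score(self, observed: float) -> float:
        """Return the distance of observed from the mean, in standard deviations."""
        if self.stdev == 0:
            return 0.0 if observed == self.mean else float("inf")
        return abs(observed - self.mean) / self.stdev

    def is_anomalous(self, observed: float, threshold: float = 3.0) -> bool:
        # Step 2: check observed behavior against the model.
        return self.score(observed) > threshold


if __name__ == "__main__":
    # Assumed training data: events observed per interval during normal use.
    model = BehaviorModel([4, 5, 6, 5, 4, 6, 5])
    print(model.is_anomalous(6))   # within normal variation
    print(model.is_anomalous(40))  # far outside normal variation
```

The choice of threshold in such a sketch reflects a trade-off: a lower value flags more unusual behavior but also more normal behavior, while a higher value does the opposite.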
One drawback of host-based intrusion detection systems (IDS) is that the computational overhead associated with extracting behavior models from irregular and high-volume events may tax the processing power of the host. For example, analyzing all system calls in a system may impose considerable overhead, due both to the volume of events and to the generally irregular nature of system calls. False positive rates may pose a further disadvantage.
Accordingly, it is desirable to provide systems and methods that overcome these and other deficiencies of prior systems.