Malware-based cyber attacks are becoming more prevalent. The malware used in such attacks is increasing in sophistication and is increasingly focused on clandestine operation and on evading traditional defenses. Meanwhile, computing infrastructure continues to grow in both size and complexity, as illustrated by recent trends such as the movement toward cloud computing and the emergence of ultra-large-scale systems (ULSs). The complex computing systems employed by governments, corporations, and other institutions are frequently targeted by cyber attacks designed for espionage and sabotage. The malicious software (malware) used in such attacks is typically custom designed or obfuscated to avoid detection by traditional antivirus software. Traditional malware defense mechanisms primarily work by preventing potential malware from passing through a network or executing on a host computer. These defenses include antivirus software, which uses static signature-based detection techniques to identify potential malware. Although antivirus software is popular due to its low false-alarm rate and ease of use, it requires new malware samples to be discovered and analyzed before they can be detected. This leaves hosts vulnerable to new malware during the period between a sample's first use in a cyber attack and the creation of a detection signature for that sample. The ULSs used by governments, corporations, and other institutions are particularly vulnerable to new malware, since these systems are constantly subject to cyber attacks and their size and complexity complicate detection.
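As a rough illustration of why signature-based scanning cannot catch previously unseen samples, the following minimal Python sketch matches fixed byte patterns against a sample's contents. The signature names and byte patterns are hypothetical, and production antivirus engines use far richer signature formats (wildcards, offsets, hashes, unpacking); this shows only exact substring matching.

```python
# Hypothetical signature database mapping a name to a byte pattern.
# Real AV signatures are far richer; this illustrates exact matching only.
SIGNATURES: dict[str, bytes] = {
    "Example.Dropper.A": bytes.fromhex("deadbeef4d5a9000"),
    "Example.Worm.B": b"EVIL_PAYLOAD_V1",
}


def scan(sample: bytes, signatures: dict[str, bytes] = SIGNATURES) -> list[str]:
    """Return the names of all signatures whose byte pattern occurs in sample."""
    return [name for name, pattern in signatures.items() if pattern in sample]


print(scan(b"header" + b"EVIL_PAYLOAD_V1" + b"trailer"))  # ['Example.Worm.B']
print(scan(b"a brand-new sample"))                        # []
```

A sample containing a known pattern is flagged, while a never-before-seen sample produces an empty result until someone analyzes it and adds a signature, which is the window of vulnerability described above.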
Antivirus software also may be used to detect obfuscated variants of known malware. Obfuscations may be applied to malware using specialized software that reorders, encrypts, compresses, recompiles, or otherwise changes the code without altering its function. Obfuscations also may be applied automatically and incrementally, as is the case with metamorphic and polymorphic malware, which mutates as it propagates. Obfuscating malware to evade detection may be growing in popularity because the engineering effort required to design new malware exceeds the effort required to obfuscate existing malware. Accordingly, some new antivirus detection signatures may be created not for genuinely new malware but for obfuscated variants of known malware.
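The following Python sketch, under simplifying assumptions, shows how even a trivial obfuscation defeats exact signature matching while leaving the code recoverable. Single-byte XOR encoding stands in for the encryption step of a real obfuscator; the signature, payload bytes, and key are hypothetical, and the decoder stub a real packer would prepend is omitted.

```python
def xor_encode(data: bytes, key: int) -> bytes:
    """Single-byte XOR; applying it twice with the same key restores the input."""
    return bytes(b ^ key for b in data)


def matches(sample: bytes, pattern: bytes) -> bool:
    """Exact substring match, as a naive signature scanner would perform."""
    return pattern in sample


signature = b"EVIL_PAYLOAD_V1"                # hypothetical signature
original = b"\x90\x90" + signature + b"\xc3"  # hypothetical payload bytes
variant = xor_encode(original, 0x5A)          # obfuscated variant

print(matches(original, signature))           # True
print(matches(variant, signature))            # False
print(xor_encode(variant, 0x5A) == original)  # True: the code is recoverable

# A polymorphic engine could choose a fresh key each time it propagates;
# every nonzero key yields a different byte string.
variants = {xor_encode(original, k) for k in range(1, 256)}
print(len(variants))                          # 255
```

Each of the 255 variants would need its own exact-match signature, which illustrates why obfuscating existing malware is cheaper for an attacker than it is for a defender to keep up with.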
The traditional defense against malware-based attacks has been signature-based antivirus (AV) software. Such software has been shown to be vulnerable to simple obfuscations, leading malware authors to become increasingly adept at obfuscating their malware to evade detection. Furthermore, as malware becomes more focused on clandestine operation, it becomes increasingly difficult to detect once it has evaded AV software.
Behavior-based malware detection has been widely studied using a variety of models and methods. Early work in malware and intrusion detection used patterns of system calls to identify anomalous behavior. Extensions of this work include using system call arguments as features, applying machine learning algorithms to detection, and using high-level models of system activity to improve detection.
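A minimal Python sketch of the system-call-pattern idea follows: a profile of fixed-length call windows is learned from known-good traces, and a new trace is scored by the fraction of its windows that never appear in that profile. The window length, the example traces, and the scoring rule are illustrative assumptions, not the method of any particular prior work.

```python
from collections.abc import Iterable, Sequence

Window = tuple[str, ...]


def windows(trace: Sequence[str], k: int) -> set[Window]:
    """All distinct contiguous length-k windows of a system call trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


def build_profile(normal_traces: Iterable[Sequence[str]], k: int) -> set[Window]:
    """Union of the windows observed during known-good execution."""
    profile: set[Window] = set()
    for trace in normal_traces:
        profile |= windows(trace, k)
    return profile


def anomaly_score(trace: Sequence[str], profile: set[Window], k: int) -> float:
    """Fraction of the trace's distinct windows never seen in the normal profile."""
    observed = windows(trace, k)
    if not observed:  # trace is shorter than k
        return 0.0
    return len(observed - profile) / len(observed)


# Hypothetical traces for illustration only.
normal = [["open", "read", "write", "close"], ["open", "read", "read", "close"]]
profile = build_profile(normal, k=3)
print(anomaly_score(["open", "read", "write", "close"], profile, k=3))           # 0.0
print(anomaly_score(["open", "read", "mmap", "execve", "close"], profile, k=3))  # 1.0
```

Comparing the score against a threshold would turn it into a detector; the window length controls how much ordering context each feature captures.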
Other proposed methods of behavioral malware detection include taint analysis, a technique specifically intended for privacy-breaching malware, as well as model checking, machine learning using performance monitors, computational geometry-based anomaly detection, and semantics that describe malicious behaviors. In the related field of network intrusion detection systems (NIDS), sequential change-point detection techniques have been applied to network data to identify intrusions and to detect denial-of-service (DoS) attacks.
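As one concrete example of sequential change-point detection, the sketch below implements a one-sided CUSUM (cumulative sum) test in Python. It assumes a scalar per-interval measurement, such as a count of connection requests, a known pre-change mean `mu0`, and hand-picked `drift` and `threshold` values; the parameter names and traffic numbers are hypothetical.

```python
from collections.abc import Iterable
from typing import Optional


def cusum_alarm(samples: Iterable[float], mu0: float, drift: float,
                threshold: float) -> Optional[int]:
    """One-sided (upper) CUSUM test.

    Returns the index of the first sample at which the cumulative statistic
    exceeds `threshold`, or None if no change is detected.
    """
    s = 0.0
    for i, x in enumerate(samples):
        # Accumulate evidence that the mean has risen above mu0 + drift;
        # clamping at zero discards evidence of a decrease.
        s = max(0.0, s + (x - mu0 - drift))
        if s > threshold:
            return i
    return None


# Hypothetical per-second request counts: baseline near 10, then a flood.
rates = [10, 11, 9, 10, 12, 30, 31, 29, 32]
print(cusum_alarm(rates, mu0=10, drift=2, threshold=20))  # 6
```

Raising `drift` or `threshold` reduces false alarms at the cost of a longer detection delay.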
A recent paper by Canali et al., “A quantitative study of accuracy in system call-based malware detection” (Proceedings of the 2012 International Symposium on Software Testing and Analysis, ISSTA 2012, New York, N.Y., USA: ACM, 2012, pp. 122-132), studies the performance of various behavioral malware detection models, including n-gram, tuple, and bag-of-words models, on system calls and system behaviors. The paper's two major findings are that greater model selectivity and specificity do not necessarily improve detection performance, and that extensive empirical evaluation is required to establish the usefulness of behavioral malware detection techniques.
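To make the difference between two of these models concrete, the following Python sketch extracts bag-of-words and n-gram features from system call traces. The traces are hypothetical, and the sketch does not reproduce the tuple model or any detail of the cited study's feature extraction.

```python
from collections import Counter
from collections.abc import Sequence


def bag_of_words(trace: Sequence[str]) -> Counter:
    """Count each system call, ignoring order."""
    return Counter(trace)


def ngrams(trace: Sequence[str], n: int) -> Counter:
    """Count contiguous length-n subsequences, preserving local order."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


a = ["open", "read", "write", "close"]
b = ["open", "write", "read", "close"]

print(bag_of_words(a) == bag_of_words(b))  # True: same calls, same counts
print(ngrams(a, 2) == ngrams(b, 2))        # False: the call order differs
```

The n-gram model distinguishes the two traces while bag of words does not, so it is the more selective of the two; whether that extra selectivity helps detection is precisely the kind of question that requires empirical evaluation.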