1. Field of the Invention
This invention pertains in general to computer security and in particular to detecting malware.
2. Description of the Related Art
Modern computer systems are often susceptible to a wide variety of security threats posed by malicious software (“malware”) that secretly performs operations not desired by the computer user, such as data theft, file destruction, installation of backdoor programs, and the like. One common technique used by security software for malware detection involves the use of signatures, in which newly discovered malware is analyzed and distinctive sequences of code (“signatures”) are extracted. Subsequently, security software examines code residing on a monitored machine to determine whether the code contains a known malware signature; if it does, then the code is flagged as malware.
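By way of illustration only, signature matching of the kind described above can be sketched as a byte-sequence search. The function name, the signature names, and the byte values below are hypothetical and are not taken from any actual security product; practical scanners use considerably more elaborate matching.

```python
def find_signatures(code: bytes, signatures: dict[str, bytes]) -> list[str]:
    """Return the names of all signatures whose byte sequence occurs in code."""
    return [name for name, pattern in signatures.items() if pattern in code]


# Hypothetical signature database: name -> distinctive byte sequence.
SIGNATURES = {
    "Example.A": bytes.fromhex("89e531c05d"),
    "Example.B": bytes.fromhex("cafebabe0102"),
}

# Hypothetical code residing on a monitored machine.
monitored_code = bytes.fromhex("5589e531c05dc3")

matches = find_signatures(monitored_code, SIGNATURES)
flagged_as_malware = bool(matches)
```

In this sketch the monitored code contains the byte sequence registered as "Example.A", so it would be flagged; code containing neither sequence would pass unflagged.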
In order to evade signature-based approaches, some polymorphic malware uses various techniques to disguise itself. For example, some polymorphic malware performs post-processing operations to alter the code initially produced by a compiler or other code-generating utility so that it will not contain a consistent malware signature for security software to flag. One such technique is compression, in which the malware compresses its code, later using an included decompression module to decompress the code at runtime. Such compression may occur at various times, such as when the malware replicates itself from one computer system to another, or when it first executes on a given computer system. Another technique is obfuscation, which alters the code of the malware without necessarily compressing it, such as by inserting “no-operation” instructions at strategic locations. In either case, the malware code is changed by the post-processing, so that it is difficult to create a signature that will consistently identify the malware. The term “post-processed program” as used herein designates a program whose executable file has been substantively altered, e.g., by the compression or obfuscation techniques mentioned above, or by other file-altering techniques.
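The effect of obfuscation on a signature can be illustrated with a minimal sketch, assuming x86 machine code in which the byte 0x90 encodes a no-operation instruction. The instruction list, the signature, and the helper name `insert_nops` are hypothetical and chosen only for illustration; inserting the no-operation instructions between whole instructions leaves the behavior of the code unchanged while breaking up the byte sequence on which the signature relies.

```python
NOP = b"\x90"  # x86 no-operation instruction


def insert_nops(instructions: list[bytes], every: int = 2) -> bytes:
    """Join instructions, inserting a NOP after every `every`-th instruction."""
    out = []
    for index, instruction in enumerate(instructions, start=1):
        out.append(instruction)
        if index % every == 0:
            out.append(NOP)
    return b"".join(out)


# A short x86 routine, one list entry per instruction.
routine = [
    b"\x55",      # push ebp
    b"\x89\xe5",  # mov ebp, esp
    b"\x31\xc0",  # xor eax, eax
    b"\x5d",      # pop ebp
    b"\xc3",      # ret
]

# Hypothetical signature spanning several consecutive instructions.
signature = b"\x89\xe5\x31\xc0\x5d"

original_code = b"".join(routine)
obfuscated_code = insert_nops(routine, every=2)

detected_before = signature in original_code
detected_after = signature in obfuscated_code
```

Because a no-operation instruction now falls inside the span covered by the signature, the same search that matches the original code no longer matches the obfuscated code.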
Some conventional techniques exist to detect post-processing of executable program files, but they suffer from various shortcomings. For example, it is possible to disassemble the code sections of an executable file and to analyze those sections to determine whether they contain any code sequences that are nonsensical for a given processor, such as machine language instructions that would never follow each other in sequence. The presence of such code sequences, which would not be output by a legitimate compiler or other code generation utility, indicates that the executable file was post-processed subsequent to its initial generation. As another example, it is possible to calculate the degree of “entropy” (variation in the values of the respective bytes) in the executable file, with high entropy indicating that the file was likely compressed, since compression removes the repeated byte patterns that otherwise keep the entropy of ordinary code low. However, both code sequence analysis and entropy calculations are computationally expensive. Additionally, entropy calculations, though capable of detecting code compression, cannot detect code obfuscation, which does not significantly alter the degree of entropy of an executable file.
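One common way to quantify the entropy of a file is the Shannon entropy of its byte values, expressed in bits per byte on a scale from 0 (every byte identical) to 8 (all 256 byte values equally frequent). The following is a minimal sketch under that assumption; the function name `byte_entropy` and the sample data are hypothetical, and the standard `zlib` module stands in for whatever compressor a given post-processing tool might use.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte values in data, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Stand-in for uncompressed content: 20,000 bytes drawn from only 8 values,
# so its entropy cannot exceed log2(8) = 3 bits per byte.
rng = random.Random(0)
uncompressed = bytes(rng.choices(b"abcdefgh", k=20_000))

# The same content after compression.
compressed = zlib.compress(uncompressed)

entropy_uncompressed = byte_entropy(uncompressed)
entropy_compressed = byte_entropy(compressed)
```

In this sketch the compressed data scores markedly higher than the uncompressed data, because the compressor packs the input into bytes whose values are spread much more evenly. Note that the calculation must visit every byte of the file, which is consistent with the computational cost mentioned above.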