1. Field of the Invention
Embodiments of the present invention relate to methods for detecting malicious software. More particularly, embodiments of the present invention relate to methods for detecting malicious software based on properties of the programs that host the malicious software.
2. Background Information
Malicious software (“malware”) remains a major threat to today's information systems. Examples of malware include, but are not limited to, viruses, Trojan horses, and worms. Detecting and analyzing dangerous programs is a costly and often inaccurate endeavor. The difficulty of this task is underscored by a recent contest that challenged participants to determine the nefarious behavior of a particular program that had already been determined to be malicious in nature. Often, identifying a program (or portion thereof) as malicious is half of the battle.
An important area of investigation is the detection of malicious software that has been attached to an otherwise benign host application. This is the modus operandi for many of the most common forms of malware including executable viruses and many Trojan horse programs. The host program provides cover while the virus or Trojan horse performs malicious actions unbeknownst to the user. These programs often propagate while attached to games or other enticing executable files.
Malicious programmers have demonstrated their creativity by developing a great number of techniques through which malware can be attached to a benign host. Several insertion methods are common, including appending new sections to an executable, appending the malicious code to the last section of the host, or finding an unused region of bytes within the host and writing the malicious content there. A less elegant but effective insertion method is to simply overwrite parts of the host application.
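As an illustration of the third insertion method, from the analyst's point of view, the following is a minimal sketch of how candidate unused regions might be located in a raw byte image of a host. It assumes that unused space appears as a long run of a single fill byte (zero by default); the function name `find_padding_runs` and its parameters are hypothetical and chosen only for this example, and no particular executable format is parsed.

```python
from typing import List, Tuple


def find_padding_runs(data: bytes, min_len: int = 16, fill: int = 0x00) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs for runs of `fill` at least `min_len` bytes long.

    Assumption for illustration: an unused region shows up as a long run of
    one repeated byte value. Real hosts may pad with other values.
    """
    runs = []
    start = None  # offset where the current run of fill bytes began, if any
    for offset, value in enumerate(data):
        if value == fill:
            if start is None:
                start = offset
        else:
            if start is not None and offset - start >= min_len:
                runs.append((start, offset - start))
            start = None
    # A run may extend to the very end of the buffer.
    if start is not None and len(data) - start >= min_len:
        runs.append((start, len(data) - start))
    return runs


if __name__ == "__main__":
    # Synthetic buffer: 4 code bytes, 20 zero bytes, 3 code bytes, 5 zero bytes.
    image = b"\x90" * 4 + b"\x00" * 20 + b"\xcc" * 3 + b"\x00" * 5
    print(find_padding_runs(image))
```

A region reported by such a scan is only a candidate: it indicates where content could be written without changing the size of the host, not that anything has been written there.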
Given the myriad ways malicious software can attach to a benign host, it is often a time-consuming process to even locate the point of infection. Traditional tools, including disassemblers and debuggers, may be useful for examining malware once it has been located, but they provide little help in guiding an analyst to the malicious software in the first place. Malicious software that hides in a data section or other unexpected location may be particularly difficult to identify. To make matters worse, the total code size of a malicious program is frequently orders of magnitude smaller than the host that it infects.
Moreover, malicious software detection is theoretically unsolvable in the general case. This has much to do with the subtlety of what constitutes malicious code and what constitutes an “honest bug.” For example, a programmer may inadvertently write a program that contains a buffer overflow. This is an “honest bug” due to a programming error. A different programmer may write the exact same source code knowing full well that it contains a buffer overflow, and may later exploit that overflow to gain unauthorized access to systems. Because the two programs are identical and differ only in the intent of their authors, an algorithm to decide maliciousness cannot be developed for the most general case.
It has been proven that deciding whether or not an arbitrary program is infected with an arbitrary virus is “Turing Undecidable.” This result is intimately related to the Halting Problem of computability theory, which states that there does not exist a Turing Machine that can decide whether or not an arbitrary Turing Machine will halt on a given input. The proof of this utilizes a Cantor diagonalization argument.
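The diagonalization argument can be sketched informally in code. The sketch below is an illustration only, not a proof: it simplifies to programs that take no input, assumes any candidate decider is a deterministic function returning True ("halts") or False ("does not halt"), and uses the hypothetical names `make_contrarian` and `refutes`. For any such candidate, it constructs a program that does the opposite of what the candidate predicts about it.

```python
def make_contrarian(halts):
    """Build a program that contradicts whatever `halts` predicts about it."""
    def contrarian():
        if halts(contrarian):
            # Predicted to halt, so loop forever instead.
            while True:
                pass
        # Predicted to run forever, so halt immediately instead.
        return "halted"
    return contrarian


def refutes(halts):
    """Return True if the candidate decider is wrong about its contrarian program."""
    program = make_contrarian(halts)
    if halts(program):
        # The candidate claims the program halts, but by construction the
        # program would then enter an infinite loop. It is not executed here;
        # the contradiction follows from the construction itself.
        return True
    # The candidate claims the program never halts; running it shows that it does.
    return program() == "halted"


if __name__ == "__main__":
    print(refutes(lambda program: True))
    print(refutes(lambda program: False))
```

Whatever answer a candidate decider gives for its own contrarian program, that answer is wrong, which is the essence of why no general decider can exist.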
In view of the foregoing, it can be appreciated that a substantial need exists for methods that can advantageously aid an analyst in determining if a program contains malicious code.