Software exists to detect (and thus eliminate) malware (e.g., viruses, worms, Trojan horses, spyware). Such software typically works by using static bit signatures, heuristics, or both to identify malware. Static bit signature based malware detection involves identifying a specific bit-level pattern (signature) in known malware. Files are then scanned to determine whether they contain this signature. When malware is identified using static file signatures, the certainty of the conviction is high. However, signature based detection is easily circumvented by changing the content of the file. Signatures have become less and less useful as malware authors have become more sophisticated at manipulating their malware to avoid signature based detection. For example, malware authors now commonly use techniques such as obfuscation and packing of files to change the contents without affecting the malicious functionality. Many malware authors have learned to obfuscate and/or pack the bits of their malicious files beyond the point at which signature based file scanning can effectively detect them. Once a signature is identified for an iteration of a given piece of malware, the malware author modifies the file containing the malware such that the signature can no longer identify it. As a result, a large number of ineffective malware signatures are in use.
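By way of illustration only, the following Python sketch shows how a static signature scan might work and why changing the bits of a file defeats it. The signature name, the byte pattern, and the functions `scan_bytes` and `xor_obfuscate` are hypothetical and are not taken from any real product. The single-byte XOR transform merely stands in for obfuscation or packing; a real packed file would also carry a stub that restores the original bytes at run time.

```python
from typing import Dict, List

# Hypothetical signature database: detection name -> bit-level pattern.
SIGNATURES: Dict[str, bytes] = {
    "Example.Trojan.A": bytes.fromhex("deadbeef4d5a9000"),
}


def scan_bytes(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the names of all signatures whose pattern occurs in data."""
    return [name for name, pattern in signatures.items() if pattern in data]


def xor_obfuscate(data: bytes, key: int) -> bytes:
    """Toy stand-in for obfuscation/packing: XOR every byte with one key."""
    return bytes(b ^ key for b in data)


# A made-up "malicious" file that embeds the known pattern.
original = b"header" + SIGNATURES["Example.Trojan.A"] + b"payload"
# The same content after a trivial, reversible transformation.
obfuscated = xor_obfuscate(original, 0x5A)

hits_before = scan_bytes(original)    # the signature is found
hits_after = scan_bytes(obfuscated)   # the signature no longer matches
```

In this sketch the transformed bytes no longer contain the pattern, although applying the same XOR again recovers the original content unchanged. This mirrors the way a signature for one iteration of a piece of malware fails against the next.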
Heuristic malware detection involves determining the likelihood of a given file being malware by applying various decision-based rules or weighing methods. Heuristic analysis can produce a useful result in many circumstances, but there is no mathematical proof of its correctness. In static file heuristics, the contents of the file are heuristically analyzed. In behavior based heuristics, the behavior of the program is heuristically analyzed. Both methods involve training a heuristic analyzer with a sample set of malware and clean files, so that it can make generalizations about the types of content or behaviors associated with each. Identifications of suspected malware using heuristic analysis can never, by definition, be highly certain, as heuristic analysis only determines a likelihood of a file being clean or malicious. Confidence in heuristic based file convictions further suffers from the fact that the training set is difficult to define and always differs from the real world set.
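By way of illustration only, the following Python sketch shows one simple way a heuristic analyzer could be trained on a sample set and then used to score a file. It is a minimal weighted-feature scorer under stated assumptions, not a description of any particular product. The feature names (such as `packed` or `disables_av`), the tiny training sets, and the functions `train` and `malware_likelihood` are all hypothetical, and the resulting score is an uncalibrated value between 0 and 1, not a true probability.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def train(malware: List[Set[str]], clean: List[Set[str]]) -> Dict[str, float]:
    """Learn one weight per feature from labeled samples.

    Each weight is the log of the ratio between how often the feature
    appears in malware samples and in clean samples (add-one smoothing).
    """
    mal_counts = Counter(f for sample in malware for f in sample)
    clean_counts = Counter(f for sample in clean for f in sample)
    weights: Dict[str, float] = {}
    for feature in set(mal_counts) | set(clean_counts):
        p_mal = (mal_counts[feature] + 1) / (len(malware) + 2)
        p_clean = (clean_counts[feature] + 1) / (len(clean) + 2)
        weights[feature] = math.log(p_mal / p_clean)
    return weights


def malware_likelihood(features: Set[str], weights: Dict[str, float]) -> float:
    """Sum the weights of the observed features and squash into (0, 1)."""
    score = sum(weights.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical training set: features observed in each sample file.
malware_samples = [
    {"packed", "writes_startup_key"},
    {"packed", "disables_av"},
    {"writes_startup_key", "disables_av"},
]
clean_samples = [{"signed"}, {"signed", "packed"}, {"signed"}]

weights = train(malware_samples, clean_samples)
suspicious_score = malware_likelihood({"packed", "disables_av"}, weights)
benign_score = malware_likelihood({"signed"}, weights)
unseen_score = malware_likelihood({"never_seen_feature"}, weights)
```

In this sketch a file exhibiting only features absent from the training set scores exactly 0.5, that is, the analyzer has no basis for a decision. No score ever reaches 0 or 1, which reflects both the inherent uncertainty of heuristic convictions and their dependence on how well the training set matches the real world set.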
It would be desirable to address these issues.