Antivirus and antispyware solutions generally employ traditional scan-based technologies to identify viruses, worms, Trojan horses, spyware, and other malware on an endpoint device. Typical antivirus and antispyware solutions may detect these threats by searching a system for files that match characteristics (e.g., malware signatures) of a known threat. Such security solutions may also avoid false positives by determining whether a file matches characteristics of a known-good file.
As the number of malware threats increases, the sizes of signature databases that identify malware threats and known-good files also increase. Another factor that contributes to the size of signature databases is legitimate variation in known-good files. For example, installers may rebind an executable file, thereby modifying the executable to include information about dependencies on a particular system. An executable file may also be rebased so that the executable file's base address does not interfere with other executable files on a system. Furthermore, MICROSOFT .NET native images typically vary among installations. Thus, a source executable file may correspond to numerous unique instances of the executable file.
As a result of variations in installed executables, a hash of a legitimate source executable file may not match a hash of an instance of that executable file that has been modified, possibly resulting in a false positive malware detection. Additional signatures may be added to a signature database to cover variations between executables, but many variations may be difficult to account for. Large signature databases may also be undesirable. For example, adding signatures to a signature database on a client device may result in an increased disk footprint and additional consumption of CPU cycles and memory during malware scans. Similarly, server-side lookups may take longer and consume more resources as server-side signature databases grow. Finally, the larger the database, the higher the likelihood of triggering false positive detections.
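By way of illustration only, the mismatch described above may be sketched in Python as follows. The byte layout (a three-byte marker followed by a four-byte base address and a code body), the function names, and the use of SHA-256 are assumptions made for this sketch and do not represent any actual executable format or any particular security product.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for an executable: marker + 4-byte base address + code.
CODE_BODY = b"CODE" * 8
source_file = b"HDR" + (0x00400000).to_bytes(4, "little") + CODE_BODY
# The same program after a rebase: only the base address field differs.
rebased_file = b"HDR" + (0x10000000).to_bytes(4, "little") + CODE_BODY

# Known-good database seeded only with the hash of the source executable.
known_good_hashes = {sha256_hex(source_file)}


def is_known_good(data: bytes) -> bool:
    """Look up the whole-file hash in the known-good set."""
    return sha256_hex(data) in known_good_hashes


if __name__ == "__main__":
    print("source recognized: ", is_known_good(source_file))
    print("rebased recognized:", is_known_good(rebased_file))
```

In this sketch the rebased instance is not recognized as known-good even though its code body is unchanged, so covering it would require adding a further entry to the database for each such variation.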
Some traditional security solutions may attempt to account for variations among executable files by zeroing out portions of an executable file that may vary among instances of the executable file. Unfortunately, malicious programmers may evade detection by placing malicious code in executable sections that are zeroed out. What is needed, therefore, is a more secure and effective mechanism for generating consistent hash files for varying instances of executable files.
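The zeroing-out approach and its weakness may be sketched in Python as follows. This is a simplified illustration under stated assumptions: the byte layout, the `VARIABLE_REGIONS` offsets, and the function name `masked_hash` are hypothetical and are not drawn from any particular security solution.

```python
import hashlib

# Hypothetical (start, end) byte ranges assumed to vary among installations.
VARIABLE_REGIONS = [(3, 7)]


def masked_hash(data: bytes, regions=VARIABLE_REGIONS) -> str:
    """Hash data after overwriting each variable region with zero bytes."""
    buf = bytearray(data)
    for start, end in regions:
        buf[start:end] = bytes(end - start)  # same-length run of zeros
    return hashlib.sha256(buf).hexdigest()


CODE_BODY = b"CODE" * 8
# Hypothetical layout: 3-byte marker, 4-byte variable field, code body.
source_file = b"HDR" + (0x00400000).to_bytes(4, "little") + CODE_BODY
rebased_file = b"HDR" + (0x10000000).to_bytes(4, "little") + CODE_BODY
# An attacker places a payload in the region that is zeroed before hashing.
tampered_file = b"HDR" + b"EVIL" + CODE_BODY

if __name__ == "__main__":
    reference = masked_hash(source_file)
    print("rebased matches: ", masked_hash(rebased_file) == reference)
    print("tampered matches:", masked_hash(tampered_file) == reference)
```

In this sketch both the legitimately rebased instance and the tampered instance produce the same masked hash as the source file, because the hash never observes the contents of the zeroed region.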