Computer systems today constantly face attacks from malicious software (malware). Malware may include computer viruses, worms, trojan horses, rootkits, spyware, adware, crimeware, and other malicious software that may cause unwanted changes to executable files. These changes could materialize in different ways. In the simplest scenario, a change could be in the name of the executable file, which any user can see with tools such as Windows Task Manager. Other changes may include a change in the process behavior and in the behavior of modules upon which a process relies. Therefore, it is important to develop tools that help detect, mitigate, and/or eliminate the risks associated with malware.
Meanwhile, hashing is a process by which data is encoded using an algorithm to produce a fixed-size bit string (i.e., the hash) that is, for practical purposes, unique to each unique block of input data. For a hash algorithm (e.g., SHA1, MD5, and the like) to be useful, it must be computationally intractable to reconstruct the block of data from the hash, and any modification of the input data must, with overwhelming probability, produce a change in the hash. Hashes are often used in information security and authentication applications, such as for the detection of malware that has infiltrated a computing system.
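By way of illustration only, the fixed-size property described above can be sketched with Python's standard hashlib module. The helper name digest_hex and the choice of SHA1 as the default algorithm are assumptions made for this sketch and are not drawn from any particular security program.

```python
import hashlib


def digest_hex(data: bytes, algorithm: str = "sha1") -> str:
    """Return the hexadecimal digest of `data` (hypothetical helper for this sketch)."""
    return hashlib.new(algorithm, data).hexdigest()


# Inputs of very different sizes yield digests of identical length.
small = digest_hex(b"a")
large = digest_hex(b"a" * 100_000)
print(len(small), len(large))  # a SHA1 digest is 160 bits, i.e. 40 hex characters

# The same input always yields the same digest.
print(digest_hex(b"abc") == digest_hex(b"abc"))
```

The digest length depends only on the algorithm selected, not on the size of the input block.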
To utilize a hashing algorithm for authentication purposes, a security program may perform a hash of static executable files stored on the hard drive of a “clean” computing system. These hash values may then be compared to the hash values generated for potentially modified versions of the static executable files. If a difference exists between the hash values found on the clean system versus those found on the system under study, a red flag may be raised and appropriate action may be taken to shut down the system and/or purge the suspicious files.
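The comparison described above might be sketched as follows. This is a minimal sketch under stated assumptions: the function names file_digest, build_baseline, and find_mismatches are hypothetical, SHA1 is chosen arbitrarily, and the action taken on a mismatch is left to the caller.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha1", chunk_size: int = 65536) -> str:
    """Hash a file at rest, reading it in chunks to bound memory use."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_baseline(paths) -> dict:
    """Record digests of executables on a system assumed to be clean."""
    return {path: file_digest(path) for path in paths}


def find_mismatches(baseline: dict, paths) -> list:
    """Return paths whose current digest is absent from or differs from the baseline."""
    return [path for path in paths if baseline.get(path) != file_digest(path)]
```

In this sketch, a non-empty result from find_mismatches corresponds to the red flag described above. The result indicates only that a file differs, not where or by how much.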
Unfortunately, the comparison of hash values for files resident on the hard drive (e.g., static files at rest) has certain drawbacks. For instance, trivial modifications to the file would change the hash value and may unnecessarily cause risk mitigation countermeasures to be implemented. One common example of a trivial modification that some viruses cause in files in order to defeat whitelisting is the creation of a null byte at the end of a file. In such scenarios, a legitimate program would be prevented from running, thereby causing an undesirable disruption of service that could result in loss of time and money.
As another example of how malicious changes may be made to executable files, DLL (dynamic-link library) injection may force an unsuspecting running executable file to accept a DLL that has been tampered with. Here, DLL code is injected directly into the memory space of the executable process, thereby causing the executable file to run compromised code. In general, DLL injection may change the behavior of the original DLL and, consequently, may change the behavior of any running process that relied on that DLL, similar to the changes oftentimes produced by a rootkit. Again, the conventional technique of comparing the hash values for files resident on the hard drive will not perform as desired because processes such as DLL injection occur only after the executable file has been loaded into memory.
The drawbacks of performing a traditional hashing algorithm on static files are only exacerbated by the fact that traditional hashes such as MD5 and SHA1 take the input data as a whole and produce a single fingerprint. Any trivial change (e.g., the creation of a null byte as discussed above) to the input data, which may not necessarily change process behavior associated with the data, would create two completely different hash results. Evaluating the two completely different hash results permits no determination as to the location and/or extent of the changes, and therefore no determination as to whether or not the file including the data may be properly whitelisted.
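The effect of a single trailing null byte on a whole-file hash can be illustrated with a short sketch. The byte string below is a made-up stand-in for the contents of an executable file, and the helper differing_bits is a hypothetical name introduced only for this illustration.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Made-up stand-in for the contents of an executable file.
original = b"MZ" + bytes(range(256)) * 4
padded = original + b"\x00"  # trivial change: one null byte appended

d1 = hashlib.sha1(original).digest()
d2 = hashlib.sha1(padded).digest()

print(d1.hex())
print(d2.hex())
print(differing_bits(d1, d2), "of", len(d1) * 8, "digest bits differ")
```

The two digests have the same length but are otherwise unrelated, so neither the position of the added byte nor the fact that only one byte changed can be recovered from them.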