1. Field of the Invention
This invention pertains in general to computer security and in particular to the identification of malware using intelligent hashes.
2. Description of the Related Art
There is a wide variety of malicious software (malware) that can attack modern computers. Malware threats include computer viruses, worms, Trojan horse programs, spyware, adware, crimeware, and phishing websites. Modern malware is often designed to provide financial gain to the attacker. For example, malware can surreptitiously capture important information such as logins, passwords, bank account identifiers, and credit card numbers. Similarly, the malware can provide hidden interfaces that allow the attacker to access and control the compromised computer.
Some computer security systems and software counteract malware by identifying characteristics that indicate whether an entity, such as a computer file or a software application, contains malware. A hash value, herein referred to as a “hash”, is a value generated by applying a transform such as a cryptographic hash function to an entity such as a malware program. A hash forms an effectively unique representation or “fingerprint” of the malware program, which can then be used as a characteristic to identify the program. Common transforms for generating hashes include MD5 and SHA-1.
Cryptographic hash functions are sensitive to small changes in the data (i.e. polymorphisms). Therefore, two similar entities, such as variants of the same malware program (i.e. polymorphic malware), may have very different hashes. In this way, hashes are specific to the entity they are generated from. This specificity often causes false negative identifications of polymorphic malware: the data used to generate a hash for one variant of a malware program may be altered by the polymorphisms in another variant, so the hash of the first variant does not match the second.
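The sensitivity described above can be illustrated with a minimal sketch using Python's standard hashlib module. The byte strings variant_a and variant_b and the helper names fingerprint and differing_hex_digits are hypothetical, chosen only for illustration; they stand in for two variants of a program that differ by a single byte and are not drawn from any real malware or security product.

```python
import hashlib


def fingerprint(data: bytes, algorithm: str = "sha1") -> str:
    """Return the hex digest of `data` under the named hash function."""
    return hashlib.new(algorithm, data).hexdigest()


def differing_hex_digits(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex digests differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical stand-ins for two variants of the same program.
# They differ in exactly one byte (0x90 versus 0x91).
variant_a = b"\x90\x90PAYLOAD-BODY\x00"
variant_b = b"\x90\x91PAYLOAD-BODY\x00"

for algorithm in ("md5", "sha1"):
    hash_a = fingerprint(variant_a, algorithm)
    hash_b = fingerprint(variant_b, algorithm)
    print(algorithm, hash_a, hash_b, differing_hex_digits(hash_a, hash_b))
```

Under these assumptions, a one-byte change yields digests that differ in most positions, so a hash recorded for variant_a would not identify variant_b.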
To compensate for false negative identifications, several hashes of an entity are used in identifying whether or not the entity is malware. As the number of different malware entities a client can be exposed to continues to increase over time, the number of hashes used to determine whether an entity is malware has grown proportionally. Scanning an entity such as a software application or file against such a large set of hashes to detect the presence of malware can be inefficient.
Accordingly, there is a need in the art for decreasing the number of hashes used to identify malware without compromising the ability to detect malware on the clients.