1. Field of the Invention
This invention pertains in general to computer security and in particular to the development of signatures to accurately identify malware.
2. Description of the Related Art
There is a wide variety of malicious software (malware) that can attack modern computers. Malware threats include computer viruses, worms, Trojan horse programs, spyware, adware, crimeware, and phishing websites. Modern malware is often designed to provide financial gain to the attacker. For example, malware can surreptitiously capture important information such as logins, passwords, bank account identifiers, and credit card numbers. Similarly, the malware can provide hidden interfaces that allow the attacker to access and control the compromised computer.
Computer security systems and software for counteracting malware typically operate by seeking to identify malware signatures. Malware signatures contain data describing characteristics of known malware and are used to determine whether an entity such as a computer file or a software application contains malware. Typically, a set of malware signatures is generated by a provider of security software and is deployed to security software on a user's computer. The security software then uses this set of malware signatures to scan the user's computer for malware.
During malware signature generation, malware signatures are typically validated against entities that are known not to contain malware, herein referred to as “goodware”, in order to ensure that the malware signatures do not generate false positive identifications of malware. In other words, the malware signatures are validated to ensure they do not falsely identify goodware as malware. Typically, a malware signature is first generated by a security administrator and then compared to a dataset of goodware in order to determine whether the malware signature generates false positive identifications of malware. Because the dataset of all known goodware is large and the number of potential malware signatures is also large, comparing malware signatures to a dataset of goodware may be very computationally expensive.
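The validation step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a signature is a byte sequence and that a signature matches an entity when the sequence occurs anywhere in the entity's contents; the function names `false_positives` and `validate` and the sample file names are hypothetical and are not taken from any particular security product.

```python
from typing import Dict, List


def false_positives(signature: bytes, goodware: Dict[str, bytes]) -> List[str]:
    """Return the names of goodware entities that the signature would wrongly flag.

    Assumption: a signature "matches" when it appears as a substring of the
    entity's bytes. Real signatures may describe other characteristics.
    """
    return [name for name, content in goodware.items() if signature in content]


def validate(candidates: List[bytes], goodware: Dict[str, bytes]) -> List[bytes]:
    """Keep only candidate signatures that match no goodware entity."""
    return [sig for sig in candidates if not false_positives(sig, goodware)]


# Hypothetical goodware dataset: entity name -> raw contents.
goodware = {
    "calc.exe": b"\x90\x90ABCD",
    "notepad.exe": b"hello world",
}

flagged = false_positives(b"ABCD", goodware)       # matches calc.exe
accepted = validate([b"ABCD", b"\xde\xad"], goodware)  # only the second survives
```

In this sketch every candidate signature is compared against every goodware entity, so the work grows with the product of the number of candidates and the total size of the goodware dataset, which is the computational expense noted above.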
An alternative method of generating malware signatures involves determining the rate at which characteristics occur in the dataset of all goodware. Using this method, malware signatures may be generated from characteristics that have a low rate of occurrence, or no occurrence, in a dataset of goodware. However, due to the large set of all possible goodware and the large set of all possible characteristics, it is not tractable to identify and store a rate of occurrence for all characteristics.
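The rate-of-occurrence approach can likewise be sketched under stated assumptions. Here a "characteristic" is assumed, for illustration only, to be a fixed-length byte sequence (an n-gram), and its rate of occurrence is taken to be the fraction of goodware entities that contain it at least once. The function name `ngram_rates` and the parameter `n` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def ngram_rates(goodware: Iterable[bytes], n: int = 4) -> Dict[bytes, float]:
    """Map each n-byte sequence to the fraction of goodware entities containing it.

    Assumption: characteristics are byte n-grams. Each n-gram is counted at
    most once per entity, so the result is a per-entity occurrence rate.
    """
    samples = list(goodware)
    counts: Counter = Counter()
    for content in samples:
        # Collect the distinct n-grams of this entity before counting.
        grams = {content[i:i + n] for i in range(len(content) - n + 1)}
        counts.update(grams)
    total = len(samples)
    return {gram: count / total for gram, count in counts.items()} if total else {}


# Two hypothetical goodware entities.
rates = ngram_rates([b"ABCDE", b"ABCDX"], n=4)

common = rates[b"ABCD"]             # present in both entities
rare = rates[b"BCDE"]               # present in one of two entities
unseen = rates.get(b"ZZZZ", 0.0)    # never observed in goodware
```

Even in this simplified form, the table holds one entry per distinct n-gram observed, and the number of possible n-grams grows as 256 to the power n, which suggests why storing a rate of occurrence for every characteristic is not tractable.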
Accordingly, there is a need in the art for methods of developing malware signatures with reduced false positive malware detections.