Malware, short for “malicious software,” is software that can be used to disrupt computer operations, damage data, gather sensitive information, or gain access to private computer systems without the user's knowledge or consent. Examples of such malware include software viruses, trojan horses, rootkits, and ransomware. A common mechanism used by malware developers is to embed the malware into a file that is made to appear desirable to a user, or that is downloaded and executed when the user visits a web site. For example, malware may be embedded into a software application that appears legitimate and useful. The user downloads the file, and when the file is opened, the malware within the file is executed.
In the face of the growing threat of malware, many anti-malware software packages have been developed to detect malware in a user's files. Upon detection, the anti-malware software may notify the user of the presence of the malware, and may automatically remove or quarantine the malware. In order to detect malware, anti-malware software vendors identify malware in files using signatures or the behavior of the files. The signatures can be provided to client software that detects malware on end-user machines. In some cases, however, files may be mislabeled. For example, a file may be labeled as malware when in fact it does not contain malware (i.e., a false positive). Alternatively, a file may be labeled as clean when in fact it contains malware (i.e., a false negative). Further, a file may be labeled as having a first type of malware when in fact it has a second type of malware.
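The three kinds of mislabeling described above can be illustrated with a minimal sketch. The label strings, the `CLEAN` constant, the `Mislabel` enumeration, and the `classify_mislabel` function are hypothetical names chosen for illustration only; they are not taken from any particular anti-malware product.

```python
from enum import Enum


class Mislabel(Enum):
    """Hypothetical categories of labeling outcome."""
    NONE = "correctly labeled"
    FALSE_POSITIVE = "false positive"
    FALSE_NEGATIVE = "false negative"
    WRONG_TYPE = "wrong malware type"


# Hypothetical label used for files that contain no malware.
CLEAN = "clean"


def classify_mislabel(assigned: str, actual: str) -> Mislabel:
    """Compare the label assigned to a file with its actual status."""
    if assigned == actual:
        return Mislabel.NONE
    if actual == CLEAN:
        # Labeled as malware, but the file does not contain malware.
        return Mislabel.FALSE_POSITIVE
    if assigned == CLEAN:
        # Labeled as clean, but the file contains malware.
        return Mislabel.FALSE_NEGATIVE
    # Both labels name malware, but of different types.
    return Mislabel.WRONG_TYPE
```

For instance, under these assumptions `classify_mislabel("trojan", "clean")` would report a false positive, while `classify_mislabel("trojan", "ransomware")` would report that the file was assigned the wrong malware type.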
The mislabeling of files can have serious consequences. For example, a file that is mislabeled as malware can cause a user to remove an otherwise useful application and can interfere with the user's workflow. A file that is mislabeled as clean can cause a user's computer to become infected by the malware. In either case, the mislabeling can have a serious impact on the reputation of the anti-malware software provider.