Malware, short for “malicious software,” is software that can be used to disrupt computer operations, damage data, gather sensitive information, or gain access to private computer systems without the user's knowledge or consent. Examples of malware include software viruses, Trojan horses, rootkits, and ransomware. A common mechanism used by malware developers is to embed the malware into a file that is made to appear desirable to a user, or that is downloaded and executed when the user visits a web site. For example, malware may be embedded into a software application that appears legitimate and useful. The user downloads the file, and when the file is opened, the malware within the file is executed. A file that contains malware can be referred to as a malicious file.
In the face of the growing threat of malware, many anti-malware software packages have been developed to detect malware in a user's files. Upon detection, the anti-malware software may notify the user of the presence of the malware, and may automatically remove or quarantine the malware. Detecting malware can be a difficult task, because millions of new files are created every day.
In order to avoid detection by anti-malware software, sophisticated malware developers introduced polymorphism into their malware. Polymorphic malware refers to malware in which portions of the malware are automatically changed without changing the overall functioning of the malware. As a result, executable files in the same malware family perform very similar operations or drop the same payload when executed on the target machine, but have different byte content. Thus, polymorphism makes it difficult to use the file contents directly to recognize a malware family or to group files by similarity.
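The effect of differing byte content can be illustrated with a harmless sketch. The example below is an assumption made purely for illustration and is not taken from any particular malware family: a benign byte string stands in for the payload, and a single-byte XOR encoding (the hypothetical helpers `xor_encode` and `xor_decode`) stands in for the automatic changes. The two variants decode to the same payload, yet their stored bytes and their SHA-256 digests differ, so a comparison based on exact file contents would not group them together.

```python
import hashlib


def xor_encode(payload: bytes, key: int) -> bytes:
    """XOR every payload byte with a single-byte key and prepend the key."""
    return bytes([key]) + bytes(b ^ key for b in payload)


def xor_decode(blob: bytes) -> bytes:
    """Recover the original payload using the key stored in the first byte."""
    key = blob[0]
    return bytes(b ^ key for b in blob[1:])


# Harmless stand-in for a payload shared by every variant (illustrative only).
payload = b"same behavior in every variant"

# Two variants of the same payload, encoded with different keys.
variant_a = xor_encode(payload, 0x21)
variant_b = xor_encode(payload, 0x5C)

digest_a = hashlib.sha256(variant_a).hexdigest()
digest_b = hashlib.sha256(variant_b).hexdigest()

print("byte content differs:", variant_a != variant_b)
print("digests differ:      ", digest_a != digest_b)
print("decoded payload same:", xor_decode(variant_a) == xor_decode(variant_b))
```

Real polymorphic engines are considerably more elaborate than a one-byte XOR, but the consequence for content-based matching is the same: equivalent behavior, different bytes.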