Malware, short for “malicious software,” is software that can be used to disrupt computer operations, damage data, gather sensitive information, or gain access to private computer systems without the user's knowledge or consent. Examples of such malware include software viruses, trojan horses, rootkits, and ransomware. A common mechanism used by malware developers is to embed the malware into a file that is made to appear desirable to a user, or that is downloaded and executed when the user visits a web site. For example, malware may be embedded into a software application that appears legitimate and useful. The user downloads the file, and when the file is opened, the malware within the file is executed. A file that contains malware can be referred to as a malicious file.
Detection of malware in order to protect computing devices is a major concern. Recently, there have been many attempts to improve the detection of malware. One such attempt involves determining whether one file or data object is similar to another. Categories of such approaches include signature analysis, heuristic analysis, behavioral analysis, hash sum analysis, and cloud-based analysis. Signature and hash sum techniques are well-known methods of detection, but they can fail to detect malware code that has been modified. Heuristic analysis can attempt to detect new malware generally by statically analyzing files, but it can be ineffective in detecting obfuscated malware. Behavioral analysis often proves effective in detecting modified malware, but known methods of this analysis also have a number of shortcomings. For example, known behavioral analysis methods may reduce the performance of the system. For these reasons, there is a need for an improved method for detecting malware, particularly one that does not reduce system performance.
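By way of illustration only, the limitation of hash sum analysis noted above can be sketched in a few lines of Python. The sketch assumes a blocklist of SHA-256 digests; the sample bytes and the names `sha256_hex`, `blocklist`, and `is_known_malicious` are hypothetical and are not taken from any particular detection product. It shows that a digest lookup recognizes an exact copy of a known file but not a copy that differs by a single byte.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for the contents of a known malicious file.
known_sample = b"MZ\x90\x00 hypothetical payload bytes"

# Hypothetical blocklist holding the digests of known malicious files.
blocklist = {sha256_hex(known_sample)}


def is_known_malicious(data: bytes) -> bool:
    """Hash sum check: flag the data only if its digest is on the blocklist."""
    return sha256_hex(data) in blocklist


# A trivially modified variant: one extra byte appended to the original.
modified_sample = known_sample + b"\x00"

print(is_known_malicious(known_sample))     # exact copy of the known file
print(is_known_malicious(modified_sample))  # one-byte modification
```

Because a cryptographic digest changes when any byte of its input changes, the modified variant no longer matches the stored digest, even though its behavior could be unchanged. This is consistent with the observation above that hash sum techniques can fail to detect modified malware code.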