Antivirus, antispyware, and other anti-malware applications seek to protect client computers by identifying harmful applications or other executable code and removing or at least neutralizing the harmful code. Current anti-malware applications (e.g., Microsoft Windows Defender, Microsoft Forefront Client Security, Microsoft OneCare, Microsoft Forefront Server for Exchange Server, and so forth) use a signature-based approach to detect viruses, worms, and spyware. The signature-based approach relies on one or more distinguishing features of the malware to provide a positive identification so that the anti-malware application can remove it. For example, a particular malware application may have a certain file name, write a certain value to the operating system configuration database (e.g., the Microsoft Windows Registry), or contain executable code having certain bytes (e.g., identified using a CRC, cryptographic hash, or other signature algorithm).
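As an illustration only, the signature-based matching described above might be sketched as follows. The `Signature` record, its field names, and the `match_signatures` helper are hypothetical and are not taken from any of the products named above; SHA-256 stands in for whichever signature algorithm a real application uses.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass(frozen=True)
class Signature:
    """Hypothetical signature record; field names are illustrative only."""
    threat_name: str
    file_name: Optional[str] = None  # distinguishing file name, if any
    sha256: Optional[str] = None     # hex digest of the file bytes, if any


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of the file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def match_signatures(path: Path, signatures: List[Signature]) -> List[str]:
    """Return the names of threats whose signature matches the file."""
    content_hash = file_sha256(path)
    hits = []
    for sig in signatures:
        name_hit = (sig.file_name is not None
                    and path.name.lower() == sig.file_name.lower())
        hash_hit = sig.sha256 is not None and content_hash == sig.sha256
        if name_hit or hash_hit:
            hits.append(sig.threat_name)
    return hits
```

A file is flagged if either its name or the hash of its bytes matches a known signature. A check against the operating system configuration database would follow the same pattern but is omitted here.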
The signature-based approach is heavily dependent on analysis of existing malware by skilled technicians and the quality of that analysis. Typically, a technician receives a sample of a new threat or a variant of an already known threat. For example, a user may email the threat in the form of one or more files to an email address for reporting malware. The technician then begins investigation. During the investigation, the technician may execute malware in a virtual environment, such as a sandboxed computer system that cannot cause harm to other computer systems even if the malware affects the computer system. If the malware sample successfully runs in the virtual environment and produces enough information, the technician analyzes the execution history, content of created/deleted files, registry keys, network activity, and other activity of the malware and creates detection signatures and removal instructions. For example, if the malware creates a file virus.exe in a particular directory, the signature may identify the file and the removal script may specify deleting the file at its typical location.
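The final step of this manual workflow, turning observed activity into removal instructions, can be sketched roughly as below. The `SandboxReport` structure, the `build_removal_steps` function, and the example paths are assumptions made for illustration; real removal scripts use vendor-specific formats that are not described here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SandboxReport:
    """Hypothetical summary of what a sample did in the virtual environment."""
    created_files: List[str] = field(default_factory=list)
    created_registry_keys: List[str] = field(default_factory=list)


def build_removal_steps(report: SandboxReport) -> List[str]:
    """Derive one removal step per artifact the sample was seen creating."""
    steps = [f"delete file {path}" for path in report.created_files]
    steps += [f"delete registry key {key}"
              for key in report.created_registry_keys]
    return steps


# Example paths below are invented for illustration.
report = SandboxReport(
    created_files=[r"C:\Example\virus.exe"],
    created_registry_keys=[r"HKLM\Software\Example\Run\virus"],
)
for step in build_removal_steps(report):
    print(step)
```

Because the steps are derived only from what the sandbox observed, any behavior the sample did not exhibit during the run is absent from the removal instructions.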
This type of analysis is problematic for several reasons. First, the process involves human analysis and thus is slow and bottlenecked by the number of technicians available to review new threats. The rate of new threats continues to increase, and there are often fewer available technicians than there are malware authors creating new malware. Second, the technician may not be able to successfully run the malware in the virtual environment, and thus may not be able to build a complete model of how the malware behaves. This can lead to failure to detect some variants of the malware or to incomplete removal of the malware, for example. The malware might not run in the virtual environment because, for example, it detects that the technician's domain belongs to an anti-malware vendor, or because it fails to run when the operating system version or an installed application is not the one expected by the malware author. Sometimes the sample received from the customer computer contains insufficient information about the threat. For example, the report may only include a driver and a few other files without enough information for the technician to understand how to run the malware. This case usually ends with incomplete detection or removal that affects the customer experience.
Often, client computers are infected when a user visits a website and allows installation of “unknown software” (typically, users believe they are installing good software). The original uniform resource locator (URL) that users visited often does not contain the binaries, but rather redirects to another “short-life cycle” URL. By the time a technician receives the original URL and begins investigation, the “short-life cycle” URL may no longer contain the malware. Technicians may never receive enough information to catch the actual culprit or detect the malware in its earliest form. Further, some samples may evade analysis for a while because technicians or early threat analysis assign them a low priority. For example, the threat may have received a limited number of reports, which may only be because the anti-malware application does not yet detect the threat (i.e., a false negative).
Another problem is misidentification of a new variant of previously identified malware, in which the removal script produced by the technician does not clean the new variant. This can result from delays in updating the anti-malware application with the latest information about recent changes in malware families. The technician may also not receive a complete picture of the malware being analyzed because the malware expects a special environment or combination of user actions that is not obvious or not present in the technician's environment. For example, before the malware executes it may expect the user to visit a particular website. After the user visits the website, the malware may replace the user's security certificate with one belonging to the malware that redirects the user's web browser traffic through a spyware website and monitors the user's browsing habits. If the technician never visits the website, then the malware will not produce enough information for the technician to understand the offending behavior of the malware.