Malicious software, or malware for short, may include any program or file that is harmful by design to a computer. Malware includes computer viruses, worms, Trojan horses, adware, spyware, and any programming that gathers information about a computer or its user or otherwise operates without permission. The owners of the computers are often unaware that these programs have been added to their computers and are often similarly unaware of their function.
Malicious network content is a type of malware distributed over a network via websites, e.g., servers operating on a network according to a hypertext transfer protocol (HTTP) standard or other well-known standard. Malicious network content distributed in this manner may be actively downloaded and installed on a computer, without the approval or knowledge of its user, simply by the computer accessing the web site hosting the malicious network content (the “malicious web site”). Malicious network content may be embedded within objects associated with web pages hosted by the malicious web site. Malicious network content may also enter a computer on receipt or opening of email. For example, email may contain an attachment, such as a PDF document, with embedded malicious executable programs. Furthermore, malicious content may exist in files contained in a computer memory or storage device, having infected those files through any of a variety of attack vectors.
Various processes and devices have been employed to prevent the problems associated with malicious content. For example, computers often run antivirus scanning software that scans a particular computer for viruses and other forms of malware. The scanning typically involves automatic detection of a match between content stored on the computer (or attached media) and a fingerprint library or database of known malware. The scanning may be initiated manually or based on a schedule specified by a user or system administrator associated with the particular computer. Unfortunately, by the time malware is detected by the scanning software, some damage on the computer, loss of privacy or intellectual property may have already occurred, and the malware may have propagated from the infected computer to other computers. Additionally, it may take days or weeks for new fingerprints to be manually created, the scanning fingerprint library updated and received for use by the scanning software, and the new fingerprints employed in new scans.
Moreover, anti-virus scanning utilities may have limited effectiveness to protect against all exploits by polymorphic malware. Polymorphic malware has the capability to mutate to defeat the fingerprint match process while keeping its original malicious capabilities intact. Fingerprints generated to identify one form of a polymorphic virus may not match against a mutated form. Thus polymorphic malware is often referred to as a family of virus rather than a single virus, and improved anti-virus techniques to identify such malware families is desirable.
Another type of malware detection solution employs virtual environments to replay or otherwise process content within a sandbox established by one or more virtual machines (VMs). Such solutions monitor content during execution to detect anomalies that may signal the presence of malware. One such system offered by FireEye, Inc., the assignee of the present patent application, employs a two-phase malware detection approach to detect malware contained in network traffic monitored in real-time. In a first or “static” phase, a heuristic is applied to network traffic to identify and filter packets that appear suspicious in that they exhibit characteristics associated with malware. In a second or “dynamic” phase, the suspicious packets (and typically only the suspicious packets) are processed within one or more virtual machines. For example, if a user is trying to download a file over a network, the file is extracted from the network traffic and analyzed in the virtual machine. The results of the analysis aids in determining whether the file is classified as malicious.
New malware (“zero day”) and polymorphic malware may not have valid fingerprints available in the matching database for traditional detection using anti-virus scanning or traditional two-phase malware detection. Additionally, traditional malware detection may only trigger malware notification or alerts when certain known malicious fingerprints are present. When attempting to classify content as malicious it can be especially important to minimize false positives, and false negatives, and strike an appropriate balance between the false positives and false negatives. Therefore, traditional malware detection schemes may ignore potentially benign behaviors even though the behaviors also may indicate malware.