Malicious software, or malware for short, may include any program or file that is harmful by design to a computer. Malware includes computer viruses, worms, Trojan horses, adware, spyware, and any programming that gathers information about a computer or its user or otherwise operates without permission. The owners of the computers are often unaware that these programs have been added to their computers and are often similarly unaware of their function.
Malicious network content is a type of malware distributed over a network via websites, e.g., servers operating on a network according to a hypertext transfer protocol (HTTP) standard or other well-known standard. Malicious network content distributed in this manner may be actively downloaded and installed on a computer, without the approval or knowledge of its user, simply by the computer accessing the web site hosting the malicious network content (the “malicious web site”). Malicious network content may be embedded within objects associated with web pages hosted by the malicious web site. Malicious network content may also enter a computer on receipt or opening of email. For example, email may contain an attachment, such as a PDF document, with embedded malicious executable programs. Furthermore, malicious content may exist in files contained in a computer memory or storage device, having infected those files through any of a variety of attack vectors.
Various processes and devices have been employed to prevent the problems associated with malicious content. For example, computers often run antivirus scanning software that scans a particular computer for viruses and other forms of malware. The scanning typically involves automatic detection of a match between content stored on the computer (or attached media) and a library or database of signatures of known malware. The scanning may be initiated manually or based on a schedule specified by a user or system administrator associated with the particular computer. Unfortunately, by the time malware is detected by the scanning software, some damage on the computer or loss of privacy may have already occurred, and the malware may have propagated from the infected computer to other computers. Additionally, it may take days or weeks for new signatures to be manually created, the scanning signature library updated and received for use by the scanning software, and the new signatures employed in new scans.
Moreover, anti-virus scanning utilities may have limited effectiveness to protect against all exploits by polymorphic malware. Polymorphic malware has the capability to mutate to defeat the signature match process while keeping its original malicious capabilities intact. Signatures generated to identify one form of a polymorphic virus may not match against a mutated form. Thus polymorphic malware is often referred to as a family of virus rather than a single virus, and improved anti-virus techniques to identify such malware families is desirable.
Another type of malware detection solution employs virtual environments to replay content within a sandbox established by virtual machines (VMs). Such solutions monitor the behavior of content during execution to detect anomalies that may signal the presence of malware. One such system offered by FireEye, Inc., the assignee of the present patent application, employs a two-phase malware detection approach to detect malware contained in network traffic monitored in real-time. In a first or “static” phase, a heuristic is applied to network traffic to identify and filter packets that appear suspicious in that they exhibit characteristics associated with malware. In a second or “dynamic” phase, the suspicious packets (and typically only the suspicious packets) are replayed within one or more virtual machines. For example, if a user is trying to download a file over a network, the file is extracted from the network traffic and analyzed in the virtual machine. The results of the analysis aids in determining whether the file is malicious. The two-phase malware detection solution may detect numerous types of malware and, even malware missed by other commercially available approaches. Through verification, the two-phase malware detection solution may also achieve a significant reduction of false positives relative to such other commercially available approaches. Dealing with false positives in malware detection may needlessly slow or interfere with download of network content or receipt of email, for example. This two-phase approach has even proven successful against many types of polymorphic malware and other forms of advanced persistent threats.
Network traffic can be in a variety of forms. One type of network traffic is in a form of emails, where an email may contain an attachment and/or a link such as a universal resource locator (URL) link. An email may or may not be malicious dependent upon whether a link embedded therein is associated with a remotely located malicious resource. There has been a lack of efficient ways for detecting whether a link is malicious.