It has become common for users of computers connected to the World Wide Web (the “web”) to employ web browsers and search engines to locate web pages (or “documents”) having specific content of interest to them. A web-based commercial search engine may index tens of billions of web documents maintained by computers all over the world. Users compose queries, and the search engine identifies documents (known as the search results or result set) that match the queries, either because the documents include keywords from the queries or because the documents otherwise seem relevant to the queries.
Unfortunately, in addition to valid documents on the Internet, there also exists a substantial number of documents that contain or point to malicious software (“malware”) while masquerading as useful and relevant documents. Malware is intentionally designed to cause harm, ranging in degree from minor to substantial, to other computers or computer users, either directly (e.g., by harming the computer system itself) or indirectly (e.g., by spying on the user and/or stealing information to facilitate identity theft). For example, recent research suggests that nearly 11,000 domains are being used to serve malware in the form of fake anti-virus software.
The proliferation of malware throughout the Internet over the past several years has been extensive. In addition to commonplace techniques for spreading malware, such as spam email, attackers are constantly devising new and more sophisticated methods to infect user computers. One such technique is the widespread use of search engines as a medium for distributing malware. By manipulating search engine ranking algorithms using a variety of search engine optimization (SEO) techniques, attackers are able to poison search results for popular terms with seemingly relevant and harmless links to malicious web pages. By one estimate, in 2010, over half of the most popular keyword searches led to first-page results containing at least one link (if not many links) to a malicious web page.