Search engine optimization (SEO) is a collection of techniques used to achieve higher search rankings for a given website. “Black hat SEO” refers to the use of unethical SEO techniques to obtain a higher search ranking. These techniques may include keyword stuffing, cloaking, and link farming, which are used to “game” the search engine algorithms. Hackers may use these techniques to poison the search results for popular search terms in order to redirect users to misleading applications (e.g., fake antivirus scanners) or other malware. Hackers may identify vulnerable network sites and add numerous fake web pages to these sites. These fake pages may be based on popular search terms such as, for example, keywords in the Google “Hot trends” or popular terms in other search engines.
For example, a search for “super bowl 2010 line” may return malicious Uniform Resource Locators (URLs) that hackers have made to look legitimate. Hackers may also add related content to these pages. Each of these fake web pages may be added without the website owner's knowledge or consent. When a user clicks on one of these links in the search result page, the user may be redirected to fake antivirus pages or other malware.
These search engine optimized pages may distinguish among three kinds of requester: a search engine accessing them, a user accessing them directly, and a user accessing the page by clicking on or otherwise navigating from a search engine result. Because of this distinction, the website (which may be a legitimate website that has been hacked) may provide different content to different requesters. A web crawler or other search engine component accessing the web page may be provided with content related to a popular keyword. A person navigating directly to the web page may receive a normal web page (e.g., a web page associated with the site before it was hacked). However, a person navigating to the site via a search engine result may be redirected to a site associated with malware (e.g., a URL which downloads malware to a client, a site offering misleading applications, or another malware site).
Because a person navigating directly to the web page may not discover the malware, the malware may remain hidden longer. Because the malware-associated site may use keyword stuffing of popular keywords and link farming, it may achieve a high ranking on one or more search engines. This may allow the malware to be effectively distributed. The pages upon which the search results are based may not be the pages that are returned when a user clicks on the search result. Instead, the user may be redirected to a malware site. This may provide a challenge to normal methods used to detect and prevent malware. Additionally, these sites may change frequently and may be updated to respond to new popular keywords or trends.
A hacker may use other methods to make search results look legitimate. Some hacked or malware sites may trick a search engine into thinking they are a legitimate site (e.g., CNN). The malicious search result may then display as if it were from the legitimate website. Some hacked or malware sites may trick a search engine into thinking a malicious URL is associated with a particular type of content which may appear more legitimate or safer (e.g., a PDF file). These measures and others may make optimized malicious search engine results difficult to detect.
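The three-way distinction described above (search engine crawler, direct visitor, visitor arriving from a search result) may be illustrated with a minimal Python sketch. It is an explanatory model only, not a description of any particular site: the function names, the marker lists, and the use of the User-Agent and Referer request headers as the distinguishing signals are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical marker lists, assumed for illustration; real crawlers and
# search engines vary and these values are not taken from the source.
CRAWLER_TOKENS = ("googlebot", "bingbot")
SEARCH_HOSTS = ("google.", "bing.")


def classify_requester(headers):
    """Return 'crawler', 'search_referral', or 'direct' for a request.

    `headers` is a plain dict with the exact keys 'User-Agent' and
    'Referer' (an assumption; real header names are case-insensitive).
    """
    user_agent = headers.get("User-Agent", "").lower()
    if any(token in user_agent for token in CRAWLER_TOKENS):
        return "crawler"

    # An empty or missing Referer yields hostname None, treated as "".
    referer_host = (urlparse(headers.get("Referer", "")).hostname or "").lower()
    if referer_host.startswith("www."):
        referer_host = referer_host[len("www."):]
    if any(referer_host.startswith(host) for host in SEARCH_HOSTS):
        return "search_referral"

    return "direct"


# The content each requester class may receive, per the description above.
CONTENT_BY_REQUESTER = {
    "crawler": "keyword-related content",
    "search_referral": "redirect to malware site",
    "direct": "normal web page",
}


def content_served(headers):
    """Model which content a cloaking page may return for a request."""
    return CONTENT_BY_REQUESTER[classify_requester(headers)]
```

Under this model, a request carrying neither a crawler User-Agent nor a search engine Referer receives only the normal web page, which may be why a scanner that fetches the URL directly does not observe the redirect.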
In view of the foregoing, it may be understood that there may be significant problems and shortcomings associated with current optimized malicious search engine results identification technologies.