The Internet has become widely used as an advertising medium for businesses. Search engines, in addition to providing results for user queries, also serve advertisements alongside the search results. The advertisements served may be related to the search query. The more relevant the advertisements are to the user's intent and query, the greater the value to users, businesses, and search engines. However, the large amounts of revenue associated with Internet sales and advertising also give vendors an incentive to manipulate search engines into including vendor web page links within the search results or increasing the ranking of a vendor web page link within the search results.
Search results can be manipulated by providing false information to web crawlers or bots. Search engines typically utilize web crawlers or bots to search the Internet for web site content, copying web pages or information. The search engine can utilize this information to generate an index that facilitates searches. There are many legitimate reasons for providing different information or a different version of a web page to a crawler than to a browser. For instance, web servers may remove images or audio content from the web page information provided to a crawler to minimize bandwidth. However, some unscrupulous servers seek to manipulate search engines by providing one set of information to the crawler and presenting a substantially different web page to users. This type of manipulation is often referred to as “cloaking,” a particular type of web spam in which users are redirected to undesired web sites. Web spam is somewhat similar to email spam, where unsolicited information and/or advertisements are sent to users. Spam in general is the electronic equivalent of traditional junk mail.
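To make the distinction between legitimate variation and cloaking concrete, the following is a minimal sketch, not a method described in this document. It assumes that the text served to a crawler and the text served to a browser have already been obtained, and it compares them by word overlap (Jaccard similarity). The function names and the 0.5 threshold are hypothetical choices made for illustration only.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(crawler_copy, browser_copy):
    """Word-set overlap between two page versions, from 0.0 to 1.0."""
    a, b = tokens(crawler_copy), tokens(browser_copy)
    if not a and not b:
        return 1.0  # two empty pages are treated as identical
    return len(a & b) / len(a | b)


def looks_cloaked(crawler_copy, browser_copy, threshold=0.5):
    """Flag a page when the two versions share too little vocabulary.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return jaccard(crawler_copy, browser_copy) < threshold


# A version with media captions stripped still overlaps heavily with the original.
print(looks_cloaked("travel guide cheap flights",
                    "travel guide cheap flights photo gallery"))
# A substantially different page shares no vocabulary at all.
print(looks_cloaked("cheap flights hotel deals travel guide",
                    "win money casino poker now"))
```

A comparison this simple would misjudge pages whose content legitimately changes between requests, so it illustrates the idea of "substantially different" rather than a usable detector.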
Due to its nature and volume, spam is considered a nuisance that inconveniences and frustrates users. Not only do users waste time sorting through a deluge of undesired information, but they also likely bear the costs of the tremendous amounts of resources (e.g., storage space, network bandwidth, faster processors . . . ) required to cope with various forms of spam (e.g., irrelevant search results, email advertisements, etc.). A variety of systems and techniques have been developed and employed to combat spam both on the Web and in email, often requiring numerous filtering processes. Once spam is identified, action is taken on the content, such as redirection to a designated location (e.g., spam folder, quarantine region . . . ) and/or deletion. However, traditional filtering methods frequently fall far short of adequately eliminating undesired spam.