The present invention relates generally to computer-implemented detection of spam hosts.
Every day, millions of users search for information on the web via search engines. Through their interaction with search engines, they not only locate the information they are looking for, but also provide implicit feedback on the results shown in response to their queries by clicking or not clicking on the search results.
Search engines can now record query logs that capture which documents (e.g., web pages or web sites) users click for each query. Such information can be seen as “soft” relevance feedback for the documents that are clicked in response to specific queries. This “soft” relevance feedback may be used to generate a score for each document that indicates the relevance of the document to a particular query. Search engines may then use this score to provide the most relevant documents in response to queries. Unfortunately, some web pages include terms that are intended to mislead search engines so that a greater number of users will view those pages. Accordingly, the scores associated with some of these documents may be undeserved. The web pages or web sites that have received these undeserved scores are often referred to as spam hosts.
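By way of illustration only, the click-based scoring described above might be sketched as follows. This is a minimal sketch that assumes a query log of (query, document, clicked) records and uses the click-through rate as the score. The function name `click_scores`, the record layout, and the example host names are hypothetical and are not taken from any particular search engine.

```python
from collections import defaultdict


def click_scores(log):
    """Return a click-through rate for each (query, document) pair.

    `log` is an iterable of (query, document, clicked) records, where
    `clicked` is True if the user clicked the document for that query.
    The record layout is an assumption made for this sketch.
    """
    shown = defaultdict(int)    # times the document was shown for the query
    clicked = defaultdict(int)  # times it was clicked when shown
    for query, document, was_clicked in log:
        key = (query, document)
        shown[key] += 1
        if was_clicked:
            clicked[key] += 1
    # Score = fraction of impressions that resulted in a click.
    return {key: clicked[key] / shown[key] for key in shown}


# Hypothetical log entries, for illustration only.
log = [
    ("cheap flights", "travel.example", True),
    ("cheap flights", "travel.example", True),
    ("cheap flights", "lure.example", True),
    ("cheap flights", "lure.example", False),
]
scores = click_scores(log)
```

A score derived this way reflects only what users clicked, not whether the document deserved the click, which is why a page with misleading terms may still receive a high score.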
In view of the above, it would be beneficial if improved methods of detecting spam hosts could be implemented.