1. Field of the Invention
Implementations described herein relate generally to information retrieval and, more particularly, to the ranking of documents.
2. Description of Related Art
The World Wide Web (“web”) contains a vast amount of information that is ever-changing. Existing search engines attempt to rank this information in a meaningful way so that they can provide high quality search results. It is beneficial for information providers (e.g., web marketers and web site designers) to have their information (or their customers' information) ranked higher by the search engines.
Rank-modifying spamming, such as index spamming and link spamming, encompasses a set of techniques by which information providers attempt to fool a search engine into ranking their information (or their customers' information) at or near the top of the list of search results. Techniques used by rank-modifying spammers include keyword stuffing, invisible text, tiny text, page redirects, META tags stuffing, and link-based manipulation.
Keyword stuffing involves the repeated use of a word (and more likely a set of words) within a page to increase its frequency on the page and, thereby, make the page appear very relevant to a search relating to the word. Invisible text includes keywords inserted in a page, where the text of the keywords is the same color as the background of the page. Tiny text involves the use of keywords in very small text within a page. Invisible text and tiny text attempt to make a page appear relevant for a wide range of search queries even though the content of the page is not very relevant, or irrelevant, to the search queries.
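By way of illustration only, the effect of keyword stuffing on a simple frequency-based relevance signal may be sketched as follows. The function name `term_frequency`, the tokenization rule, and the sample page text are hypothetical and are not taken from any particular search engine; the sketch assumes relevance is approximated by the fraction of words on a page that match a query term.

```python
import re
from collections import Counter


def term_frequency(page_text: str, term: str) -> float:
    """Return the fraction of words on the page equal to `term` (case-insensitive).

    Hypothetical relevance signal for illustration; real engines combine many signals.
    """
    words = re.findall(r"[a-z0-9']+", page_text.lower())
    if not words:
        return 0.0
    return Counter(words)[term.lower()] / len(words)


# Hypothetical page content (12 words, one occurrence of "tables").
honest_page = "We sell handmade oak tables and offer free delivery on every order"

# The same page with the keyword repeated 12 more times, e.g. as invisible or tiny text.
stuffed_page = honest_page + " tables" * 12

honest_score = term_frequency(honest_page, "tables")
stuffed_score = term_frequency(stuffed_page, "tables")
```

Under these assumptions the stuffed page scores several times higher for the term even though its visible content is unchanged, which is the distortion that keyword stuffing, invisible text, and tiny text attempt to exploit.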
Page redirects involve the use of a first page with code that automatically redirects the user to a second page, which typically has nothing to do with the search query the user provided. The first page typically uses another spamming technique to appear relevant for a wide range of search queries. META tags stuffing involves the use of a large set of keywords in the META tags on a page, where the keywords typically do not relate to the content of the page. META tags stuffing attempts to make the page appear relevant for a wide range of search queries even though the content of the page is not very relevant, or irrelevant, to the search queries.
Link-based manipulation may include the creation or manipulation of a first document or a set of first documents to include a link or a number of links to a second document in an attempt to increase the rank of the second document. Some existing search engines determine the rank of a document based on the number or quality of the links that point to the document. A link farm is an example of a link-based manipulation technique.
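By way of illustration only, the effect of a link farm on a ranking signal based purely on the number of links pointing to a document may be sketched as follows. The function name `inbound_link_counts`, the document identifiers, and the use of a raw count of distinct linking documents are assumptions made for this sketch; existing search engines may also weigh link quality, which is not modeled here.

```python
from collections import defaultdict


def inbound_link_counts(links):
    """Count the distinct documents that link to each target document.

    `links` is an iterable of (source, target) pairs. Self-links are ignored.
    Hypothetical signal for illustration only.
    """
    sources_by_target = defaultdict(set)
    for source, target in links:
        if source != target:
            sources_by_target[target].add(source)
    return {target: len(sources) for target, sources in sources_by_target.items()}


# Hypothetical organic links: document "b" has two inbound links, "d" has one.
organic_links = [("a", "b"), ("c", "b"), ("a", "d")]

# A hypothetical link farm: five first documents created only to link to "d".
farm_links = [(f"farm{i}", "d") for i in range(5)]

before = inbound_link_counts(organic_links)
after = inbound_link_counts(organic_links + farm_links)
```

In this sketch the second document "d" moves from fewer inbound links than "b" to more, without any change to its own content, which is the result that link-based manipulation seeks to achieve.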
Such manipulation of search results degrades the quality of the search results provided by existing search engines.