1. Field of the Invention
Systems and methods consistent with the principles of the invention relate generally to information retrieval and, more particularly, to ranking documents based on the context of references associated with the documents.
2. Description of Related Art
The World Wide Web (“web”) contains a vast amount of information. Search engines assist users in locating desired portions of this information by cataloging web documents. Typically, in response to a user's request, a search engine returns links to documents relevant to the request.
Search engines may base their determination of the user's interest on search terms (called a search query) provided by the user. The goal of a search engine is to identify links to relevant search results based on the search query. Typically, the search engine accomplishes this by matching the terms in the search query to a corpus of pre-stored web documents. Web documents that contain the user's search terms are considered “hits” and are returned to the user.
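By way of illustration only, the term-matching step described above may be sketched as follows. The function name `find_hits`, the in-memory `corpus` dictionary, and the rule that a hit must contain every query term are assumptions made for this sketch; they are not taken from any particular search engine.

```python
def find_hits(query, corpus):
    """Return the IDs of documents that contain every term of the query.

    `corpus` maps a hypothetical document ID to the document's text.
    Requiring all terms to be present is an assumption of this sketch.
    """
    terms = set(query.lower().split())
    hits = []
    for doc_id, text in corpus.items():
        words = set(text.lower().split())
        if terms <= words:  # every query term appears in the document
            hits.append(doc_id)
    return hits


# Hypothetical pre-stored corpus of three documents.
corpus = {
    "doc1": "cheap flights to paris",
    "doc2": "paris travel guide",
    "doc3": "cooking with garlic",
}

print(find_hits("paris flights", corpus))  # ['doc1']
```

In this sketch the hits are returned in corpus order; no ranking is applied.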
The quality of the documents returned to the user depends on the quality of the ranking process used to rank the search results. For example, some ranking processes rank documents based on the number of links pointing to them, the ranks of the documents pointing to them, or the anchor text associated with the links pointing to them. Several techniques have arisen to artificially inflate the rank of a document, thereby degrading the quality of the search results.
One such technique relates to link-based spamming. Link-based spamming involves obtaining a large number of links to a particular document to increase the rank of the document. Link farms, for example, provide a network of web documents that are heavily cross-linked to each other so as to increase the ranks of the documents. Some spammers pay owners of highly ranked documents to include links to the spammers' documents so as to increase the ranks of those documents.
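By way of illustration only, the effect of a link farm on a ranking process that counts inbound links may be sketched as follows. The scoring rule (one point per distinct linking document), the function name `inbound_link_counts`, and the document names are assumptions made for this sketch and do not describe any actual ranking process.

```python
from collections import Counter


def inbound_link_counts(links):
    """Count the distinct source documents that link to each target.

    `links` is an iterable of (source, target) pairs. Duplicate pairs and
    self-links are ignored. This simple count is a stand-in for a
    link-based ranking process, assumed for illustration.
    """
    return Counter(target for source, target in set(links) if source != target)


# Three independent documents link to an honest document.
organic = [("a", "honest"), ("b", "honest"), ("c", "honest")]

# A hypothetical link farm: five documents that link to each other
# and to the document being promoted.
farm_pages = [f"farm{i}" for i in range(5)]
farm = [
    (src, dst)
    for src in farm_pages
    for dst in farm_pages + ["spam"]
    if src != dst
]

counts = inbound_link_counts(organic + farm)
print(counts["honest"])  # 3
print(counts["spam"])    # 5
print(counts["farm0"])   # 4
```

Under this assumed scoring rule, the promoted document outscores the honest document solely because of the farm, and each farm document also gains inbound links from the cross-linking.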
Another technique relates to anchor text spamming. Anchor text spamming involves causing a large number of web documents to link to a particular document using the same anchor text with which the document is to be associated. The desired result is that if a user provides a search query with terms that match the anchor text, then the document will be ranked highly in the search results.
Yet another technique relates to bombing (e.g., Google bombing). Bombing involves setting up a large number of documents with links that point to a specific document so that the document will obtain a high rank when users enter particular text associated with the link. One popular bomb involved a large number of documents including the anchor text “miserable failure” associated with a link to President Bush's biography. Therefore, whenever a user entered the search query “miserable failure,” the highest ranked result was a link to President Bush's biography.
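By way of illustration only, the effect of anchor text spamming or bombing on a ranking process that relies on anchor text may be sketched as follows. The scoring rule (one point per link whose anchor text exactly matches the query), the function name `rank_by_anchor_text`, and the example hosts are assumptions made for this sketch.

```python
from collections import defaultdict


def rank_by_anchor_text(links, query):
    """Rank target documents by the number of links whose anchor text
    matches the query.

    `links` is an iterable of (source, anchor_text, target) triples.
    Exact, case-insensitive matching is an assumption of this sketch.
    """
    wanted = query.lower().strip()
    scores = defaultdict(int)
    for source, anchor, target in links:
        if anchor.lower().strip() == wanted:
            scores[target] += 1
    # Highest score first; ties are broken alphabetically by target.
    return sorted(scores, key=lambda target: (-scores[target], target))


# Two genuine reviews point to a relevant document.
genuine = [
    ("review1", "best pizza", "pizzeria.example"),
    ("review2", "Best Pizza", "pizzeria.example"),
]

# Fifty coordinated documents reuse the same anchor text for another target.
bomb = [(f"blog{i}", "best pizza", "unrelated.example") for i in range(50)]

print(rank_by_anchor_text(genuine + bomb, "best pizza"))
# ['unrelated.example', 'pizzeria.example']
```

Under this assumed scoring rule, the coordinated links place the unrelated document above the relevant one for the targeted query.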
A further technique relates to the use of standard frames that are associated with a number of web documents. Standard frames sometimes include “products” links, “jobs” links, “investor” links, etc. that are typically associated with business web sites. Oftentimes, the business will include these same links on every document associated with its web site. This duplication of links may artificially inflate the ranks of the documents associated with these links, especially when the web site includes a large number of documents.
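By way of illustration only, the inflation caused by a standard frame may be sketched as follows. The sketch assumes a ranking process that counts one point per inbound link and a web site whose every document carries the same frame links; the function name `frame_link_counts` and the page names are hypothetical.

```python
def frame_link_counts(num_pages, frame_targets):
    """Count inbound links produced by a standard frame.

    Models a site with `num_pages` documents, each of which carries the
    same frame with one link to every document in `frame_targets`.
    Counting one point per link is an assumption of this sketch.
    """
    pages = [f"page{i}" for i in range(num_pages)]
    counts = {}
    for page in pages:
        for target in frame_targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


frame = ["products", "jobs", "investor"]

print(frame_link_counts(10, frame))
# {'products': 10, 'jobs': 10, 'investor': 10}
print(frame_link_counts(1000, frame)["jobs"])  # 1000
```

In this sketch the inbound link count of each frame target equals the number of documents on the web site, so the count grows with the size of the site rather than with any independent endorsement.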
All of these techniques degrade the quality of the search results returned by a search engine.