Search engines provide a powerful tool for locating documents in a large database, such as the documents on the World Wide Web (WWW) or the documents stored on the computers of an Intranet. The documents are located in response to a search query submitted by a user. A typical search query includes only two to three terms. As the number of documents accessible via the Internet grows, the number of documents that match a given search query may also increase. However, not every document matching the search query is equally important from the user's perspective. As a result, a user might be overwhelmed by the enormous number of documents retrieved by a search engine if the search engine did not order the search results based on their relevance to the user's query.
One approach to improving the relevance of search results to a search query is to use the link structure of the documents in the database, such as the links between documents on the WWW, to compute global “importance” scores for the documents in the database. These scores are used to influence the order in which search results are presented to the user. This approach is sometimes referred to as the PageRank algorithm. A more detailed description of the PageRank algorithm can be found in the article “The Anatomy of a Large-Scale Hypertextual Search Engine” by S. Brin and L. Page, 7th International World Wide Web Conference, Brisbane, Australia, and in U.S. Pat. No. 6,285,999, both of which are hereby incorporated by reference as background information.
An important assumption of the PageRank algorithm is that there is a “random surfer” who starts his web surfing at a randomly selected web page and keeps clicking on the links embedded in the web pages, never clicking on the “back” button. Occasionally, the random surfer re-starts his surfing by randomly picking another web page. The probability that the random surfer visits (i.e., views or downloads) a web page is a function of its PageRank. A web page may have a high PageRank if there are many other web pages pointing to it, or if some of the web pages pointing to it have a high PageRank. For example, www.espn.com is a well-known website reporting sports-related news. It is conceivable that many web pages on the Internet have links to www.espn.com. In contrast, www.gostanford.com is a website that only reports news about the sports teams of Stanford University. For the purposes of this explanation, we will assume that www.espn.com is more frequently visited by WWW users than www.gostanford.com, and we will further assume that www.espn.com has a higher PageRank than www.gostanford.com.
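By way of illustration only, the random surfer model described above may be approximated by a simple power iteration, as in the following minimal sketch. The function name `pagerank`, the damping value of 0.85, the fixed iteration count, the handling of pages without outgoing links, and the toy link structure are all assumptions made for this example; none of them is taken from the cited references as a required implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate the random surfer's visit probabilities by power iteration.

    links maps each page to the list of pages it links to.
    damping (assumed value) is the probability that the surfer follows a link
    rather than restarting at a randomly selected page.
    """
    pages = sorted(set(links) | {d for dests in links.values() for d in dests})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # surfer starts at a random page

    for _ in range(iterations):
        # Restart term: with probability (1 - damping) the surfer jumps
        # to a page chosen uniformly at random.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            outlinks = links.get(page, [])
            if outlinks:
                # The surfer picks one of the embedded links uniformly.
                share = damping * rank[page] / len(outlinks)
                for dest in outlinks:
                    new_rank[dest] += share
            else:
                # Assumption: a page with no outgoing links is treated
                # as a restart at a random page.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy link structure: "popular" has three inbound links,
# "niche" has only one.
toy_links = {
    "hub": ["popular", "niche"],
    "fan1": ["popular"],
    "fan2": ["popular"],
    "popular": ["hub"],
    "niche": ["hub"],
}
scores = pagerank(toy_links)
```

In this toy structure the page with more inbound links receives the higher score, consistent with the observation above that a page pointed to by many other pages may have a high PageRank.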
For each link in the link structure (representing links between the documents in the database), there is a pair of source and destination web pages. Source pages are also sometimes called “referring” pages. Further, many links in source web pages are associated with text that describes the destination web page of the link. Such text, commonly referred to as anchor text, often provides a more concise and accurate description than the destination web page itself and therefore can be used in determining the relevance of the destination web page to a particular query. FIG. 1 provides two examples of the link structure between different web pages. Each of the source web pages 110-1 and 120-1 has an embedded link pointing to one of the two destination web pages 110-2 and 120-2, respectively. An anchor text “Sports News” is associated with each link, characterizing the key feature of the corresponding destination page. When a user submits a query for “sports news” to a search engine (such as the Google search engine) that considers a web page's PageRank and anchor text, the engine may return both web pages 110-2 and 120-2. If so, the www.espn.com web page 120-2 would likely be displayed higher in the search results than the www.gostanford.com web page 110-2 because page 120-2 has a higher PageRank than page 110-2. It is noted that the Google search engine, as of late 2003, determines the position of a document in a set of search results as a function of the PageRanks of the documents in the search results, the query terms, the documents in the search results, and the anchor text of links to those documents. For purposes of this discussion, we have assumed that large differences in the PageRanks of two documents often determine their relative position in a set of search results.
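The FIG. 1 example may be sketched as follows, purely for illustration. The sketch assumes a greatly simplified ranking rule in which a destination page matches a query when every query term appears in the anchor text of some link to it, and matching pages are then ordered by PageRank alone. The function name, the source page labels, and the numeric PageRank values are hypothetical and are chosen only to reflect the assumption stated above that www.espn.com has the higher PageRank; an actual search engine combines further signals, as noted above.

```python
def rank_by_anchor_and_pagerank(query, links, pagerank):
    """Return destination pages whose inbound anchor text contains every
    query term, ordered from highest to lowest PageRank.

    links is a list of (source, anchor_text, destination) tuples.
    pagerank maps a destination page to its (assumed) PageRank score.
    """
    terms = query.lower().split()
    matches = set()
    for _source, anchor_text, destination in links:
        anchor_words = anchor_text.lower().split()
        if all(term in anchor_words for term in terms):
            matches.add(destination)
    return sorted(matches, key=lambda page: pagerank.get(page, 0.0), reverse=True)


# Link structure of FIG. 1; source labels are placeholders.
fig1_links = [
    ("source-page-110-1", "Sports News", "www.gostanford.com"),
    ("source-page-120-1", "Sports News", "www.espn.com"),
]

# Hypothetical scores, not measured values.
illustrative_pagerank = {"www.espn.com": 0.8, "www.gostanford.com": 0.2}

results = rank_by_anchor_and_pagerank("sports news", fig1_links, illustrative_pagerank)
```

Under these assumptions both destination pages match the query through their anchor text, and the page with the higher PageRank is listed first.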
When using a conventional search engine, the ordering of documents in a set of search results may be less than optimal for a user with specific personal preferences. In particular, documents of highest interest to the user may be positioned lower in the search results than one or more other documents. It would be desirable to have a system and method of making the order of documents in a set of search results more attuned to a user's personal preferences, and it would be desirable for such a system to be computationally feasible.