The World Wide Web is a network of computers and information resources, typically with some information resources referring to other information resources via hyperlinks. For example, text or images can be encoded such that they refer to a network address (e.g., a URL, or Uniform Resource Locator) of another information resource.
The World Wide Web has grown explosively over the last few years to become a very large scale, distributed, evolving repository of information resources. Unfortunately, with this growth has come increased difficulty in identifying relevant information resources. To address this difficulty, search engines have become a core capability of the Internet. For example, in performing a search, an Internet user may enter a word, a phrase, or a set of keywords into web browser software or a thin-client toolbar running on the user's computer. The search engine, specifically its query processor, may find matching information resources, such as web pages, images, documents, videos, and so on, and provide a response to the user. Search engines have also become prevalent in Intranets, i.e., private enterprise networks, where keywords are used by such search engines to locate documents and files.
Unfortunately, as the available information has grown, the results returned by search engines have also grown. For example, a search on the words “white” and “house” on a popular search engine may return about three hundred million pages. In practical terms, since no user can, in a reasonable amount of time, investigate hundreds of web pages, much less hundreds of millions of pages, such search results are not very effective.
To resolve this dilemma, search engines attempt to rank search results by “relevance”, e.g., ranking those resources most likely to help the searching user as more relevant. In turn, “more relevant” resources are often returned first to the user as an initial set of search results.
To determine relevance, a number of approaches have been used. One popular approach, the PageRank algorithm used by Google, Inc., of Mountain View, Calif., awarded to Larry Page in U.S. Pat. No. 6,285,999, is to attempt to determine the relevance of an information resource by considering how many referring documents have hyperlinks to that information resource. The core idea of this approach, in effect, is that more “valuable” information resources will have a greater number of other information resources hyperlinked to them. A number of enhancements have been made to this approach. For example, it can be made recursive, so that the higher the value of the information resources that refer to an information resource, the higher the value of that information resource. That is, votes “in favor” of a given page (that is, hyperlinks to that page) by pages that are themselves “important” are counted more and therefore make that page more important.
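The recursive link-based ranking described above can be sketched as an iterative computation over a hypothetical link graph. This is a minimal illustration, not the patented algorithm itself; the damping factor of 0.85 and the example graph are assumptions chosen for demonstration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Iteratively rank pages so that links from highly ranked
    pages count for more (a sketch of the recursive idea above).

    links maps each page to the list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # A page with no outgoing links spreads its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                # Each link passes along an equal share of the
                # linking page's current rank.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = link_rank(graph)
# Page "C", linked to by both "A" and "B", ends up ranked highest.
```

Note that a page's rank depends on the ranks of the pages linking to it, which is precisely the recursive "important votes count more" behavior the approach describes.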
Another enhancement is to minimize the weight of internal cycles of links from a web site. The reason is that a web site designer can deceptively affect the relevance of a web site, e.g., one could create a web site with a million pages, all of which refer to a particular home page. This approach may artificially create the impression that the particular home page is very important.
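The enhancement above, down-weighting links that stay within one web site, might be illustrated as follows. The site-extraction rule and the 0.1 discount factor are hypothetical values chosen for this sketch, not values from any particular search engine.

```python
def site_of(url):
    """Take the host portion of a simplified URL,
    e.g. "example.com/page1" -> "example.com"."""
    return url.split("/")[0]

def link_weight(source, target, internal_discount=0.1):
    """Give a hyperlink far less influence when it merely points
    to another page on the same web site (assumed discount of 0.1)."""
    if site_of(source) == site_of(target):
        return internal_discount
    return 1.0

# A million internal pages pointing at a home page now contribute
# only a small fraction of the weight of genuinely external links.
print(link_weight("spam.com/p1", "spam.com/index"))  # internal link
print(link_weight("other.org/review", "spam.com/index"))  # external link
```

Under such a weighting, the deceptive million-page scheme described above yields little benefit, since each internal link carries only a small fraction of an external link's weight.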
One problem with the various approaches above is that the weighting does not in fact rank web pages higher based on user relevance, but rather on how many web page designers happen to know about the ranked information resource and include such a link at the time that their web pages are designed. In effect, it is like trying to determine how busy a traffic intersection is at rush hour by asking the traffic engineer who designed the road, or by counting how many “important” highways are connected to the road, rather than actually observing the traffic at the intersection itself.