A search engine is a software program designed to help a user access files stored on a computer, for example on the World Wide Web (WWW), by allowing the user to ask for documents meeting certain criteria (e.g., those containing a given word, a set of words, or a phrase) and retrieving files that match those criteria. Web search engines work by storing information about a large number of web pages (hereinafter also referred to as “pages” or “documents”), which they retrieve from the WWW. These documents are retrieved by a web crawler or spider, which is an automated web browser that follows the links it encounters in a crawled document. The contents of each successfully crawled document are indexed, thereby adding data concerning the words or terms in the document to an index database for use in responding to queries. Some search engines also store all or part of the document itself, in addition to the index entries. When a user makes a search query having one or more terms, the search engine searches the index for documents that satisfy the query, and provides a listing of matching documents, typically including for each listed document the URL, the title of the document, and in some search engines a portion of the document's text deemed relevant to the query.
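By way of illustration only, the indexing and query steps described above can be sketched as a small in-memory inverted index. The function names `build_index` and `search`, the whitespace tokenization, the all-terms-must-match rule, and the example URLs are assumptions made for this sketch and are not drawn from any particular search engine.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercased term to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for term in text.lower().split():
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs of documents containing every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the documents matching the first term, then intersect.
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return matches


# Hypothetical crawled documents, keyed by URL.
documents = {
    "http://example.com/a": "search engines index web pages",
    "http://example.com/b": "a web crawler follows links",
    "http://example.com/c": "ranking pages by link structure",
}
index = build_index(documents)
```

In this sketch, `search(index, "web pages")` matches only the first document, since it is the only one containing both terms. A production index would typically also record term positions and document metadata, which are omitted here.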
It can be useful for various purposes to rank or assign importance values to nodes in a large linked database. For example, the relevance of database search results can be improved by sorting the retrieved nodes according to their ranks, and presenting the most important, highly ranked nodes first. Alternatively, the search results can be sorted based on a query score for each document in the search results, where the query score is a function of the document ranks as well as other factors.
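The two orderings described above can be contrasted in a short sketch. The linear weighting in `query_score`, the `relevance` field standing in for the "other factors", the `rank_weight` parameter, and the sample values are all hypothetical; the text above does not specify how a query score is computed.

```python
def query_score(doc_rank, relevance, rank_weight=0.5):
    """Combine a document rank with a query-dependent relevance value.

    The linear blend is an assumption made for illustration only.
    """
    return rank_weight * doc_rank + (1.0 - rank_weight) * relevance


def order_results(results, rank_weight=0.5):
    """Sort results by descending query score."""
    return sorted(
        results,
        key=lambda r: query_score(r["rank"], r["relevance"], rank_weight),
        reverse=True,
    )


# Hypothetical search results with made-up rank and relevance values.
results = [
    {"url": "http://example.com/a", "rank": 0.2, "relevance": 0.9},
    {"url": "http://example.com/b", "rank": 0.7, "relevance": 0.5},
    {"url": "http://example.com/c", "rank": 0.4, "relevance": 0.4},
]

# Ordering by rank alone versus ordering by the combined query score.
by_rank = sorted(results, key=lambda r: r["rank"], reverse=True)
by_score = order_results(results)
```

With these sample values, sorting by rank alone places document `c` ahead of document `a`, whereas the combined score moves `a` ahead of `c` because of its higher relevance value.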
One approach to ranking documents involves examining the intrinsic content of each document or the back-link anchor text in parents of each document. This approach can be computationally intensive and often fails to assign the highest ranks to the most important documents. Another approach to ranking involves examining the extrinsic relationships between documents, i.e., the link structure of the directed graph formed by the documents. This type of approach is called link-based ranking. For example, U.S. Pat. No. 6,285,999 to Page discloses a technique used by the Google search engine for assigning a rank to each document in a hypertext database. According to the link-based ranking method of Page, the rank of a node is recursively defined as a function of the ranks of its parent nodes. Looked at another way, the rank of a node is the steady-state probability that an arbitrarily long random walk through the network will end up at the given node. Thus, a node will tend to have a high rank if it has many parents, or if its parents have high ranks.
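The recursive definition above can be illustrated with a simple iterative computation, in which each node repeatedly divides its current rank among its children until the values settle. This is a minimal sketch and not the method of the cited patent: the function name `link_rank`, the damping factor of 0.85, the fixed iteration count, the uniform treatment of nodes without outgoing links, and the four-node example graph are all assumptions made for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Approximate steady-state random-walk probabilities for a link graph.

    `links` maps each parent node to the list of nodes it links to.
    """
    nodes = sorted(set(links) | {c for children in links.values() for c in children})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Assumed random-jump term: every node receives a small base share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for parent in nodes:
            children = links.get(parent, [])
            if children:
                # A parent splits its rank evenly among its children.
                share = damping * rank[parent] / len(children)
                for child in children:
                    new_rank[child] += share
            else:
                # Assumption: a node with no links spreads its rank uniformly.
                share = damping * rank[parent] / n
                for node in nodes:
                    new_rank[node] += share
        rank = new_rank
    return rank


# Hypothetical graph: C has three parents (A, B, D); D has none.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = link_rank(links)
```

In this example graph, node `C` ends up with the highest rank because it has the most parents, node `A` ranks close behind because its single parent `C` is highly ranked, and node `D`, which has no parents, receives the lowest rank.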
Although link-based ranking techniques are improvements over prior techniques, in the case of an extremely large database, such as the World Wide Web, which contains billions of pages, the computation of the ranks for all the pages can take considerable time. Accordingly, it would be valuable to provide techniques for calculating page ranks with greater computational efficiency.