A search engine is a software program designed to help a user access files stored on a computer, for example on the World Wide Web (WWW), by allowing the user to ask for documents meeting certain criteria (e.g., those containing a given word, a set of words, or a phrase) and retrieving files that match those criteria. Web search engines work by storing information about a large number of web pages (hereinafter also referred to as “pages” or “documents”), which they retrieve from the WWW. These documents are retrieved by a web crawler or spider, which is an automated web browser that follows every link it encounters in a crawled document. The contents of each document are indexed, thereby adding data concerning the words or terms in the document to an index database for use in responding to queries. Some search engines also store all or part of the document itself, in addition to the index entries. When a user makes a search query having one or more terms, the search engine searches the index for documents that satisfy the query, and provides a listing of matching documents, typically including for each listed document the URL, the title of the document, and in some search engines a portion of the document's text deemed relevant to the query. In many instances the list of matching documents is ordered by a ranking, or importance value, of each document, determined in part by how the documents link to each other.
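By way of illustration only, the indexing and query steps described above can be sketched as a small inverted index. The function names `build_index` and `search`, the example URLs, and the choice to require every query term to be present are assumptions made for this sketch; they are not drawn from any particular search engine.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase term to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for term in text.lower().split():
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the documents matching the first term, then narrow.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical crawled documents, keyed by URL.
documents = {
    "http://example.com/a": "link based ranking of web pages",
    "http://example.com/b": "ranking documents by content",
    "http://example.com/c": "web crawler follows every link",
}
index = build_index(documents)
matches = search(index, "web link")
```

In this sketch the matching documents are returned as an unordered set; ordering them by an importance value is a separate step.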
More generally, a linked database is a database of documents containing mutual citations. Examples of linked databases include the World Wide Web or another hypermedia archive, the database of U.S. patents, and a database of academic journal articles. A linked database can be represented as a directed graph of N nodes, where each node corresponds to a document in the database and where the directed connections between nodes correspond to the links, citations, or references from one document to another.
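As a minimal sketch of this representation, a linked database may be held as two adjacency maps built from (source, target) pairs, one giving the documents each node links to and one giving the documents that link to it. The name `build_graph` and the node labels are hypothetical and used for illustration only.

```python
def build_graph(links):
    """Build child and parent adjacency maps from (source, target) link pairs."""
    children = {}  # node -> set of nodes it links to
    parents = {}   # node -> set of nodes that link to it
    for source, target in links:
        # Make sure both endpoints exist in both maps, even if they
        # have no outgoing or no incoming links.
        for node in (source, target):
            children.setdefault(node, set())
            parents.setdefault(node, set())
        children[source].add(target)
        parents[target].add(source)
    return children, parents


# Hypothetical three-document database: A cites B and C, B cites C, C cites A.
links = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
children, parents = build_graph(links)
```

Here N is simply `len(children)`, and `parents[node]` gives the backlinks of a node.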
It can be useful for various purposes to rank or assign importance values to nodes in a large linked database. For example, the relevance of database search results can be improved by sorting the retrieved nodes according to their ranks, and presenting the most important, highly ranked nodes first. One approach to ranking documents involves examining the intrinsic content of each document or the backlink anchor text in parents of each document. This approach can be computationally intensive and often fails to assign the highest ranks to the most important documents. Another approach to ranking involves examining the extrinsic relationships between documents, i.e., the link structure of the directed graph. This type of approach is called link-based ranking. For example, U.S. Pat. No. 6,285,999 to Page discloses a technique used by the Google search engine for assigning a rank to each document in a hypertext database. According to the link-based ranking method of Page, the rank of a node is recursively defined as a function of the ranks of its parent nodes. Looked at another way, the rank of a node is the steady-state probability that an arbitrarily long random walk through the network will end up at the given node. Thus, a node will tend to have a high rank if it has many parents, or if its parents have high rank.
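The recursive definition can be approximated by repeated iteration until the ranks settle. The following is a simplified sketch rather than the method as claimed in the Page patent: the function name `link_rank`, the damping value of 0.85, the fixed iteration count, and the uniform redistribution of rank from nodes without outgoing links are all assumptions made for this example.

```python
def link_rank(children, damping=0.85, iterations=100):
    """Approximate steady-state ranks by power iteration.

    `children` maps every node to the set of nodes it links to; each node
    must appear as a key. The damping value and iteration count are
    assumed for illustration.
    """
    nodes = sorted(children)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not children[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node passes an equal share of its rank to each of its children.
        for node in nodes:
            for child in children[node]:
                new_rank[child] += damping * rank[node] / len(children[node])
        rank = new_rank
    return rank


# Hypothetical graph: A links to B and C, B links to C, C links to A.
children = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}}
rank = link_rank(children)
```

In this small graph, node C has two parents and ends up ranked above node B, which has only one, consistent with the observation that a node with many parents tends to rank highly.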
A problem with known link-based ranking methods is that the link structure surrounding a node can be deliberately modified to artificially inflate the rank of the node. Consequently, the ranking results of current link-based ranking methods are susceptible to indirect manipulation and distortion. It would be desirable to identify certain techniques used to artificially inflate the ranks of nodes, and to eliminate or reduce their effects.
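The susceptibility described above can be illustrated with a toy experiment in which several artificial pages are added solely to link to a target node. This is a hedged sketch under assumed parameters: `rank_from_links` is a hypothetical, simplified link-based ranking (damping of 0.85, fixed iteration count), and the node names and the size of the artificial link structure are invented for the example.

```python
def rank_from_links(links, damping=0.85, iterations=100):
    """Simplified link-based ranking over (source, target) pairs (illustrative)."""
    nodes = sorted({node for link in links for node in link})
    out = {node: [t for s, t in links if s == node] for node in nodes}
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        # Rank of nodes without outgoing links is shared among all nodes.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = dict.fromkeys(nodes, (1.0 - damping + damping * dangling) / n)
        for v in nodes:
            for t in out[v]:
                new[t] += damping * rank[v] / len(out[v])
        rank = new
    return rank


# Unmodified link structure around the target node T.
honest = [("A", "B"), ("A", "T"), ("B", "C"), ("C", "A"), ("T", "A")]
# Five artificial pages created only to link to T.
farm = [(f"F{i}", "T") for i in range(5)]

before = rank_from_links(honest)
after = rank_from_links(honest + farm)
```

Comparing `before["T"]` with `after["T"]` shows the rank of T rising once the artificial pages are added, even though nothing about the content of T has changed.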