The Internet presently comprises billions of web pages interconnected via hyperlinks. Users of the Internet typically use web browsing applications (“browsers”) to navigate among these pages either by selecting and clicking hyperlinks or by manually entering a “Uniform Resource Locator” (“URL”), which allows the browser to access a particular web page directly. Often, however, a user wishes to search the Internet for pages containing particular items of information. Because of the size of the Internet, it is impractical for a user to browse it manually in search of relevant pages. Instead, users typically invoke search engines, which are computer applications developed for the purpose of searching the Internet. Search engines typically reside on server computing devices and accept queries from client users. A search engine is usually associated with an index of web pages and, in response to a user query, returns a list of pages satisfying the query.
Some modern search engines rank web pages in order to provide users with more relevant results. Rank is an indication of a web page's quality as a response to the query. Many search engines represent the links between web pages as a matrix, in which case finding the page ranks amounts to finding the principal eigenvector of that matrix. In a conventional search engine, each iteration takes the current ranking of the web pages and propagates it across the interconnection matrix to obtain an updated ranking. Eventually, the rankings for all pages converge to fixed values, which are the entries of the principal eigenvector. This is equivalent to calculating the stationary distribution of a Markov chain. Because the matrices are so large, computing the eigenvector, and thus the page ranks, is a computationally intensive task in existing systems, requiring repeated iterations of matrix manipulation before the values for all pages converge to the eigenvector.
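The iteration described above is the power method. The following is a minimal sketch, assuming a small dense NumPy matrix `M` whose columns each sum to 1, so that entry `M[i, j]` is the share of page j's rank passed to page i. The function name `principal_eigenvector`, the tolerance, and the three-page example are illustrative assumptions, not part of any particular search engine.

```python
import numpy as np

def principal_eigenvector(M, tol=1e-10, max_iter=1000):
    """Power iteration: repeatedly propagate a ranking vector across M.

    Assumes M is column-stochastic (every column sums to 1), so M[i, j]
    is the fraction of page j's rank that flows to page i.
    """
    n = M.shape[0]
    x = np.full(n, 1.0 / n)              # start from the uniform ranking
    for _ in range(max_iter):
        x_next = M @ x                   # one propagation step over the links
        x_next /= x_next.sum()           # guard against floating-point drift
        if np.abs(x_next - x).sum() < tol:
            return x_next                # ranks have (nearly) stopped changing
        x = x_next
    return x                             # best approximation after max_iter
```

For a three-page graph in which page 0 links to pages 1 and 2, page 1 links to page 2, and page 2 links to page 0, the matrix is `np.array([[0, 0, 1], [0.5, 0, 0], [0.5, 1, 0]])`, and the iteration converges to ranks of approximately (0.4, 0.2, 0.4), a vector left unchanged by one further multiplication by `M`.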
Conventionally, a search engine's static ranking is computed as the stationary distribution of a very large random walk on the web graph, combined with an occasional random reset. This computation is iterative, and the solution at any step is only an approximation to the true solution. Typically, the process is stopped when the solution changes little from one step to the next, which is taken as a sign of near convergence.
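A random walk with reset, stopped once successive approximations change little, can be sketched as follows. This is a hedged illustration rather than any particular engine's method: the adjacency-list input `links`, the hypothetical function name `stationary_with_reset`, the reset probability of 0.15, the convention that pages without out-links always reset, and the L1 threshold `tol` are all assumptions chosen for the example.

```python
import numpy as np

def stationary_with_reset(links, reset_prob=0.15, tol=1e-8, max_iter=1000):
    """Approximate the stationary distribution of a random walk with reset.

    At each step the walker follows a uniformly chosen out-link with
    probability 1 - reset_prob, or jumps to a uniformly random page with
    probability reset_prob. Pages with no out-links always jump (an assumed
    convention). `links` maps page index 0..n-1 to the pages it links to.

    Returns (ranks, iterations_used). Iteration stops once the L1 change
    between successive approximations falls below `tol`.
    """
    n = len(links)
    x = np.full(n, 1.0 / n)                    # uniform starting distribution
    for it in range(1, max_iter + 1):
        x_next = np.zeros(n)
        dangling = 0.0
        for page, outs in links.items():
            if outs:
                share = (1 - reset_prob) * x[page] / len(outs)
                for target in outs:
                    x_next[target] += share    # follow a link
            else:
                dangling += (1 - reset_prob) * x[page]
        # Reset probability mass (plus mass stranded on dangling pages)
        # is spread uniformly over all pages.
        x_next += (reset_prob * x.sum() + dangling) / n
        change = np.abs(x_next - x).sum()      # L1 change since last step
        x = x_next
        if change < tol:                       # "changes little": stop here
            return x, it
    return x, max_iter
```

On the graph `{0: [1, 2], 1: [2], 2: [0]}`, a call with `tol=1e-3` never takes more iterations than a call with `tol=1e-10`, yet neither returned vector carries any indication of how far it is from the true stationary distribution; the threshold alone decides when to stop.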
This process is ad hoc: the decision as to how much change is acceptable is rather arbitrary, and the implications of inaccuracy are not well understood. Large errors could be concentrated in small pockets of pages, severely distorting the ranks of those pages, or spread evenly over all pages, causing little harm. This uncertainty forces pessimism in the stopping criterion, potentially requiring far more iterations than are actually needed for quality results.
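The concern above can be made concrete by comparing a loosely converged ranking with a tightly converged reference. In the sketch below, the helper names `rank_with_reset` and `error_profile`, the synthetic 200-page random graph, the reset probability of 0.15, and both tolerances are hypothetical choices for illustration. It reports two measures: the total L1 error, which is what an aggregate stopping rule controls, and the largest relative error suffered by any single page. Because the reference ranks sum to 1, the total L1 error is a rank-weighted average of the per-page relative errors, so it can never exceed the worst one and can be much smaller when the error falls on low-ranked pages.

```python
import numpy as np

def rank_with_reset(M, reset_prob, tol, max_iter=10_000):
    """Power iteration x <- (1 - reset_prob) * M x + reset_prob / n.

    Assumes M is column-stochastic, so the ranks keep summing to 1.
    """
    n = M.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        x_next = (1 - reset_prob) * (M @ x) + reset_prob / n
        if np.abs(x_next - x).sum() < tol:     # aggregate (L1) stopping rule
            return x_next
        x = x_next
    return x

def error_profile(approx, reference):
    """Return (total L1 error, largest per-page relative error)."""
    abs_err = np.abs(approx - reference)
    return abs_err.sum(), (abs_err / reference).max()

# Synthetic random graph (illustrative only): column j marks page j's out-links.
rng = np.random.default_rng(0)
n = 200
adj = rng.random((n, n)) < 0.02
adj[(np.arange(n) + 1) % n, np.arange(n)] = True   # ensure every page has an out-link
M = adj / adj.sum(axis=0)                           # normalize columns to sum to 1

reference = rank_with_reset(M, 0.15, tol=1e-12)     # treated as the "true" ranks
approx = rank_with_reset(M, 0.15, tol=1e-3)         # loosely converged ranks
total_l1, max_relative = error_profile(approx, reference)
print(f"total L1 error: {total_l1:.2e}, worst per-page relative error: {max_relative:.2e}")
```

The exact figures depend on the synthetic graph, so none are quoted here; the comparison only shows what an aggregate stopping rule does and does not measure about individual pages.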