The Internet presently comprises billions of web pages interconnected via hyperlinks. Users of the Internet typically use web browsing applications (“browsers”) to navigate among these pages, either by clicking hyperlinks or by manually entering a “Uniform Resource Locator” (“URL”), which allows the browser to access a particular web page directly. Often, however, a user wishes to search the Internet for pages containing particular items of information. Because of the size of the Internet, it is impractical for a user to browse manually in search of relevant pages. Instead, users typically invoke search engines, which are computer applications developed for the purpose of searching the Internet. Search engines typically reside on server computing devices and accept queries from client users. A search engine is usually associated with an index of web pages and, in response to a user query, returns a list of pages satisfying the query.
Some modern search engines rank web pages in order to provide users with more relevant results. Many search engines represent the interconnection of web pages via a matrix, in which case finding a page ranking equates to finding the principal eigenvector of the matrix. Such a search engine is described by Page et al. in “The PageRank citation ranking: Bringing order to the web,” Stanford Digital Libraries Working Paper, January 1998, which is hereby incorporated by reference in its entirety for all that it teaches without exclusion of any part thereof. The eigenvector is generally computed iteratively: each iteration takes the current ranking of the web pages and propagates it across the interconnection matrix to obtain an updated ranking for the pages. Eventually, the rankings for all pages converge to fixed values, which are the entries of the principal eigenvector. This is equivalent to calculating the stationary distribution of a Markov chain. Due to the size of the matrices, computing the eigenvector, and thus the page ranks, is a computationally intensive task in existing systems, requiring several iterations of matrix manipulation before the values for all pages converge. To compute page ranks more efficiently, researchers have attempted to exploit particular mathematical properties of the interconnection matrix that allow page rankings to be computed or approximated more quickly.
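The iteration described above can be illustrated with a minimal sketch. It is not the implementation of any particular search engine. The function name power_iteration, the damping factor of 0.85, the uniform treatment of pages without outgoing links, and the three-page example graph are all assumptions introduced here for illustration.

```python
def power_iteration(links, num_pages, damping=0.85, tol=1e-10, max_iter=1000):
    """Approximate the principal eigenvector of a link matrix by repeated propagation.

    links maps a page index to the list of page indices it links to.
    The damping factor and the handling of dangling pages are assumptions.
    """
    rank = [1.0 / num_pages] * num_pages  # start from a uniform ranking
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in range(num_pages) if not links.get(p))
        base = (1.0 - damping) / num_pages + damping * dangling / num_pages
        new_rank = [base] * num_pages
        # Propagate each page's current rank across its outgoing links.
        for src, targets in links.items():
            if not targets:
                continue
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        # Stop once the ranking no longer changes appreciably.
        change = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if change < tol:
            break
    return rank, iteration


# Hypothetical three-page web: page 0 links to 1 and 2, 1 links to 2, 2 links to 0.
example_links = {0: [1, 2], 1: [2], 2: [0]}
ranks, iterations_used = power_iteration(example_links, num_pages=3)
```

In this small example, page 2 receives links from both other pages and therefore ends with the highest rank. On a matrix with billions of rows, every iteration must touch every link, which is the cost the passage above refers to.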
Computing page rankings can be a computationally intensive task for several reasons. One reason is simply the sheer volume of information: with billions of web pages, performing the necessary computations can require significant amounts of time, even on fast processors, and even if all the data were immediately available to the processor. Another reason, however, is that generally not all the data is immediately available to the processor, necessitating retrievals of data from a storage area such as RAM, or from secondary storage such as a hard drive. Accessing RAM typically costs about 100 nanoseconds per access; accessing a hard drive typically costs about 5-10 milliseconds per access. This acts as a bottleneck for an otherwise fast processor: at 100 nanoseconds per RAM access, a processor capable of performing a billion operations a second is limited to reading the data in at a rate of about 10 million entries per second, or roughly one percent of its capability.
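The arithmetic behind these figures can be checked with a short calculation. The sketch below assumes that each access retrieves exactly one entry and that the processor would otherwise consume one entry per operation; the constant and function names are hypothetical.

```python
NS_PER_SECOND = 1_000_000_000

RAM_LATENCY_NS = 100                # about 100 nanoseconds per RAM access
DISK_LATENCY_FAST_NS = 5_000_000    # 5 milliseconds per hard drive access
DISK_LATENCY_SLOW_NS = 10_000_000   # 10 milliseconds per hard drive access
CPU_OPS_PER_SECOND = 1_000_000_000  # a billion operations a second


def accesses_per_second(latency_ns):
    """Number of back-to-back accesses that fit in one second (one entry each)."""
    return NS_PER_SECOND / latency_ns


ram_rate = accesses_per_second(RAM_LATENCY_NS)            # 10 million per second
disk_rate_best = accesses_per_second(DISK_LATENCY_FAST_NS)   # 200 per second
disk_rate_worst = accesses_per_second(DISK_LATENCY_SLOW_NS)  # 100 per second

# Fraction of the processor's capability that RAM latency allows it to use.
ram_fraction = ram_rate / CPU_OPS_PER_SECOND               # 0.01, i.e. one percent
```

Under the same assumptions, a hard drive delivers only 100 to 200 entries per second, several orders of magnitude fewer than RAM.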
Furthermore, many matrix operations performed by existing page ranking techniques use arbitrary index labels for the web pages. As a result, neighboring rows (or columns) of the corresponding interconnectivity matrix bear no particular relationship to one another. For example, if Page #1 contains links to Pages #226, #4,250,221 and #1,000,000,152, then the corresponding row #1 of an interconnectivity matrix would have entries in columns #226, #4,250,221 and #1,000,000,152. To retrieve the data for those pages, three accesses must be made to three areas of memory that are likely very remote from one another.
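How far apart these three accesses fall can be estimated with a small sketch. It assumes, purely for illustration, that per-page values are stored contiguously in index order, that each value occupies 8 bytes, and that memory is fetched in 4096-byte blocks; none of these figures comes from any particular system.

```python
ENTRY_BYTES = 8     # assumed size of one stored value per web page
BLOCK_BYTES = 4096  # assumed size of one block of memory


def memory_block(column_index):
    """Index of the memory block holding the value for the given column."""
    return (column_index * ENTRY_BYTES) // BLOCK_BYTES


# Columns with entries in row #1, taken from the example above.
row_1_columns = [226, 4_250_221, 1_000_000_152]

blocks_touched = [memory_block(c) for c in row_1_columns]
distinct_blocks = len(set(blocks_touched))

# Distance in bytes between the lowest and highest addresses accessed.
span_bytes = (max(row_1_columns) - min(row_1_columns)) * ENTRY_BYTES
```

Under these assumptions, each of the three values lies in a different block, and the lowest and highest addresses are several billion bytes apart, so no single fetch can serve more than one of them.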
Researchers have also investigated caching systems. Generally, a “cache” is a small area of fast memory that temporarily holds data or instructions retrieved from slower, less expensive memory. By temporarily storing often-used data or instructions in a cache, a processor does not need to retrieve those data or instructions from the slower memory. Effective use of caches thus increases the speed with which many operations are performed on a computing device.
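The behavior of a cache can be illustrated with a small simulation. The sketch below assumes a least-recently-used replacement policy, which is only one of several possible policies. The class name LRUCache, the capacity of four entries, and the dictionary standing in for slower memory are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Small fast store placed in front of a slower memory (LRU policy assumed)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, address, slow_memory):
        if address in self.entries:
            # Hit: the value is served without touching the slower memory.
            self.entries.move_to_end(address)
            self.hits += 1
            return self.entries[address]
        # Miss: retrieve from the slower memory and keep a copy.
        self.misses += 1
        value = slow_memory[address]
        self.entries[address] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value


# Stand-in for slower memory: address -> stored value.
slow_memory = {address: address * 2 for address in range(1000)}

cache = LRUCache(capacity=4)
for address in [1, 2, 3, 1, 2, 3, 1, 2, 3]:
    cache.read(address, slow_memory)
```

Because the same three addresses are read repeatedly and all fit in the cache, only the first read of each address reaches the slower memory. A sequence of widely scattered addresses would miss on nearly every read.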