Web search services allow users to submit queries and, in response, return a set of links to web pages that satisfy the query. Because a query may produce a large number of results, the results are typically displayed in ranked order. There are many ways to rank-order the links resulting from a query, including content-based ranking, usage-based ranking, and link-based ranking. Content-based ranking techniques determine how relevant the content of a document is to a particular query. Usage-based ranking techniques monitor which result links users actually follow, and boost the ranks of these result links for subsequent queries. Link-based ranking techniques examine how many other web pages link to a particular web page, and assign higher ranks to pages with many incoming links.
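As a minimal sketch of the link-based idea described above, the following counts the distinct other pages that link to each page and orders pages by that count. The function name `rank_by_incoming_links`, the toy link list, and the alphabetical tie-break are illustrative assumptions; practical link-based ranking schemes are considerably more elaborate.

```python
from collections import defaultdict

def rank_by_incoming_links(links):
    """Order pages by the number of distinct *other* pages linking to them.

    `links` is an iterable of (source, target) pairs. This is a toy
    illustration, not any particular search engine's ranking method.
    """
    sources = defaultdict(set)
    pages = set()
    for src, dst in links:
        pages.update((src, dst))
        if src != dst:  # a page linking to itself is not an "other" page
            sources[dst].add(src)
    # Most incoming links first; ties broken alphabetically (an assumption).
    return sorted(pages, key=lambda p: (-len(sources[p]), p))

# Hypothetical toy graph: the duplicate ("a", "c") and the self-link
# ("d", "d") do not add to any page's count.
toy_links = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "c"), ("d", "d")]
ranking = rank_by_incoming_links(toy_links)
print(ranking)
```

In this toy graph, page `c` has two distinct incoming links, `d` has one, and `a` and `b` have none.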
One problem associated with these techniques is scalability. For example, a well-known search engine has been observed to index approximately three (3) billion web pages. An analysis of one (1) billion web pages has also shown that each web page had an average of 42 distinct outgoing links. Thus, a web graph modeling a significant portion of the web will have billions of nodes and on the order of 100 billion edges.
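The edge estimate can be checked with a back-of-the-envelope calculation. The sketch below assumes that the average of 42 outgoing links, observed on a one-billion-page sample, also holds across a three-billion-page index; the variable names are illustrative.

```python
# Figures quoted in the text.
indexed_pages = 3_000_000_000   # pages searchable by the search engine
avg_outgoing_links = 42         # average measured on a 1-billion-page sample

# Assumption: the sample average applies to the whole index.
estimated_edges = indexed_pages * avg_outgoing_links
print(f"{estimated_edges:,} edges")
```

The product is 126 billion, which is consistent with "on the order of 100 billion edges".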
Previous attempts to address this problem include fitting fairly large web graphs into the main memory of a very-large-memory machine by compressing nodes and edges, and storing the web graph on disk. However, these attempts have their own limitations. For example, fitting a graph representing one (1) billion web pages and 40 billion links between them may require a machine with approximately 50 GB of main memory. This amount of memory exceeds the capacity of cost-efficient commodity PCs, which typically have up to 4 GB per machine. Furthermore, this technique does not scale to arbitrarily large web graphs, since there is a dearth of very-large-memory computers. Storing a large web graph on disk, in turn, increases access time. It has been observed that computing the ranks of 180 million web pages can take approximately 25 minutes, and it is estimated that computing the ranks of 10 times that many pages would take more than 10 times longer (worse than linear behavior). The disk-based technique therefore scales poorly as web graphs increase in size. It is also impracticable to conduct link-based ranking at query time due to the long access times.
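The figures above can be put side by side with some simple arithmetic. The sketch below assumes decimal gigabytes and, for the per-link figure, attributes the entire 50 GB to link storage (ignoring per-node overhead); both are simplifying assumptions, and the variable names are illustrative.

```python
GB = 10**9  # assumption: decimal gigabytes

# In-memory approach: 1 billion pages, 40 billion links, about 50 GB.
graph_memory_bytes = 50 * GB
link_count = 40 * 10**9
bytes_per_link = graph_memory_bytes / link_count   # upper bound per link

# Commodity PC with up to 4 GB of main memory.
commodity_memory_bytes = 4 * GB
memory_shortfall_factor = graph_memory_bytes / commodity_memory_bytes

# Disk-based approach: 180 million pages ranked in about 25 minutes.
baseline_minutes = 25
scale_factor = 10
linear_lower_bound_minutes = baseline_minutes * scale_factor

print(bytes_per_link, memory_shortfall_factor, linear_lower_bound_minutes)
```

Under these assumptions the compressed graph uses at most about 1.25 bytes per link yet still needs 12.5 times the memory of a commodity PC, and ranking 1.8 billion pages from disk would take more than 250 minutes, since the text estimates worse-than-linear growth.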
A technique for maintaining a large number of links that overcomes the above-described time and scalability problems is desired.