The present specification is directed to systems, components of systems, and methods performed by them, that can find multiple shortest paths in very large graphs.
Search engines allow users to search billions of web pages for pages of interest to them. Because a given query may produce millions of search results, it is important to be able to rank these search results to present high-quality results to the user. Graph analysis methods, in which the link structure of the web is represented as a graph, have been used to this end. One family of methods is described in commonly-owned U.S. application Ser. No. 11/546,755 (the “'755 application”) entitled “Method and apparatus for producing a ranking for pages using distances in a web-link graph,” the disclosure of which is incorporated here by reference. In this and other applications, a system is required that can compute a shortest path on a weighted directed graph.
A number of techniques for solving shortest paths problems have been implemented. The Dijkstra and the Bellman-Ford algorithms for the single source shortest paths problem have no parallelism and/or are not scalable. (While Bellman-Ford can easily be parallelized, it is not scalable because it requires too many iterations, each propagating messages through all the edges.) Others' work on parallelizing the Dijkstra algorithm has resulted in system designs that rely on the use of shared memory, random access to in-memory graph data, and reliable machines. Such designs cannot run across many machines, cannot be realized in the absence of shared memory access, cannot work with large graphs stored on disk, and cannot routinely handle machine failures. Examples of such systems are described in K. Madduri, D. A. Bader, J. W. Berry, and J. R. Crobak, “Parallel Shortest Paths Algorithms for Solving Large-Scale Instances,” 9th DIMACS Implementation Challenge—The Shortest Path Problem, DIMACS Center, Rutgers University, Piscataway, N.J., Nov. 13-14, 2006, and J. R. Crobak, J. W. Berry, K. Madduri, and D. A. Bader, “Advanced Shortest Path Algorithms on a Massively-Multithreaded Architecture,” First Workshop on Multithreaded Architectures and Applications (MTAAP), Long Beach, Calif., Mar. 30, 2007. These systems use shared memory models in which all of the graph data is held in memory, and are described as working on large graph instances having 2 billion edges.
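By way of illustration only, the iteration cost of Bellman-Ford noted above can be seen in the following minimal Python sketch. It assumes a synchronous, round-based formulation in which every round scans every edge. The function name `bellman_ford_rounds` and its counters are hypothetical and are not part of any system described in this specification or in the cited references.

```python
import math


def bellman_ford_rounds(num_nodes, edges, source):
    """Synchronous Bellman-Ford on non-negative or negative-cycle-free weights.

    edges is a list of (u, v, weight) tuples for a directed graph.
    Returns (dist, rounds, edge_scans), where edge_scans counts how many
    times an edge was examined across all rounds.
    """
    dist = [math.inf] * num_nodes
    dist[source] = 0
    rounds = 0
    edge_scans = 0
    # At most num_nodes - 1 rounds are needed when no negative cycle exists.
    for _ in range(num_nodes - 1):
        rounds += 1
        # Read from dist and write to new_dist, so that a distance can
        # advance by only one edge per round (as with message passing).
        new_dist = list(dist)
        changed = False
        for u, v, w in edges:
            edge_scans += 1
            if dist[u] + w < new_dist[v]:
                new_dist[v] = dist[u] + w
                changed = True
        dist = new_dist
        if not changed:
            break  # no distance improved, so the result is final
    return dist, rounds, edge_scans


# A chain 0 -> 1 -> 2 -> 3 -> 4 with unit weights (illustrative data only).
chain = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1)]
dist, rounds, edge_scans = bellman_ford_rounds(5, chain, 0)
print(dist, rounds, edge_scans)
```

On this five-node chain the sketch takes four rounds and examines sixteen edges in total, because each round touches all four edges but extends the known distances by only one hop. The number of rounds grows with the number of edges on the longest shortest path, and each round costs a full pass over the edge set, which is the behavior that makes this approach impractical for very large graphs.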
The systems described in this specification can compute single source and multiple source shortest paths for graph instances having trillions of edges and have the capacity to scale to even larger graphs.