In many situations, computer networks comprise a very large number of interconnected nodes. For example, the communication network of Skype represents a large social network for peer-to-peer communication. FIG. 1 is a schematic diagram of a small part of a typical computer network. The network is shown to comprise a plurality of nodes Ni. Each node can be associated with one or more physical computer devices, as shown for example in the case of node Ni, which is associated with a mobile device 2, a PC 4 and a tablet 6. Each node is associated with a single user, who in this case can register or log in with a particular network using any one of the computer devices. The nodes are shown interconnected by connections Ci. In the context of the physical network, the connections Ci can be implemented in any known way, wired or wireless. In the context of users associated with nodes, the connections do not necessarily pertain to a single physical connection in a network, but represent a relationship between the users associated with the nodes at either end of the connection. As an example, in the case of Skype, two users are considered to be connected if they are in each other's contact lists. A common challenge with such networks is to allow a user to search for another user, for example by name, and to see the results of the search ranked in order of their shortest path distance to the searching user. Similarly, a user may wish to know what chain of contacts allows him to reach another user in the network. Attempts to solve this problem have used analytic techniques for finding the shortest paths between a given pair of nodes in a graph.
There exists a large body of methods to address this problem. Existing methods can be broadly classified as exact or approximate. Exact methods, such as those based on Dijkstra's traversal, are prohibitively slow for performing online queries on graphs with hundreds of millions of vertices, which is a typical size for a contemporary social network. Among the approximate methods, one family of scalable algorithms for this problem is the so-called landmark-based (or sketch-based) approaches. In this family of techniques, a fixed set of landmark nodes is selected and distances are precomputed from each vertex to some or all of the landmarks. Knowledge of the distances to the landmarks, together with the triangle inequality, typically allows one to compute an approximate distance between any two vertices in O(k) time per query, where k is the number of landmarks, using O(kn) space, where n is the number of vertices in the network. Those estimates can then be used as-is, or exploited further as a component of a graph traversal or routing strategy in order to obtain an exact shortest path.
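By way of illustration only, the landmark-based estimate described above can be sketched as follows. The sketch assumes an unweighted, undirected graph held as an adjacency dictionary; the graph `GRAPH` and the function names `bfs_distances`, `precompute_landmark_distances` and `estimate_distance` are hypothetical and are not taken from any of the cited techniques. The estimate returned is the triangle-inequality upper bound d(s, l) + d(l, t), minimised over the landmarks l.

```python
from collections import deque

# Hypothetical toy graph: each key is a vertex, each value its neighbours.
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

def bfs_distances(graph, source):
    """Hop distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def precompute_landmark_distances(graph, landmarks):
    """One distance table per landmark: O(kn) space for k landmarks."""
    return {l: bfs_distances(graph, l) for l in landmarks}

def estimate_distance(table, s, t):
    """Upper bound on d(s, t) in O(k) time: min over landmarks of d(s,l) + d(l,t)."""
    best = None
    for dist in table.values():
        if s in dist and t in dist:
            bound = dist[s] + dist[t]
            if best is None or bound < best:
                best = bound
    return best  # None if no landmark reaches both vertices

one_landmark = precompute_landmark_distances(GRAPH, ["a"])
two_landmarks = precompute_landmark_distances(GRAPH, ["a", "d"])
print(estimate_distance(one_landmark, "b", "e"))   # 4 (true distance is 2)
print(estimate_distance(two_landmarks, "b", "e"))  # 2
```

In this toy graph the true distance between b and e is 2. With a as the only landmark the estimate is 4, whereas adding d, which lies on a shortest path between the two vertices, makes the estimate exact. This illustrates why the quality of such estimates depends on the choice of landmarks.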
Reference is made to a paper by Potamias et al. entitled “Fast Shortest Path Distance Estimation in Large networks” in CIKM '09: Proceedings of the 18th . . . Conference on IKM, pages 867-878, NY, USA, 2009. In that paper, a landmark-based distance estimation algorithm is evaluated under different landmark selection strategies. The algorithm relies on storing the distance from each landmark node to every other vertex in the graph. As with other landmark-based algorithms, approximation quality can be poor, particularly as networks scale up over time.
In another paper, by Gubichev et al., entitled “Fast and accurate estimates of shortest paths in large graphs”, in CIKM '10: Proceedings of the 19th ACM Conference on IKM, pages 499-508, ACM 2010, complete paths are stored from each vertex to each landmark, with different sets of landmarks for each vertex. This significantly increases memory requirements and increases execution times for processing queries.
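A minimal sketch of the general idea of estimating a path, rather than only a distance, from stored vertex-to-landmark paths is given below. It is not the algorithm of the cited paper, and it makes two simplifying assumptions: a single shared set of landmarks is used for all vertices, and each path is held implicitly as breadth-first-search parent pointers rather than being stored in full. The names `GRAPH`, `bfs_parents`, `path_to_landmark` and `approximate_path` are hypothetical.

```python
from collections import deque

# Hypothetical toy graph: each key is a vertex, each value its neighbours.
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

def bfs_parents(graph, landmark):
    """Parent pointers of a shortest-path tree rooted at the landmark."""
    parent = {landmark: None}
    queue = deque([landmark])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def path_to_landmark(parent, v):
    """Vertices on the stored path from v up to the landmark, inclusive."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path

def approximate_path(parents_by_landmark, s, t):
    """Shortest s-to-t path obtainable by joining stored paths at a landmark."""
    best = None
    for parent in parents_by_landmark.values():
        if s not in parent or t not in parent:
            continue
        ps = path_to_landmark(parent, s)
        pt = path_to_landmark(parent, t)
        # Both paths end at the landmark; trim the tail they share so that
        # they meet at the first common vertex instead of at the landmark.
        while len(ps) > 1 and len(pt) > 1 and ps[-2] == pt[-2]:
            ps.pop()
            pt.pop()
        candidate = ps + pt[-2::-1]  # s .. meeting vertex .. t
        if best is None or len(candidate) < len(best):
            best = candidate
    return best

parents = {"a": bfs_parents(GRAPH, "a")}
print(approximate_path(parents, "b", "e"))  # ['b', 'd', 'e']
print(approximate_path(parents, "c", "e"))  # ['c', 'a', 'b', 'd', 'e']
```

The second query returns a four-hop path although c and e are only two hops apart, since the only stored paths run through landmark a. Storing per-vertex path information for every landmark is also what drives the memory cost noted above.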
Although landmark-based algorithms do not provide strong theoretical guarantees on approximation quality, they have been shown to perform well in practice, scaling up to graphs with millions or even billions of edges with acceptable accuracy and response times of under one second per query.
It is an objective of the present invention to improve on the accuracy of existing techniques, while maintaining acceptable computation times and memory requirements for returning the results of a search query.