There exist many situations where computer networks comprise a typically very large number of interconnected nodes. For example, the communication network of Skype represents a large social network for peer-to-peer communication. FIG. 1 is a schematic diagram of a small part of a typical computer network. The network is shown to comprise a plurality of nodes Ni. Each node can be associated with one or more physical computer devices, as shown for example in the case of node Ni, which is shown to be associated with a mobile device 2, a PC 4 and a tablet 6. Each node is associated with a single user, who in this case can register or log in with a particular network using any one of the computer devices. The nodes are shown interconnected by connections Ci. In the context of the physical network, the connections Ci can be implemented in any known way, wired or wireless. In the context of users associated with nodes, the connections do not necessarily pertain to a single physical connection in a network, but represent a relationship between the users associated with the nodes at either end of the connection. As an example, in the case of Skype, two users are considered to be connected if they are in each other's contact lists. A common challenge with such networks is to allow a user to search for another user, for example by name, and to see the results of the search ranked in order of their shortest path distance to him. Similarly, a user may wish to know what chain of contacts allows him to reach another user in the network. Attempts to solve the problem have used analytic techniques for finding the shortest paths between a given pair of nodes in a graph.
There exists a large body of methods to address this problem. Existing methods can be broadly classified into exact and approximate. Exact methods, such as those based on Dijkstra's traversal, are prohibitively slow for performing online queries on graphs with hundreds of millions of vertices, which is a typical size for a contemporary social network. Among the approximate methods, one family of scalable algorithms for this problem is the so-called landmark-based (or sketch-based) approaches. In this family of techniques, a fixed set of landmark nodes is selected and distances are precomputed from each vertex to some or all of the landmarks. Knowledge of the distances to the landmarks, together with the triangle inequality, typically allows one to compute an approximate distance between any two vertices in O(k) time, where k is the number of landmarks, using O(kn) space, where n is the number of vertices in the network. Those estimates can then be used as-is, or exploited further as a component of a graph traversal or routing strategy in order to obtain an exact shortest path.
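By way of illustration only, the landmark-based estimate described above might be sketched as follows. The sketch assumes an unweighted, undirected graph held as an adjacency dictionary; the function names `bfs_distances`, `build_landmark_index` and `estimate_distance` are hypothetical and do not come from any cited work. For each landmark l, the triangle inequality gives |d(u,l) - d(l,v)| <= d(u,v) <= d(u,l) + d(l,v), so a query over k landmarks takes O(k) time.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def build_landmark_index(graph, landmarks):
    """Precompute distances from each landmark to all nodes: O(k*n) space."""
    return {lm: bfs_distances(graph, lm) for lm in landmarks}


def estimate_distance(index, u, v):
    """Return (lower, upper) bounds on d(u, v) from the triangle inequality."""
    lower, upper = 0, float("inf")
    for dist in index.values():
        # Skip landmarks that cannot reach both endpoints.
        if u in dist and v in dist:
            lower = max(lower, abs(dist[u] - dist[v]))
            upper = min(upper, dist[u] + dist[v])
    return lower, upper


# Illustrative path graph a - b - c - d - e with a single landmark "c".
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
index = build_landmark_index(graph, ["c"])
bounds_a_e = estimate_distance(index, "a", "e")
bounds_a_b = estimate_distance(index, "a", "b")
```

In this toy graph the upper bound for the pair (a, e) is exact because the landmark lies on the shortest path between them, whereas for the pair (a, b) the landmark lies beyond b and the upper bound overestimates the true distance of 1.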
An important aspect of landmark-based methods is the way landmarks are selected: a careful selection strategy can have a significant positive effect on accuracy. Suggested strategies include selecting landmarks with high degree, betweenness centrality or closeness centrality, as well as ensuring proper dispersion of landmarks over the graph and its paths.
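As a minimal sketch of one such strategy, selection by highest degree could look like the following. It assumes the same adjacency-dictionary representation of an undirected graph; the function name `select_landmarks_by_degree` is hypothetical, and the centrality-based and dispersion-based strategies are not shown.

```python
def select_landmarks_by_degree(graph, k):
    """Return the k nodes with the most neighbors, highest degree first.

    Ties are kept in the dictionary's insertion order, because sorted()
    is stable even when reverse=True.
    """
    ranked = sorted(graph, key=lambda node: len(graph[node]), reverse=True)
    return ranked[:k]


# Illustrative graph: hub "h" is linked to a, b and c; a and b are also linked.
graph = {
    "h": ["a", "b", "c"],
    "a": ["h", "b"],
    "b": ["h", "a"],
    "c": ["h"],
}
top_one = select_landmarks_by_degree(graph, 1)
top_three = select_landmarks_by_degree(graph, 3)
```

Degree is cheap to compute from the adjacency lists alone, which is one reason it is attractive for very large graphs; betweenness and closeness centrality require shortest-path computations and are correspondingly more expensive.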
Reference is made to a paper by Potamias et al. entitled “Fast Shortest Path Distance Estimation in Large Networks” in CIKM '09: Proceedings of the 18th International Conference on Information and Knowledge Management, pages 867-878, NY, USA, 2009. In that paper, a landmark-based distance estimation algorithm is evaluated under different landmark selection strategies. According to this paper, the highest-degree and closeness-centrality techniques have been shown to typically yield the highest accuracy.
Although landmark-based algorithms do not provide strong theoretical guarantees on approximation quality, they have been shown to perform well in practice, scaling up to graphs with millions or even billions of edges with acceptable accuracy and response times of under one second per query.
It is an objective of the present invention to improve the accuracy over existing techniques, with acceptable computation times for generating a data structure for use in processing a search query.