Many real-world networks comprise a very large number of interconnected nodes.
For example, a computer network is a large network of interconnected routers, each of which acts as a node. FIG. 1 is a schematic diagram of a small part of a typical computer network. The network is shown to comprise a plurality of nodes (Ni) in the form of routers arranged to route traffic between a plurality of physical computer devices (as shown for example in the case of router Ni, which is associated with a mobile device 2, a PC 4 and a tablet 6). The routers are shown interconnected by connections Ci. In the context of the physical network, the connections Ci can be implemented in any known way, wired or wireless.
In another example, a communication network such as a social network comprises a large number of interconnected users. In this case, each node is a user, who can register or log into a particular network using a computer device. In the context of the nodes being users, the connections between users do not necessarily pertain to a single physical connection in a network, but represent a relationship between users associated with the nodes at either end of the connection. As an example, two users are considered to be connected if they are in each other's contact lists.
One task performed in such networks is a search for the separation between two nodes, i.e. a path or distance between the nodes. For example, in the case of a computer network, it may be desired to find the most efficient route across the network. As another example, when a user searches for the name of an acquaintance met via a friend, the acquaintance may be found as one of the search results having the shortest path length from the user. Similarly, a user may wish to know what chain of contacts allows him to reach another user in the network. Previous methods for finding the shortest paths between a given pair of nodes in a graph have used analytic techniques.
Existing analytic methods can be broadly classified into exact and approximate. Exact methods, such as those based on Dijkstra's traversal, are prohibitively slow for performing online queries on graphs with hundreds of millions of vertices (or “nodes”), which is a typical size for a contemporary social network. Among the approximate methods, one family of scalable algorithms for this problem is the so-called landmark-based (or sketch-based) approaches. In this family of techniques, a fixed set of landmark nodes (also referred to as sketches, pivots or beacons in various works) is selected and distances are precomputed from each vertex to some or all of the landmarks. Knowledge of the distances to the landmarks, together with the triangle inequality, typically allows an approximate distance between any two vertices to be computed in O(k) time per query (where k is the number of landmarks) using O(kn) space (where n is the number of vertices in the network). Those estimates can then be used as-is, or exploited further as a component of a graph traversal or routing strategy in order to obtain an exact shortest path.
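By way of illustration only, the basic landmark technique described above might be sketched as follows. The sketch assumes an undirected, unweighted graph held in memory as an adjacency dictionary; the function names (bfs_distances, precompute_landmark_distances, estimate_distance) and the example graph are hypothetical and are not taken from any particular prior work. For each landmark L, the triangle inequality gives |d(u, L) - d(L, v)| <= d(u, v) <= d(u, L) + d(L, v), so a query inspects k precomputed values per vertex.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return hop counts from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def precompute_landmark_distances(graph, landmarks):
    """Offline step: one BFS per landmark, O(kn) stored distances in total."""
    return {lm: bfs_distances(graph, lm) for lm in landmarks}


def estimate_distance(landmark_dist, u, v):
    """Online step: O(k) lower and upper bounds on d(u, v).

    Assumes an undirected graph, so d(u, L) == d(L, u) for every landmark L.
    """
    lower, upper = 0, float("inf")
    for dist in landmark_dist.values():
        if u in dist and v in dist:  # skip landmarks that cannot reach both
            upper = min(upper, dist[u] + dist[v])
            lower = max(lower, abs(dist[u] - dist[v]))
    return lower, upper


# Hypothetical five-node example: each key lists the contacts of that node.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
landmark_dist = precompute_landmark_distances(graph, ["a", "d"])
print(estimate_distance(landmark_dist, "b", "e"))  # bounds on d(b, e)
```

In this sketch the upper bound is the quantity usually reported as the approximate distance; it is exact whenever some landmark lies on a shortest path between the two queried nodes, and an overestimate otherwise.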
Landmark-based algorithms have been shown to perform well in practice, scaling up to graphs with millions or even billions of edges with acceptable accuracy and response times of under one second per query.
Various modifications of the basic landmark-based technique exist, for example those which allow the computation of the shortest paths themselves (rather than just distances), which support dynamic changes to the graph of interconnected nodes and/or which use landmark approximations as a guide to speed up the search for the exact shortest path. These modifications have been shown to provide good accuracy while keeping the query time on the order of milliseconds, even for very large graphs. The accuracy of landmark-based methods can be increased by using more landmarks. This, however, leads to a linear increase in memory and disk space usage with only a slight reduction of the approximation error.