Graphs are often used to model relationships between entities, such as links between websites on the Internet or connections between users of social networking applications. As may be appreciated, such graphs are often very large and include a large number of nodes and edges. Because of their size, computations over these graphs may require a large amount of computational resources.
One such computation is known as the distance distribution. The distance distribution of a node i contains, for each distance d, the number of nodes in the graph that are at distance d from i. The distance distribution of the graph contains, for each distance d, the number of node pairs that are at distance d from each other. The distance distribution captures useful properties of the graph and its nodes, including node centrality and effective diameter.
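As an illustration of the definitions above, the following minimal sketch computes the distance distribution exactly with one breadth-first search per node. It assumes an unweighted graph stored as an adjacency dictionary and measures distance in hops; the function names and the four-node example graph are hypothetical and chosen only for illustration. The graph-level count is over ordered node pairs, so each pair of an undirected graph is counted twice.

```python
from collections import Counter, deque


def node_distance_distribution(adj, source):
    """Return {d: number of nodes at hop distance d from source}, for d >= 1."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # first visit gives the shortest hop distance
                dist[w] = dist[u] + 1
                queue.append(w)
    # Exclude the source itself (distance 0).
    return Counter(d for node, d in dist.items() if node != source)


def graph_distance_distribution(adj):
    """Return {d: number of ordered node pairs (i, j) at hop distance d}."""
    total = Counter()
    for v in adj:
        total.update(node_distance_distribution(adj, v))
    return total


# Hypothetical example: the undirected path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
per_node = node_distance_distribution(path, "a")
per_graph = graph_distance_distribution(path)
```

Running one search per node takes time proportional to the number of nodes times the number of edges, which illustrates why the exact computation becomes expensive on very large graphs.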
One method for determining the distance distribution is to compute an all-distances sketch for each node. An all-distances sketch for a node v includes a random sample of nodes from the graph, where the inclusion probability of a node u in the sample decreases with its distance from v. The all-distances sketches generated for the nodes of the graph can be used to estimate the distance distribution of the graph, as well as other quantities such as node closeness, and to answer more general queries.
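The text above does not say how the sample is drawn. The following minimal sketch assumes one common construction, a bottom-k sketch: every node receives a random rank, and a node u is kept in the sketch of v only if its rank is among the k smallest ranks of the nodes that are at least as close to v. Under this assumption the i-th closest node is kept with probability min(1, k/i), so inclusion becomes less likely with distance. The function names, the parameter k, the tie-breaking rule, and the use of a full breadth-first search per node are illustrative assumptions, not a description of any particular implementation.

```python
import heapq
import random
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def all_distances_sketch(adj, v, rank, k):
    """Bottom-k sketch of v as a list of (node, distance) pairs (assumed variant)."""
    dist = bfs_distances(adj, v)
    # Scan nodes from nearest to farthest; ties in distance are broken by rank.
    order = sorted(dist, key=lambda u: (dist[u], rank[u]))
    smallest = []  # negated ranks: a max-heap holding the k smallest ranks seen
    sketch = []
    for u in order:
        if len(smallest) < k:
            heapq.heappush(smallest, -rank[u])
            sketch.append((u, dist[u]))
        elif rank[u] < -smallest[0]:
            # u beats the current k-th smallest rank, so it enters the sample.
            heapq.heapreplace(smallest, -rank[u])
            sketch.append((u, dist[u]))
    return sketch


# Hypothetical example: the undirected path a - b - c - d with random ranks.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
rng = random.Random(0)
ranks = {u: rng.random() for u in path}
sketches = {v: all_distances_sketch(path, v, ranks, k=2) for v in path}
```

The same ranks are shared by all nodes so that the sketches of different nodes are coordinated. A practical implementation would likely avoid a full search per node, since that costs as much as the exact computation; the sketch here is meant only to make the sampling rule concrete.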