Online social network services provide members with a mechanism for defining, and memorializing in a digital format, representations of themselves (e.g., member profiles) and their relationships with other people. This digital representation of relationships between members is frequently referred to as a social graph. Many social graphs store important entities such as member identification, school, company, location, and so forth in the form of a social graph data structure comprising vertices and edges. Many such implementations adopt key-value stores in which a vertex is the key and its adjacency list of edges is the value. For example, a member identification may be the key and that member's first-degree connections may be the corresponding value.
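As a minimal sketch of the key-value layout described above, a plain Python dictionary can stand in for the store. The member identifiers and the helper name `first_degree` are hypothetical and chosen only for illustration; a production system would use a real key-value store rather than an in-memory dictionary.

```python
# Hypothetical in-memory stand-in for a key-value store.
# Key: a vertex (here, a member identifier).
# Value: that vertex's adjacency list (its first-degree connections).
social_graph = {
    "member:1": ["member:2", "member:3"],
    "member:2": ["member:1", "member:4"],
    "member:3": ["member:1"],
    "member:4": ["member:2"],
}


def first_degree(store, member_id):
    """Return the adjacency list stored under member_id (empty if absent)."""
    return store.get(member_id, [])


connections = first_degree(social_graph, "member:1")
```

Each call to `first_degree` corresponds to one look-up in the key-value store.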
Popular social graph services, such as finding paths, computing network distance, and counting network size, utilize breadth-first traversal through the social graph data structure. For example, in order to compute network distance between two vertices, the system may traverse the first-degree connections and check whether the first-degree connections contain the other vertex. If so, then the network distance is determined to be first degree. If not, then the second-degree connections are traversed to see if the second-degree connections contain the other vertex. This process may continue until the degree of connections that contains the other vertex is determined.
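The degree-by-degree check described above can be sketched as a level-order breadth-first search. This is an illustrative sketch only: the function name `network_distance`, the `max_degree` cut-off, and the sample graph are assumptions, and the store is modeled as a dictionary mapping each vertex to its adjacency list.

```python
def network_distance(store, source, target, max_degree=3):
    """Return the degree at which target is first found, or None.

    store maps a vertex to its adjacency list. max_degree is an assumed
    cut-off so the search stops after a fixed number of degrees.
    """
    if source == target:
        return 0
    visited = {source}
    frontier = [source]  # vertices whose connections are checked next
    for degree in range(1, max_degree + 1):
        next_frontier = []
        for vertex in frontier:
            # One key-value look-up per vertex in the current frontier.
            for neighbor in store.get(vertex, []):
                if neighbor == target:
                    return degree
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        if not next_frontier:
            return None  # no more vertices to explore
        frontier = next_frontier
    return None


# Hypothetical chain-like graph: a - b - d - e, plus a - c.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
}
```

With this sample graph, `network_distance(graph, "a", "d")` finds `d` among the second-degree connections of `a`, after first checking and rejecting the first-degree connections.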
Typically, when network distance is computed by breadth-first traversal, the key-value store containing each vertex and its adjacency list must be looked up repeatedly until the distance is determined. The number of look-ups in this key-value store can grow rapidly as higher degrees of separation are searched, because the number of vertices reached grows exponentially with each degree. As a result, popular graph services with many members require very large numbers of key-value store look-ups.
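To make the growth concrete, the following sketch estimates the worst-case number of look-ups under a simplifying assumption that is not stated in the text: every member has the same number of connections and no two members share a connection. The function name and the figure of 100 connections per member are hypothetical.

```python
def lookups_needed(avg_connections, degree):
    """Worst-case look-ups to confirm a target at the given degree.

    Checking degree 1 needs 1 look-up (the source vertex itself);
    checking degree 2 needs one more look-up per first-degree connection;
    and so on, giving 1 + b + b**2 + ... + b**(degree - 1).
    """
    return sum(avg_connections ** level for level in range(degree))


# Assumed figure: 100 connections per member.
estimates = {degree: lookups_needed(100, degree) for degree in (1, 2, 3)}
# estimates == {1: 1, 2: 101, 3: 10101}
```

Under these assumptions, going from second-degree to third-degree raises the look-up count by roughly a factor of one hundred. Real graphs have overlapping connections, so actual counts would typically be lower than this upper bound.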
For scalability, social network services typically adopt distributed data stores, which distribute the entire social graph over multiple servers. This can cause high latency during breadth-first traversal, however, since the traversal requires multiple remote calls to other machines to fetch the connections of vertices. For example, if the system needs to fetch the second-degree connections of a vertex, it may need to make remote calls to most of the data stores to fetch the connections of the first-degree connections of the vertex, because it is unclear which server hosts the data related to the vertex in question. This means that a single request for a network distance, for example, can cause multiple rounds of remote calls to various distributed data stores, which slows response time and consumes network bandwidth.
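The fan-out of remote calls can be illustrated with a small simulation. The partitioning scheme here (member identifier modulo the number of shards), the shard count, and all function names are assumptions made for the sketch; the text does not specify how the graph is distributed. Each shard is modeled as a local dictionary, and a counter stands in for real network calls.

```python
NUM_SHARDS = 4  # assumed number of servers


def shard_for(member_id):
    """Assumed partitioning rule: integer member id modulo shard count."""
    return member_id % NUM_SHARDS


def build_shards(graph):
    """Split a vertex -> adjacency-list mapping across the shards."""
    shards = [dict() for _ in range(NUM_SHARDS)]
    for member_id, connections in graph.items():
        shards[shard_for(member_id)][member_id] = connections
    return shards


def fetch_second_degree(shards, source):
    """Return (second-degree connections, number of simulated remote calls)."""
    # Step 1: one call to the shard hosting the source vertex.
    first = shards[shard_for(source)].get(source, [])
    remote_calls = 1

    # Step 2: one batched call per shard hosting a first-degree connection.
    by_shard = {}
    for member_id in first:
        by_shard.setdefault(shard_for(member_id), []).append(member_id)

    second = set()
    for shard_id, members in by_shard.items():
        remote_calls += 1
        for member_id in members:
            second.update(shards[shard_id].get(member_id, []))

    # Exclude the source and its first-degree connections.
    second -= set(first)
    second.discard(source)
    return second, remote_calls


# Hypothetical graph: member 1 has four connections that land on four shards.
graph = {1: [2, 3, 4, 5], 2: [1, 6], 3: [1, 7], 4: [1, 8], 5: [1, 9]}
shards = build_shards(graph)
second_degree, calls = fetch_second_degree(shards, 1)
```

In this example the four first-degree connections of member 1 fall on four different shards, so a single second-degree request touches every shard in two sequential steps. Batching per shard, as done here, is one design choice; issuing one call per vertex would increase the call count further.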