Technical Field
This invention pertains in general to distributed computing and in particular to graph processing using a distributed computer system.
Background Information
A distributed computing system includes multiple autonomous computers that communicate through a network. The computers interact with each other via the network to solve a common problem. For example, a complex problem can be divided into many smaller, less complex problems, and solved in parallel by the multiple computers in the distributed system.
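As a rough illustration of dividing a problem into smaller problems that are solved in parallel, the following minimal sketch splits a summation into chunks and combines the partial results. It is only an analogy: a thread pool on a single machine stands in for the networked computers, and the names `split` and `solve_in_parallel` are hypothetical and do not come from this document.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Divide a list into at most `parts` contiguous chunks."""
    size = max(1, (len(items) + parts - 1) // parts)
    return [items[i:i + size] for i in range(0, len(items), size)]


def solve_in_parallel(items, workers=4):
    """Sum `items` by summing each chunk concurrently, then combining."""
    chunks = split(items, workers)
    # Each worker thread stands in for one computer in the distributed system.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    # Combine the smaller results into the answer to the original problem.
    return sum(partial_sums)


total = solve_in_parallel(list(range(1, 101)))
```

In an actual distributed system the chunks would be sent over the network to separate machines, which introduces the communication and failure concerns discussed later in this section.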
Graph processing is a type of problem that can be solved using distributed systems. In graph processing, a computing problem is represented by a graph having a set of vertices connected by a set of edges. The graph can be used to model a real-world condition, and then the graph processing can act on the graph to analyze the modeled condition. For example, the World Wide Web can be represented as a graph where web pages are vertices and links among the pages are edges. In this example, graph processing can analyze the graph to provide information to a search engine process that ranks search results. Similarly, a social network can be represented as a graph and graph processing can analyze the graph to learn about the relationships in the social network. Graphs can also be used to model transportation routes, paths of disease outbreaks, citation relationships among published works, and similarities among different documents.
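The web-page example can be sketched in code. The following is a minimal illustration, assuming a hypothetical four-page graph stored as an adjacency list; the page names and the function `inbound_link_counts` are invented for illustration. Counting inbound links is just one simple analysis over such a graph and is not presented as the ranking method of any particular search engine.

```python
# Hypothetical toy web graph: each page (vertex) maps to the pages it
# links to (its outgoing edges).
web_graph = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}


def inbound_link_counts(graph):
    """Return, for every vertex, the number of edges pointing at it."""
    counts = {page: 0 for page in graph}
    for source, targets in graph.items():
        for target in targets:
            # Also handles targets that have no outgoing-edge entry.
            counts[target] = counts.get(target, 0) + 1
    return counts


counts = inbound_link_counts(web_graph)
```

Here the work done for each vertex is only a few additions, which hints at why graph workloads tend to involve very little computation per vertex.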
Efficient processing of large graphs in a distributed computing system is challenging. Graph processing often exhibits poor locality of memory access, very little work per vertex, and a changing degree of parallelism over the course of execution. Distributing the computation over many computers exacerbates the locality problem and increases the probability that a computer will fail during the computation. These challenges grow in significance as graph processing is used to model more real-world conditions and as the sizes of the graphs increase.