Graph analysis has become a popular and effective method for in-depth data analysis. Modeling an underlying dataset as a graph captures fine-grained, arbitrary relationships between data entities. By analyzing these relationships, graph analysis can provide valuable insights about the original dataset.
Due to the recent trend toward increasingly large “big data” datasets, graph analysis often deals with very large graphs that do not fit within the memory of a single computer. In response to this issue, various distributed graph processing systems have been created. These systems work in cluster environments in which a large graph is distributed across the machines in the cluster. The computations involved in graph analysis are performed in parallel, with each computer working on its own portion of the graph. Computers communicate over a network when they need access to parts of the graph held by a different computer.
Many popular graph algorithms can be expressed as multiple iterations of a computation kernel. Even a straightforward implementation of such a kernel can be challenging in a distributed environment, because each iteration may require transferring data between computers. However, some popular graph processing systems do not support these data transfers.
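The iterative pattern described above can be sketched in a few lines. This is a minimal illustration only: the toy graph, the modulo partitioning rule in `owner`, and the sum-of-in-neighbors kernel are all assumptions chosen for brevity, and no real network is involved. The sketch simply counts how many values would have to cross machine boundaries in each iteration.

```python
# Hypothetical toy setup: 4 vertices split across 2 simulated "machines".
def owner(vertex, num_machines=2):
    """Assumed partitioning rule: vertex v lives on machine v % num_machines."""
    return vertex % num_machines

# Directed edges (source -> destination) of a small illustrative graph.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]

def kernel(values, edges):
    """One iteration: each vertex becomes the sum of its in-neighbors' values."""
    new_values = {v: 0 for v in values}
    remote_reads = 0
    for src, dst in edges:
        # The machine that owns dst needs the current value of src.
        if owner(src) != owner(dst):
            remote_reads += 1  # this value would have to cross the network
        new_values[dst] += values[src]
    return new_values, remote_reads

def run(iterations):
    """Apply the kernel repeatedly and total the simulated remote reads."""
    values = {v: 1 for v in range(4)}
    total_remote = 0
    for _ in range(iterations):
        values, remote = kernel(values, EDGES)
        total_remote += remote
    return values, total_remote
```

In this toy graph four of the five edges connect vertices on different machines, so every iteration repeats the same four cross-machine reads.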
In a typical implementation of this data transfer, a computer sends data to another computer when that data is requested, or when the computer performs a computation that must modify data held by the other computer. Such an approach has limitations. The same data may be transferred repeatedly and unnecessarily. Data is sent only after it is required for a computation or modified in the course of performing a computation on the graph. Each piece of data must also be sent together with information identifying which part of the graph it corresponds to and, possibly, which operation it corresponds to.
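The limitations of on-demand transfer can be illustrated with a small simulation. The message layout (`vertex`, `op`, `to`), the modulo partitioning in `owner`, and the edge list are hypothetical and exist only to make the point concrete: every cross-machine access produces its own message, each message carries its own addressing metadata, and the same vertex may be sent to the same machine more than once.

```python
from collections import Counter

def owner(vertex, num_machines=2):
    """Assumed partitioning rule: vertex v lives on machine v % num_machines."""
    return vertex % num_machines

def on_demand_messages(edges):
    """Emit one message per cross-machine access, at the moment it occurs."""
    messages = []
    for src, dst in edges:
        if owner(src) != owner(dst):
            # Each message repeats the metadata needed to interpret it:
            # which vertex it concerns, which operation, and the receiver.
            messages.append({"vertex": src, "op": "read", "to": owner(dst)})
    return messages

# Illustrative edges: vertex 0 is needed by three vertices on machine 1.
EDGES = [(0, 1), (0, 3), (0, 5), (2, 1), (1, 2)]

msgs = on_demand_messages(EDGES)
# Count how often each (vertex, receiving machine) pair is transferred.
repeats = Counter((m["vertex"], m["to"]) for m in msgs)
```

Here the value of vertex 0 is sent to machine 1 three separate times, once for each neighbor that needs it.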
One approach that avoids these limitations is to send the data only once, in bulk, and then use that data to finish the computation. However, this raises the problem of accessing the transferred data efficiently.
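As a rough sketch of the bulk idea, the code below collects every remotely needed vertex value once per receiving machine and ships it as a single buffer. The function names, the modulo partitioning, and the sample data are assumptions. The dictionary index built in `index_buffer` is just one conceivable way to look up received values, shown only to make the access problem concrete; it is not presented as the method this text goes on to describe.

```python
def owner(vertex, num_machines=2):
    """Assumed partitioning rule: vertex v lives on machine v % num_machines."""
    return vertex % num_machines

def build_bulk_buffers(edges, values):
    """Gather each remotely needed vertex value once per receiving machine."""
    needed = {}  # receiving machine -> set of remote vertices it requires
    for src, dst in edges:
        if owner(src) != owner(dst):
            needed.setdefault(owner(dst), set()).add(src)
    # One buffer per receiver, with entries sorted by vertex id.
    return {m: [(v, values[v]) for v in sorted(vs)] for m, vs in needed.items()}

def index_buffer(buffer):
    """Assumed access structure: map vertex id -> value for constant-time lookup."""
    return dict(buffer)

# Illustrative data: the same edges would need 5 on-demand messages.
EDGES = [(0, 1), (0, 3), (0, 5), (2, 1), (1, 2)]
VALUES = {v: v * 10 for v in range(6)}

buffers = build_bulk_buffers(EDGES, VALUES)
local_index = index_buffer(buffers[1])       # what machine 1 receives
total_entries = sum(len(b) for b in buffers.values())
```

Machine 1 receives vertex 0 once rather than three times, and then resolves each later access locally through `local_index` instead of issuing another request.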