Graph analysis is a form of data analytics where the underlying dataset is represented as a graph. Graph databases are rapidly emerging to support graph analysis.
To process datasets too large to fit within the memory of a single computer, academia and industry use distributed graph processing systems. In these systems, graph data is partitioned across multiple computers of a cluster, and the computation is performed in a distributed manner. Several distributed systems for large-scale graph analysis have been developed with an emphasis on scalability.
However, the performance of these systems remains suboptimal because they are unable to optimize for the computation and communication patterns that are typical of graph applications.
Because distributed graph analysis typically entails copious communication, a key challenge in architecting such a system is determining how to schedule remote data access efficiently. For example, existing solutions may rigidly and suboptimally segregate computational threads from communication threads.
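As a rough illustration of why partitioning a graph produces remote data access, the following Python sketch assigns vertices to computers and counts the edges whose endpoints land on different computers. It assumes integer vertex IDs and a hypothetical modulo placement rule; the function names are illustrative only and do not describe any particular system.

```python
def partition_vertices(vertex_ids, num_machines):
    # Hypothetical placement rule: vertex v is stored on machine (v mod num_machines).
    return {v: v % num_machines for v in vertex_ids}


def classify_edges(edges, owner):
    # An edge is local if both endpoints are stored on the same machine;
    # otherwise, following it requires a remote data access.
    local, remote = [], []
    for src, dst in edges:
        (local if owner[src] == owner[dst] else remote).append((src, dst))
    return local, remote


def remote_accesses_per_machine(remote_edges, owner, num_machines):
    # Count the remote accesses issued by each machine, attributing each
    # cross-partition edge to the machine that owns its source vertex.
    counts = [0] * num_machines
    for src, _ in remote_edges:
        counts[owner[src]] += 1
    return counts


if __name__ == "__main__":
    # A 6-vertex ring split across 2 machines: every edge joins an even
    # vertex to an odd one, so every traversal crosses machines.
    owner = partition_vertices(range(6), 2)
    ring = [(i, (i + 1) % 6) for i in range(6)]
    local, remote = classify_edges(ring, owner)
    print(len(local), len(remote))  # prints "0 6"
    print(remote_accesses_per_machine(remote, owner, 2))  # prints "[3, 3]"
```

Even in this small example, every edge traversal requires communication, which suggests why the scheduling of remote data access can dominate the performance of a distributed graph analysis.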
A distributed system may succumb to backpressure, priority inversion, starvation, and other inefficiencies to which distributed processing is prone. For example, synchronization checkpoints and other coordination overhead may leave processors idle.
Furthermore, different graph analyses perform different amounts of computation and communication, and those amounts may vary over the course of a single analysis. This variability may thwart a priori attempts to statically balance computation and communication on any particular computer of the system.
Likewise, optimal balance may be elusive when multiple analyses run concurrently. Traditional approaches to running multiple graph analyses have not used a multitenant architecture.
Instead, such approaches duplicate infrastructure for each additional analysis. Such duplicate instantiations incur direct costs, such as increased memory usage and redundant processing, and indirect costs, such as missed opportunities for multitenant tuning.