The amount of digital data stored in the world is currently estimated at around 4.4 zettabytes and is expected to reach 44 zettabytes by the year 2020. As data volumes grow exponentially, more and more of this information is interconnected into large graphs that are used in many application domains, such as online retail, social applications, and bioinformatics. Meanwhile, the increasing size and complexity of graph data pose new challenges for the development and optimization of graph processing systems.
Various big data/cloud platforms are available to satisfy users' needs across a range of fields. To guarantee the quality of different services while lowering maintenance and energy costs, data centers deploy a diverse collection of compute nodes, ranging from powerful enterprise servers to networks of off-the-shelf commodity parts. In addition to meeting requirements on service quality, cost, and energy consumption, data centers continuously upgrade their hardware on a rotating basis to maintain high service availability. As a result of these trends, modern data centers are populated with heterogeneous computing resources. For instance, low-cost ARM®-based servers are increasingly added to existing x86-based server farms to take advantage of their low energy consumption.
Despite these trends, most cloud computing and graph processing frameworks, such as Hadoop® and PowerGraph, are designed under the assumption that all computing units in the cluster are homogeneous. Because “large” and “tiny” machines coexist in heterogeneous clusters, uniform graph/data partitioning leads to imbalanced loads across the cluster. When given the same amount of data and the same application, the “tiny” machines in the cluster can severely slow down overall performance whenever there are dependencies or a need for synchronization. Such performance degradation has been observed previously. Heterogeneity-aware task scheduling and both dynamic and static load balancing have been proposed to alleviate it. Dynamic load balancing is designed to avoid the negative impact of insufficient graph/data partitioning information in the initial stage, while heterogeneity-aware task scheduling can be applied non-invasively on top of load balancing schemes.
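Why the “tiny” machines dominate can be seen from a minimal worked illustration. It assumes, purely for illustration, that machine i processes work at a constant rate s_i, that W_i is the work assigned to it out of a total W across n machines, and that all machines synchronize at a barrier after each step; these symbols are introduced here and are not part of the original description. The step finishes only when the slowest machine arrives:

$$T = \max_{1 \le i \le n} \frac{W_i}{s_i}, \qquad \sum_{i=1}^{n} W_i = W.$$

$$\text{Uniform: } W_i = \frac{W}{n} \;\Rightarrow\; T = \frac{W}{n \cdot \min_i s_i}, \qquad \text{Capability-proportional: } W_i = \frac{W s_i}{\sum_j s_j} \;\Rightarrow\; T = \frac{W}{\sum_j s_j}.$$

With two machines of assumed rates s_1 = 4 and s_2 = 1 and W = 100 (arbitrary units), the uniform split gives T = max(50/4, 50/1) = 50, whereas assigning W_1 = 80 and W_2 = 20 gives T = 20, with both machines reaching the barrier at the same time.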
Ideally, an optimal load balancing/graph partitioning scheme should distribute the graph data according to each machine's computational capability in the cluster, so that heterogeneous machines reach the synchronization barrier at the same time. State-of-the-art online graph partitioning methods estimate the graph processing speed of different machines solely from their hardware configurations (the number of hardware computing slots/threads). However, such estimates do not accurately capture a machine's graph processing capability, because different applications and machines do not scale in the same way as computational resources increase. Furthermore, graph applications are diverse, and their computational demands differ.
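The gap between a thread-count estimate and a machine's actual graph processing rate can be sketched in Python. This is a minimal sketch under a simple assumed model in which each machine processes edges at a fixed, already-measured rate; the machine names, thread counts, and rates are hypothetical, and the helper functions are illustrative rather than part of any existing framework.

```python
import math
from typing import Dict


def proportional_split(total_edges: int, weights: Dict[str, float]) -> Dict[str, int]:
    """Split total_edges across machines in proportion to their weights.

    Edges left over after truncation go to the machines with the largest
    fractional remainders (largest-remainder rounding), so the shares
    always sum to total_edges.
    """
    total_weight = sum(weights.values())
    exact = {m: total_edges * w / total_weight for m, w in weights.items()}
    shares = {m: math.floor(x) for m, x in exact.items()}
    leftover = total_edges - sum(shares.values())
    by_remainder = sorted(exact, key=lambda m: exact[m] - shares[m], reverse=True)
    for m in by_remainder[:leftover]:
        shares[m] += 1
    return shares


def barrier_time(shares: Dict[str, int], rates: Dict[str, float]) -> float:
    """Time until every machine reaches the synchronization barrier.

    The slowest machine (largest load / rate) determines the step time.
    """
    return max(shares[m] / rates[m] for m in shares)


# Hypothetical cluster: an x86 server and a low-power ARM server.
threads = {"x86": 16, "arm": 4}        # hardware threads (assumed values)
rates = {"x86": 300.0, "arm": 100.0}   # profiled edges/second (assumed values)

total = 1200
by_threads = proportional_split(total, threads)  # estimate from thread counts
by_profile = proportional_split(total, rates)    # estimate from measured rates

print(by_threads, barrier_time(by_threads, rates))  # {'x86': 960, 'arm': 240} 3.2
print(by_profile, barrier_time(by_profile, rates))  # {'x86': 900, 'arm': 300} 3.0
```

Under these assumed numbers, the x86 machine has four times the threads but only three times the throughput, so the thread-count split overloads it and it becomes the straggler, while the rate-based split lets both machines reach the barrier together.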
To capture the computing capabilities of heterogeneous machines accurately, profiling is often the most effective methodology. However, computational demands also depend on the application and the input graph. It is difficult to subsample a natural graph in a way that preserves its underlying characteristics, because its vertices and edges are not evenly distributed. As with hardware-based estimates, this may lead to inaccurate modeling of the machines' graph processing capability.
Hence, there is currently no means of accurately modeling the graph processing capability of machines in heterogeneous clusters and, as a result, no means of optimally load balancing/partitioning the graph so as to distribute the graph data according to each machine's computational capability in the cluster.