The present invention generally relates to large-scale graph processing, and more specifically, to data compression of large-scale graphs.
Graph processing is an analytics tool that is widely used in big-data processing. Large-scale graph processing typically involves several algorithms that perform relationship analysis of various entities, data mining, and the solving of various optimization problems. For example, modern computer processing techniques typically employ a “graph traversal” algorithm, which can be applied to a variety of technical fields, including social networks, web-based applications, website user-click analysis, business analytics, and high-performance computing. The graph traversal algorithm determines one or more vertices and the relationship of each vertex with respect to one or more neighboring vertices, which are typically referred to as “neighbors”. The graph is then represented as binary code or a set of binary numbers. However, a graph representing a given relationship can contain tens of thousands of vertices, and each vertex can include thousands of additional neighbors. Therefore, the ability to scale out the graph traversal to very large systems can be severely limited by the capability of the controller and/or memory to process the binary data.
Compression operations have been employed in computing systems to code and compress the binary data representing the graph. For instance, compression symbol identification can help reduce the amount of binary data communicated during graph traversal and improve its execution time, and thus improve overall processor timing and throughput. The Boldi-Vigna (BV) algorithm is a compression algorithm typically employed to compress large-scale graphs. The BV algorithm utilizes differential coding and variable length integers (VLIs) to reduce the size of the binary values that represent the vertices and neighbors of a given graph. The VLI coding scheme can provide efficient compression, provided that smaller binary values assigned to a particular vertex or neighbor appear more frequently in a given distribution. This assumption, however, is not necessarily true in all large-scale graph applications such as, for example, web graphs and social networking graphs. Consequently, there is a need for an improved large-scale graph compression technique.
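By way of illustration only, the combination of differential coding and variable length integers described above can be sketched as follows. This is a minimal sketch and not the BV algorithm itself: it assumes a byte-oriented, LEB128-style VLI with 7 payload bits per byte, and the function names `encode_vli`, `compress_neighbors`, and `decompress_neighbors`, as well as the sample vertex identifiers, are hypothetical and chosen only for this example.

```python
from typing import List


def encode_vli(value: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per byte.

    Assumption: a LEB128-style byte code, used here only as one simple
    example of a variable length integer.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)  # high bit clear: last byte of this integer
            return bytes(out)


def compress_neighbors(neighbors: List[int]) -> bytes:
    """Differentially code a neighbor list, then VLI-encode each gap."""
    out = bytearray()
    previous = 0
    for vertex_id in sorted(neighbors):
        out += encode_vli(vertex_id - previous)  # store the gap, not the ID
        previous = vertex_id
    return bytes(out)


def decompress_neighbors(data: bytes) -> List[int]:
    """Invert compress_neighbors: decode each gap and accumulate the IDs."""
    neighbors: List[int] = []
    previous = 0
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation byte: keep accumulating payload bits
        else:
            previous += value
            neighbors.append(previous)
            value = 0
            shift = 0
    return neighbors


# Hypothetical neighbor lists: one with closely spaced IDs, one widely spaced.
clustered = [1000000, 1000001, 1000003, 1000010]
scattered = [1000000, 2000000, 3000000, 4000000]

clustered_size = len(compress_neighbors(clustered))
scattered_size = len(compress_neighbors(scattered))
```

In this sketch the closely spaced list yields small gaps that each fit in a single byte, whereas the widely spaced list yields large gaps that each require several bytes. The example thus shows why a VLI scheme compresses well only when small values dominate the distribution being coded.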