Field
The present disclosure relates to graph compression. More specifically, this disclosure relates to a method and system for encoding graphs by finding and removing large cliques in order to reduce computational time and storage requirements.
Related Art
A graph represents a set of vertices, also known as nodes, together with the edges, also known as links or connections, that connect them. Graphs are important for many applications, including the analysis of large data sets such as social networks or consumer-product relationships, as well as applications in biology and computer science. Many graph-computation methods exist, including methods for predicting relationships and making recommendations.
Because of the importance of graphs, numerous methods exist to represent and store graphs in computerized storage, such as on disk or in memory. A number of methods also exist to compress representations of graphs. For example, the Edge-based Compressed Sparse Column (ECSC) format makes use of a compressed representation of a matrix storing the edges of a graph. The ECSC format is based on the Compressed Sparse Column (CSC) format for storing a sparse matrix, also known as Compressed Column Storage (CCS). These formats take advantage of matrix sparsity by encoding only the non-zero elements and avoiding storage of zeros.
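To make the idea of column-compressed storage concrete, the following is a minimal sketch of the general CSC layout applied to an unweighted adjacency matrix. It illustrates plain CSC only, not the ECSC format, whose details are not given here. The function names `to_csc` and `column_entries`, and the convention that an edge `(row, col)` marks a non-zero at that matrix position, are assumptions made for illustration.

```python
def to_csc(num_vertices, edges):
    """Build a CSC layout for an unweighted adjacency matrix.

    edges: list of (row, col) pairs, each marking one non-zero entry.
    Returns (col_ptr, row_idx). The non-zeros of column c are stored in
    row_idx[col_ptr[c]:col_ptr[c + 1]]. Zeros are never stored.
    """
    edges = sorted(edges)  # gives ascending row order within each column

    # Count the non-zeros in each column.
    counts = [0] * num_vertices
    for _, col in edges:
        counts[col] += 1

    # Prefix sums of the counts give the start offset of each column.
    col_ptr = [0] * (num_vertices + 1)
    for c in range(num_vertices):
        col_ptr[c + 1] = col_ptr[c] + counts[c]

    # Scatter each row index into the next free slot of its column.
    row_idx = [0] * len(edges)
    next_slot = col_ptr[:-1]  # slicing copies, so col_ptr is not modified
    for row, col in edges:
        row_idx[next_slot[col]] = row
        next_slot[col] += 1
    return col_ptr, row_idx


def column_entries(col_ptr, row_idx, col):
    """Return the row indices of the non-zeros stored in one column."""
    return row_idx[col_ptr[col]:col_ptr[col + 1]]


# Hypothetical 4-vertex example with 4 edges.
col_ptr, row_idx = to_csc(4, [(0, 1), (2, 1), (1, 3), (0, 2)])
```

In this sketch the storage cost is one offset per column plus one row index per edge, rather than one cell per pair of vertices, which is where the savings for sparse graphs come from.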
Such techniques for compressing graphs have become increasingly important, yet graph compression remains a fundamentally challenging and unsolved problem. For example, parallel computing architectures such as graphics processing units (GPUs) have limited memory and are thus typically unable to handle large graphs. In addition, GPU algorithms must be designed with this limitation in mind, and in many cases may be required to perform significantly more computation in order to avoid using additional memory.
A number of methods have also been developed to identify cliques in a graph. A clique is a fully connected subgraph of a graph, that is, a subset of the vertices in which every pair of vertices is joined by an edge. Clique-finding methods include greedy algorithms, branch-and-bound algorithms, and dynamic programming. In general, finding large cliques is known to be a difficult problem that is hard to approximate, and therefore may take considerable computational time.
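As an illustration of the greedy family of clique-finding methods mentioned above, the following minimal sketch grows a clique by repeatedly adding the highest-degree vertex that is still adjacent to every vertex chosen so far. It is one simple heuristic, not the method of this disclosure, and it does not guarantee a maximum clique. The names `greedy_clique` and `is_clique`, and the dictionary-of-sets adjacency input with integer vertex labels, are assumptions made for illustration.

```python
def greedy_clique(adj):
    """Greedily build a clique in an undirected graph without self-loops.

    adj: dict mapping each vertex to the set of its neighbors.
    Returns a list of vertices that are pairwise adjacent.
    """
    # Visit vertices by decreasing degree, breaking ties by vertex label.
    order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    clique = []
    candidates = set(adj)  # vertices adjacent to everything in the clique
    for v in order:
        if v in candidates:
            clique.append(v)
            candidates &= adj[v]  # keep only the common neighbors
    return clique


def is_clique(adj, vertices):
    """Check that every pair of the given vertices is joined by an edge."""
    return all(
        u in adj[v]
        for i, u in enumerate(vertices)
        for v in vertices[i + 1:]
    )


# Hypothetical example: a triangle {0, 1, 2} with a path 0-3-4 attached.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {3}}
clique = greedy_clique(adj)
```

The heuristic runs quickly because each vertex is considered once, but a poor early choice can lock it out of a larger clique, which is one reason exact methods such as branch-and-bound can take far longer.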