Field
The present disclosure relates to data management. More specifically, the present disclosure relates to a method and system for efficient representation of graphs with multiple edge types.
Related Art
The exponential growth of computing power has made it possible to extract information of interest, such as shopping preferences, social media activities, medical referrals, and e-mail traffic patterns, using efficient data processing. Such data processing requirements have brought with them an increasing demand for efficient computation. As a result, equipment vendors race to build larger and faster computing devices with versatile capabilities, such as graph processing, to calculate information of interest efficiently. However, the computing capability of a computing device cannot grow indefinitely. It is limited by physical space, power consumption, and design complexity, to name a few factors. Furthermore, computing devices with higher capability are usually more complex and expensive. More importantly, because an overly large and complex computing device often does not provide economies of scale, simply increasing the capability of a computing device may prove economically unviable.
One way to meet this challenge is to increase the efficiency of graph representations associated with the information of interest. For example, real-world graphs can be large, and thus data compression techniques are often used to reduce their memory requirements. An (unweighted) graph can be represented by a (binary) matrix, wherein a respective element of the matrix represents an edge between the two vertices (or nodes) corresponding to the row and column numbers of that element. However, most real-world graphs are not fully connected. As a result, the number of edges is usually considerably smaller than the number of elements in the matrix. Hence, it is often useful to represent a graph as a sparse matrix that stores only the non-zero entries.
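The gap between the number of matrix elements and the number of edges can be illustrated with a minimal Python sketch. The four-vertex graph, its edge list, and all variable names below are hypothetical and chosen only for illustration; they are not taken from the disclosure.

```python
# Hypothetical 4-vertex directed graph given as (source, destination) pairs.
edges = [(0, 1), (0, 2), (2, 3)]
num_vertices = 4

# Dense binary adjacency matrix: one element per ordered pair of vertices.
dense = [[0] * num_vertices for _ in range(num_vertices)]
for src, dst in edges:
    dense[src][dst] = 1

# Sparse form: keep only the (row, column) coordinates of non-zero entries.
sparse = sorted(
    (row, col)
    for row in range(num_vertices)
    for col in range(num_vertices)
    if dense[row][col]
)

dense_cells = num_vertices * num_vertices  # elements held by the dense matrix
stored_entries = len(sparse)               # entries held by the sparse form
```

In this example the dense matrix holds sixteen elements while the sparse form holds three entries, one per edge.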
A widely used technique is the compressed sparse row (CSR) format, which uses two one-dimensional arrays to compactly represent the list of neighbors, called the adjacency list, for all vertices of the graph. The CSR format was originally used to represent sparse matrices. CSR encodes the graph in row-major order, since the edges are stored row by row in a single array. In an alternative scheme, called the compressed sparse column (CSC) format, the edges are stored compactly in column-major order. CSC supports efficient enumeration of the set of incoming edges to the same vertex in a graph. On the other hand, CSR supports efficient enumeration of the set of outgoing edges originating from the same vertex in the graph.
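The two-array layout described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the helper names build_csr and neighbors, the array names offsets and targets, and the sample edge list are all hypothetical. The CSC arrays are obtained here by applying the same construction to the reversed edges, which is one common way to build a column-major layout.

```python
def build_csr(num_vertices, edges):
    """Build two 1-D arrays (offsets, targets) from (src, dst) edge pairs.

    offsets[v]..offsets[v + 1] delimits the slice of targets that holds
    the neighbors of vertex v.
    """
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1          # count the edges leaving each vertex
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]   # prefix sum turns counts into offsets

    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()       # next free slot for each vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets, targets, v):
    """Return the adjacency list of vertex v as a contiguous slice."""
    return targets[offsets[v]:offsets[v + 1]]


# Hypothetical sample graph with four vertices.
edges = [(0, 1), (0, 2), (2, 3), (3, 0)]

# CSR: row-major, enumerates outgoing edges of a vertex.
out_offsets, out_targets = build_csr(4, edges)

# CSC: column-major, built here as CSR over the reversed edges,
# enumerates incoming edges of a vertex.
in_offsets, in_sources = build_csr(4, [(dst, src) for src, dst in edges])
```

With these arrays, neighbors(out_offsets, out_targets, 0) reads the outgoing neighbors of vertex 0 from one contiguous slice, and neighbors(in_offsets, in_sources, 0) does the same for the sources of its incoming edges.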
While graph compression brings many desirable features to data analysis, some issues remain unsolved in efficient representation of graphs with multiple edge types.