Graph data is becoming ubiquitous. As is known, a graph may have two or more nodes (or entities), each of which has one or more local properties. In addition, a node in a graph may have associations or relationships with one or more other nodes in the graph. The association between two nodes may be referred to as an “edge.” An edge can be directed or undirected. For two nodes connected by an edge, one node is referred to as an “adjacent node” or a “neighbor node” of the other.
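By way of illustration only, the terms above can be sketched in a few lines of Python. The class name `Graph`, its methods, and the sample node names are hypothetical and chosen purely for this example; they do not correspond to any particular database or product.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy in-memory graph: local node properties plus an adjacency map."""
    properties: dict = field(default_factory=dict)  # node id -> local properties
    adjacency: dict = field(default_factory=dict)   # node id -> set of neighbor ids

    def add_node(self, node_id, **props):
        # Store the node's local properties and start with no neighbors.
        self.properties[node_id] = props
        self.adjacency.setdefault(node_id, set())

    def add_edge(self, src, dst, directed=True):
        # Both nodes must already have been added with add_node().
        # A directed edge records dst as a neighbor of src only;
        # an undirected edge records the relationship in both directions.
        self.adjacency[src].add(dst)
        if not directed:
            self.adjacency[dst].add(src)

    def neighbors(self, node_id):
        # Return the adjacent (neighbor) nodes in a stable, sorted order.
        return sorted(self.adjacency[node_id])


g = Graph()
g.add_node("alice", age=30)
g.add_node("bob", age=25)
g.add_node("carol", age=41)
g.add_edge("alice", "bob", directed=False)  # undirected edge
g.add_edge("alice", "carol")                # directed edge: alice -> carol

print(g.neighbors("alice"))  # ['bob', 'carol']
print(g.neighbors("carol"))  # []
```

In this sketch, "carol" has no neighbors because the only edge touching it is directed toward it, whereas the undirected edge makes "alice" and "bob" neighbors of each other.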
In a variety of applications, it is necessary to store and query such graph data in databases. Many conventional databases, such as relational databases, have been successful in handling tabular data. However, these conventional databases have difficulties in processing graphs, since the nature of graph data is quite different from that of tabular data. For example, graph traversal is a common and fundamental operation in graph processing. Given a source node, a graph traversal returns data of one or more nodes adjacent to the source node. In relational databases, data is usually normalized to avoid redundancy and update anomalies. Nodes in a graph, however, are highly connected and usually present many-to-many relationships. With data normalization, a many-to-many relationship between nodes has to be represented using a junction table, with two columns referencing the two nodes respectively.
Such organization means that a node's local properties are separated from the graph topology. That is, the edge information has to be stored separately from the node information. As a result, for each traversal from a node to its neighbor(s), the query engine of the database has to perform additional joins against the junction table to obtain the topology information associated with the node, which harms cache locality and degrades system performance.
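The following is a minimal sketch of the normalized layout described above, using Python's standard `sqlite3` module with an in-memory database. The table names `node` and `edge`, the column names, and the helper functions `traverse` and `two_hop` are assumptions made for illustration only. The sketch shows that a one-hop traversal already requires a join between the junction table and the node table, and that each further hop adds another join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Node table: holds only each node's local properties.
    CREATE TABLE node (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Junction table: two columns, each referencing a node.
    CREATE TABLE edge (
        src INTEGER NOT NULL REFERENCES node(id),
        dst INTEGER NOT NULL REFERENCES node(id),
        PRIMARY KEY (src, dst)
    );
""")
conn.executemany("INSERT INTO node (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO edge (src, dst) VALUES (?, ?)",
                 [(1, 2), (1, 3), (2, 3)])


def traverse(conn, source_id):
    """One-hop traversal: the topology comes from edge, the data from node."""
    return conn.execute(
        """
        SELECT n.id, n.name
        FROM edge AS e
        JOIN node AS n ON n.id = e.dst
        WHERE e.src = ?
        ORDER BY n.id
        """,
        (source_id,),
    ).fetchall()


def two_hop(conn, source_id):
    """Two-hop traversal: one more hop means one more join on edge."""
    return conn.execute(
        """
        SELECT DISTINCT n.id, n.name
        FROM edge AS e1
        JOIN edge AS e2 ON e2.src = e1.dst
        JOIN node AS n ON n.id = e2.dst
        WHERE e1.src = ?
        ORDER BY n.id
        """,
        (source_id,),
    ).fetchall()


print(traverse(conn, 1))  # [(2, 'bob'), (3, 'carol')]
print(two_hop(conn, 1))   # [(3, 'carol')]
```

Because the neighbor identifiers live in a different table from the node properties, neither query can be answered from the source node's row alone, which is the separation of local properties from topology noted above.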