In computing, “NoSQL” is a term used to describe a broad class of database management systems that differ from traditional database management systems (e.g., relational database management systems) in a variety of ways. For example, the data stores used within NoSQL database management systems may not require fixed table schemas, may avoid join operations, and often scale horizontally. NoSQL database management systems address many of the shortcomings associated with traditional relational databases, such as poor performance on data-intensive applications (e.g., large-scale document indexing, serving pages on high-traffic websites, delivering streaming media, etc.).
One existing class of NoSQL data stores is the graph database. As known in the art, a graph database uses graph structures with vertices, edges, and properties to represent and store information. For example, mathematically, a graph may be defined as a combination of vertices (also referred to as “nodes”) and edges connecting the vertices, i.e., (Graph=<Vertices, Edges>). Graphs serve as useful tools for representing a wide variety of real-world relationships. For example, in a graph of a social network, each person might be represented by a vertex while a friendship between two people might be represented as an edge.
FIG. 6 herein illustrates one example of a graph 600. As shown in FIG. 6, there are three vertices (each represented by a circle) and three edges (each represented by a line) connecting the vertices. Each vertex has a different vertex ID. For example, the vertex in the upper left-hand region of FIG. 6 has a vertex ID of 1 (i.e., vertex 1), the vertex in the upper right-hand region has a vertex ID of 2 (i.e., vertex 2), and the vertex in the lower region has a vertex ID of 3 (i.e., vertex 3). Each vertex may also include any number of additional properties. For example, vertex 3 is shown having the additional properties of a name (Josh) and an age (32). Thus, in this example, vertex 3 could represent a thirty-two-year-old person named Josh. Furthermore, each vertex may have a label. For example, vertex 2 is shown having the label “ACCOUNT,” while vertex 3 is shown having the label “PERSON.”
Similarly, each edge may have a different edge ID. For example, the edge connecting vertex 1 to vertex 3 is edge 8, the edge connecting vertex 3 to vertex 2 is edge 11, and the edge connecting vertex 2 to vertex 1 is edge 9. As with vertices, edges may also have any number of additional properties. For example, edge 9 includes the label “paid.” This could indicate, for example, that the entity (e.g., a person or organization) represented by vertex 2 paid money to the entity associated with vertex 1. Furthermore, edges can have directions or be undirected. For example, edge 11 is an undirected edge connecting vertex 3 to vertex 2. However, edge 8 is directed from vertex 1 to vertex 3. Thus, edge 8 may be referred to as an outgoing edge of vertex 1 and/or an incoming edge of vertex 3. Likewise, edge 9 may be referred to as an outgoing edge of vertex 2 and/or an incoming edge of vertex 1. Edges can also be characterized as having a source and a target. For example, vertex 1 is the source vertex for edge 8 and vertex 3 is the target vertex for edge 8. Thus, as demonstrated by the exemplary graph 600 of FIG. 6, graphs can be described by their graph data, which includes (1) vertex data describing the different vertices and the properties of those vertices and (2) edge data describing the different edges and the properties of those edges.
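By way of a non-limiting illustration, the graph data of exemplary graph 600 could be sketched in Python as follows. The class names (Vertex, Edge, Graph), the helper build_graph_600, and the choice to exclude undirected edges from the outgoing and incoming lookups are assumptions made for this sketch and are not taken from FIG. 6. Only the vertex IDs, edge IDs, labels, properties, and directions recited above are used.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Vertex:
    vertex_id: int
    label: Optional[str] = None
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    edge_id: int
    source: int            # source vertex ID
    target: int            # target vertex ID
    directed: bool = True  # False models an undirected edge
    label: Optional[str] = None
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Graph:
    # Graph = <Vertices, Edges>, each keyed by its ID.
    vertices: Dict[int, Vertex] = field(default_factory=dict)
    edges: Dict[int, Edge] = field(default_factory=dict)

    def add_vertex(self, vertex: Vertex) -> None:
        self.vertices[vertex.vertex_id] = vertex

    def add_edge(self, edge: Edge) -> None:
        # Reject edges whose endpoints are not in the graph.
        for endpoint in (edge.source, edge.target):
            if endpoint not in self.vertices:
                raise ValueError(f"unknown vertex {endpoint}")
        self.edges[edge.edge_id] = edge

    def outgoing(self, vertex_id: int) -> List[int]:
        # Assumption: only directed edges count as outgoing.
        return sorted(e.edge_id for e in self.edges.values()
                      if e.directed and e.source == vertex_id)

    def incoming(self, vertex_id: int) -> List[int]:
        # Assumption: only directed edges count as incoming.
        return sorted(e.edge_id for e in self.edges.values()
                      if e.directed and e.target == vertex_id)


def build_graph_600() -> Graph:
    """Build the three-vertex, three-edge example described for FIG. 6."""
    g = Graph()
    g.add_vertex(Vertex(1))
    g.add_vertex(Vertex(2, label="ACCOUNT"))
    g.add_vertex(Vertex(3, label="PERSON",
                        properties={"name": "Josh", "age": 32}))
    g.add_edge(Edge(8, source=1, target=3))                   # directed 1 -> 3
    g.add_edge(Edge(11, source=3, target=2, directed=False))  # undirected
    g.add_edge(Edge(9, source=2, target=1, label="paid"))     # directed 2 -> 1
    return g


if __name__ == "__main__":
    graph = build_graph_600()
    print(graph.outgoing(1), graph.incoming(1))
```

In this sketch the vertex data and the edge data are held in two separate mappings, mirroring the two components of graph data described above.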
While graph databases are useful for storing graph data, they are limited in their ability to store and facilitate the retrieval of other types of data. Furthermore, accessing and manipulating graph data stored in a graph database can degrade computing performance.
Another existing class of NoSQL data stores is the column-oriented data store. One existing implementation of a column-oriented data store is Apache™ Cassandra. Within Cassandra, data is logically represented as a large table (or spreadsheet). A table (i.e., a “keyspace” in Cassandra's nomenclature) may contain a plurality of rows (i.e., “keys” in Cassandra's nomenclature). Each row may also include one or more columns. Cassandra supports a flexible schema, meaning that each row may have a different number of columns. Furthermore, Cassandra incorporates the concept of a “column family.” Each column family categorizes columns for efficient data storage and access purposes. Accordingly, a piece of data can be accessed using an address consisting of a row name, a column family name, and a column name, i.e., (data address=<row name, column family name, column name>).
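As a non-limiting illustration of the addressing scheme described above, the following Python sketch models a column-oriented table as nested in-memory mappings. It is a toy model only and does not use or represent Cassandra's actual API. The class name ColumnStore, its methods, and the sample row, column family, and column names are all hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

# data address = <row name, column family name, column name>
Address = Tuple[str, str, str]


class ColumnStore:
    """Toy in-memory model: row -> column family -> column -> value."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def put(self, address: Address, value: Any) -> None:
        row, family, column = address
        # Rows and column families are created on first write, so no
        # fixed schema is required.
        self._rows.setdefault(row, {}).setdefault(family, {})[column] = value

    def get(self, address: Address, default: Optional[Any] = None) -> Any:
        row, family, column = address
        return self._rows.get(row, {}).get(family, {}).get(column, default)

    def column_count(self, row: str) -> int:
        # Total number of columns in a row across all column families.
        return sum(len(cols) for cols in self._rows.get(row, {}).values())


if __name__ == "__main__":
    store = ColumnStore()
    # Hypothetical sample data; the two rows hold different numbers of columns.
    store.put(("row_a", "profile", "name"), "Josh")
    store.put(("row_a", "profile", "age"), 32)
    store.put(("row_b", "profile", "name"), "Ann")
    print(store.get(("row_a", "profile", "age")))
    print(store.column_count("row_a"), store.column_count("row_b"))
```

Because each row holds its own mapping of columns, two rows may carry different numbers of columns, which corresponds to the flexible schema noted above.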
While graph databases are sufficient for storing graph data, they suffer from a number of drawbacks related to scalability and computing performance. Accordingly, it is desirable to provide techniques for storing graph data in a column-oriented data store in order to improve scalability and computing performance.