A graph database is a database that uses graph structures with nodes, edges, and properties to represent and store data and to support semantic queries. The key to graph databases is that edges represent relationships that directly relate data items in the store, as opposed to conventional relational databases, where links between data are based on the data itself, and related items are gathered by searching for this data within the store. Graph databases are designed to allow simple and rapid retrieval of complex hierarchical structures without requiring complex queries. Graph database systems use the fact that meaningful patterns emerge when examining the connections and interconnections (edges) of nodes and their pertinent information (properties). Graph databases are used in a wide variety of data processing applications, such as social network analysis, communications, path finding, and computer network analysis, for example to map dependencies. In the latter application, networks of computers and hardware can be modeled as graphs to find components with many dependents that may be potential weak points or vulnerabilities. Other dependency networks, for example corporate or investment structures, can be mapped in a similar manner.
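As an illustration of the dependency-mapping use described above, the following minimal Python sketch counts the direct dependents of each component in a small dependency graph. The component names, the edge list, and the function `dependents_per_component` are hypothetical and chosen only for illustration; they do not come from any particular graph database product.

```python
from collections import Counter

# Hypothetical dependency edges, written as (dependent, dependency).
depends_on = [
    ("web-1", "db-1"),
    ("web-2", "db-1"),
    ("batch-1", "db-1"),
    ("web-1", "cache-1"),
    ("db-1", "san-1"),
]


def dependents_per_component(edges):
    """Count how many components depend directly on each component."""
    return Counter(dependency for _dependent, dependency in edges)


counts = dependents_per_component(depends_on)

# The component with the most direct dependents is a candidate weak point.
most_depended_on, dependent_count = counts.most_common(1)[0]
print(most_depended_on, dependent_count)
```

Only direct dependents are counted here; a fuller analysis would likely also follow transitive dependencies.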
Graph databases are powerful tools for modeling relationships between related objects. In one application, graphs are used to model computer and storage networks in large-scale enterprise computer systems. Subgraphs, which are connected subsets of a graph, are convenient for grouping items by relationship or type. FIG. 1A illustrates example subgraphs of a graph database and an associated subgraph. Each of subgraphs 102 and 104 stores data in its respective elements, where records in the graph nodes (also referred to as vertices) are connected through typed, directed arcs, called edges, that represent relationships. Each node and relationship can have named attributes referred to as properties, and a label is a name that organizes nodes into groups. For the subgraphs, edge membership is by induction; that is, the edges connecting the nodes in each subgraph are also included in the subgraph definition.
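The data model and the induction of edges described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Node` and `Edge` classes, the function `induced_edges`, and the sample hosts, switch, and storage array are all hypothetical, and the contents of subgraphs 102 and 104 of FIG. 1A are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A vertex with a label (its group) and named properties."""
    node_id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Edge:
    """A typed, directed relationship from source to target."""
    source: str
    target: str
    rel_type: str


def induced_edges(vertex_ids, edges):
    """Return every edge whose two endpoints are both in vertex_ids."""
    members = set(vertex_ids)
    return [e for e in edges if e.source in members and e.target in members]


# Hypothetical storage-network graph.
nodes = {
    "host-1": Node("host-1", "Host", {"os": "linux"}),
    "host-2": Node("host-2", "Host", {"os": "linux"}),
    "switch-1": Node("switch-1", "Switch", {"ports": 48}),
    "array-1": Node("array-1", "StorageArray", {"capacity_tb": 100}),
}
edges = [
    Edge("host-1", "switch-1", "CONNECTS_TO"),
    Edge("switch-1", "array-1", "CONNECTS_TO"),
    Edge("host-2", "switch-1", "CONNECTS_TO"),
]

# The subgraph is defined only by its vertices; its edges follow by induction.
subgraph_vertices = {"host-1", "switch-1", "array-1"}
subgraph_edges = induced_edges(subgraph_vertices, edges)
```

In this sketch the edge from `host-2` is excluded because `host-2` is not a member of the vertex set, which is the sense in which edge membership follows from vertex membership.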
In present graph databases, subgraphs are most often defined using a static set of objects, and present graph databases generally do not store subgraphs. As the graph changes due to changes in the underlying objects (e.g., a network), these static definitions become invalid. FIG. 1B illustrates an example subgraph in a directed graph (one whose edges have direction) that goes through several iterations, and a process that induces edges in the subgraph. As shown in FIG. 1B, a subgraph is defined with respect to connectivity to the lead node (vertex) 1, such that all vertices go “out” from this node. Example changes to the subgraph are shown through panels 105, 107, and 109. As shown in this example, subgraph 105 becomes subgraph 107 through the addition of node 4, and then becomes subgraph 109 through the deletion of node 6. Table 110 of FIG. 1B shows how edges are induced from their vertices to be included in the subgraph 109. The traditional method is thus to explicitly define the vertices of a subgraph and induce the edges in the subgraph. The major issue with this approach is that if vertices that conceptually belong in the subgraph are added to the graph, they must be added to the subgraph manually, or “by hand”. Furthermore, there are also typically subgraphs of interest where edges cannot be induced, such as where there are vertices two or more edges away from another vertex in the subgraph. Present methods of subgraph processing for graph databases do not easily or comprehensively accommodate dynamically changing subgraphs.
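The way a static subgraph definition goes stale can be illustrated with a short Python sketch. The adjacency list below is hypothetical, since the actual edges of FIG. 1B are not given in the text; only the addition of node 4 and the deletion of node 6 follow the description. The breadth-first traversal in `out_reachable` is used here merely to express which vertices conceptually belong to a subgraph defined by outward connectivity from node 1, and is not presented as any particular system's method.

```python
from collections import deque


def out_reachable(adjacency, start):
    """Return all vertices reachable from start by following edge direction."""
    seen = {start}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical directed graph with lead node 1 (edges assumed for illustration).
adjacency = {1: [2, 3], 2: [5], 3: [6]}

# Static definition: the vertex set is captured once and never updated.
static_subgraph = set(out_reachable(adjacency, 1))

# The underlying graph changes: node 4 is added under node 1.
adjacency[1].append(4)
adjacency[4] = []
missing = out_reachable(adjacency, 1) - static_subgraph

# The graph changes again: node 6 is deleted.
adjacency[3].remove(6)
stale = static_subgraph - out_reachable(adjacency, 1)

print(missing, stale)
```

After the two changes, the static vertex set both omits a vertex that now belongs to the subgraph and retains one that no longer exists, so it would have to be corrected by hand.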
The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions. EMC, Data Domain, Data Domain Restorer, ExtremeIO, and Isilon are trademarks of EMC Corporation. VMAX is a trademark of VMware Corporation.