This disclosure relates to a method for backfilling graph structure and to articles comprising the same. In particular, this disclosure relates to a method for generating new graph-based data structures and adding them to existing graph-based data structures based on queries posed by users of the system over a period of time.
Graph-based data systems provide information in the form of nodes and edges in a wide variety of data systems such as, for example, those used in hospitals, police and detective databases, university systems, employment databases, city service databases, and the like. Graph-based data was once thought of as a fallback option for data that could not be manipulated into a relational data system. However, graph-based data structures and graph-based data systems are now emerging as the preferred storage method, not only for overtly networked systems, such as social networks and citation networks, but also for biological systems, traffic patterns and, more broadly, human knowledge in general.
FIG. 1 shows how moving along the nodes and edges of a graph-based structure can support a variety of different queries that users may need to execute. The term “graph-based data structure” as used herein refers to a data structure comprising nodes and edges. Nodes represent entities such as people, businesses, accounts, or any other item one might want to keep track of. Edges are the lines that connect nodes to other nodes or, alternatively, nodes to properties, and they represent the relationship between the two. Meaningful patterns emerge when examining the connections and interconnections of nodes, properties, and edges.
For example, in the data system for a large hospital, a doctor 1102 might want to find out which other doctors his patients are seeing. By first finding himself 1102 in the data system (see step (a)), he can then pivot out to all of the patients 1104 associated with him (see step (b)), and then pivot back to all of the doctors 1106 and 1108 associated with those patients (see step (c)). A pivot is the process of selecting an initial set of seed nodes (in this case “the doctor 1102”) in the graph, and then swinging out to the neighboring nodes (in this case “all of the patients 1104”) that are connected to them. This produces subgraph data consisting of both the seed node (1102) and the neighbor nodes (1104). The term “pivot” comes from the fact that this operation can be chained, with the neighbor nodes (1104) from the previous step serving as the seed nodes in the present step to determine another set of neighbor nodes (1106 and 1108).
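The chained pivot described above can be illustrated with a minimal sketch. The disclosure does not prescribe any particular implementation, so everything here is an assumption made for illustration: the `Graph` class, its `pivot` method, the undirected adjacency-set representation, the optional filter on node kind, and the node identifiers (which merely echo the reference numerals of FIG. 1; the patient names are invented).

```python
class Graph:
    """Hypothetical in-memory graph: nodes carry a kind, edges are undirected."""

    def __init__(self):
        self._kind = {}  # node id -> kind label, e.g. "doctor" or "patient"
        self._adj = {}   # node id -> set of neighboring node ids

    def add_node(self, node_id, kind):
        self._kind[node_id] = kind
        self._adj.setdefault(node_id, set())

    def add_edge(self, a, b):
        # An edge records a relationship between two nodes in both directions.
        self._adj.setdefault(a, set()).add(b)
        self._adj.setdefault(b, set()).add(a)

    def pivot(self, seeds, kind=None):
        """Return the neighbors of the seed nodes, optionally filtered by kind."""
        seeds = set(seeds)
        neighbors = set()
        for seed in seeds:
            for node in self._adj.get(seed, ()):
                if kind is None or self._kind.get(node) == kind:
                    neighbors.add(node)
        return neighbors - seeds


g = Graph()
for doctor in ("dr_1102", "dr_1106", "dr_1108"):
    g.add_node(doctor, "doctor")
for patient in ("patient_a", "patient_b"):
    g.add_node(patient, "patient")

# First set of edges: the doctor to his patients.
g.add_edge("dr_1102", "patient_a")
g.add_edge("dr_1102", "patient_b")
# Second set of edges: those patients to their other doctors.
g.add_edge("patient_a", "dr_1106")
g.add_edge("patient_b", "dr_1108")

seed = {"dr_1102"}                                       # step (a): find the doctor
patients = g.pivot(seed, kind="patient")                 # step (b): pivot to patients
other_doctors = g.pivot(patients, kind="doctor") - seed  # step (c): pivot back to doctors

print(sorted(patients))       # ['patient_a', 'patient_b']
print(sorted(other_doctors))  # ['dr_1106', 'dr_1108']
```

In this sketch the output of one pivot is passed directly as the seed set of the next, which is the chaining property from which the term is derived. The originating doctor is subtracted in step (c) because he is himself a neighbor of his own patients.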
In FIG. 1, the lines (shown in bold) connecting the doctor 1102 to his patients 1104 are called edges. These bold lines represent a first set of edges 1105. The lines connecting the patients 1104 to their other doctors 1106 and 1108 are shown as dotted lines and represent a second set of edges 1107. While the exemplary graph shown in FIG. 1 helps the doctor determine which other doctors service some of his patients, not all users of the system will find the graph-based data system easy to access, and even fewer will find the means (e.g., querying the system using keywords approved by the system) to access the structure and obtain all of the information available to them from it.
It is therefore desirable to have a graph-based data structure where abstractions are developed and continuously improved based on the type and population of queries that the system is subjected to over time by users and not just by the people that create the abstractions.