A table in a relational database is used to organize pieces of information having similar attributes and meaning. An example would be an address table. Typical properties of an address are street number, street name, city, state, country and postal code. Such an address table has the ability to organize each of these properties as a column and allocate a single row for each address. A database cursor is useful because it has the ability to be positioned on a given row for the purposes of examining, modifying or deleting a particular address from the table. Given a position, such a cursor can also be used to traverse or navigate the addresses in the table, for example, in order of ascending postal code, or ascending city name.
In comparison, a graph is a more general-purpose data structure than a table, yet the same ease of use of navigating a database has not been available to graphs. In the previous example, the address table cannot be readily used to store addresses which do not share the typical structure (e.g., a P.O. Box) assumed by the address table. A graph is composed of nodes and links between those nodes. Nodes have the ability to contain a value. Links and nodes have one type. Using this common data structure, any piece of information may be represented, including the address table described above. In a graph however, the concept of a row indicating the “boundaries” of each address is not present. The cursor concept of having exactly one street number for each address row is no longer valid. In a graph, an address may well have zero or two or five thousand street numbers.
A graph is a set of nodes and links. Each node is an object with relations to other objects, such as pointers to other nodes, and properties, such as attributes and/or data. A tree is a special form of a graph in which there are no cycles, such as circular references. Graphs provide a generalized means of storing data in which the nodes of data are associated by links. Each node has a type and a value. However, to conventionally address a subset of a graph requires a node-by-node traversal with no abstraction of subsets of the graph. Navigation and traversal of the graph is time-consuming and keeping track of which part of the graph has been navigated is complex. Furthermore, the association of links is not very flexible and the low-degree of association limits the usefulness of the information. In a directed graph, the links have a direction, from one node to another. In a labeled graph, the links and/or the nodes bear a label that identifies them.
FIG. 1 is a block diagram illustrating a conventional exemplary graph 100. The graph includes a number of linked nodes. The links represent relationships.
More specifically, node of 110 contains no text data, but does contain a link to node 120. The relationship between the node 110 and node 120 is that node of 110 is of relationship1 to node 120, and node 110 contains a link to node 130. The relationship between the node 110 and node 130 is that node of 110 is of relationship2 to node 130. Furthermore, node 110 contains a link to node 140. The relationship between the two nodes 110 and 140 is that node of 110 has relationship3 node 140. Moreover, node 110 contains a link to node 150 in which the relationship between the two nodes is that node of 110 has relationship3 to node 150. In addition, node 110 contains a link to node 160 in which the relationship between the two nodes is that node of 110 has relationship3 to node 160. Continuing, node 110 contains a link to node 170 which the relationship between the two nodes is that node of 110 has relationship3 to node 170, and node 110 contains a link to another node 180, the relationship between the two nodes is that node of 110 has relationship3 to another node 180.
In object-oriented implementations of graph 100, each of the nodes is an object, of the class that the node is named for, such as “type1” or “type2.”
Continuing, node 140 is related by relationship4 to node 185. In addition, node 150 has a “type 6” node 190 and “type 3” node 150 has a “type 7” node 195.
In a graph, each piece of information is stored only once, in a node. A node that has a relationship with the information, will have a link to the node that contains the information. Therefore, there is no duplicated information. For example, “type1” node 130 has a link to “type 5” node 170, and “type1” node 110 has relationship3 to link to “type 5” node 170. Therefore, the “type 5” node 170 information is stored in one location, node 170, and is referenced from all required nodes, which eliminates duplication of the node 170 information.
The present invention solves the problem of precisely navigating or querying graph 100 so that unwanted data that does not fit the criteria of the search is not retrieved, and that all wanted data is retrieved. The present invention enables a pattern to be identified or generated, and enable the pattern to be located within the graph 100. For example, if locating in the graph 100 all employees which have a specific first name and that which have a specific type 5 is desired, then a pattern such as the pattern in FIG. 3 is generated and used to generate a spider in FIG. 4. The spider in FIG. 4 navigates and/or queries the graph in FIG. 1.