This invention relates generally to graphs, and more particularly to a mechanism for providing a high degree of abstraction of graphs.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawings hereto: Copyright (copyright) 2000, Microsoft Corporation, All Rights Reserved.
A table in a relational database is used to organize pieces of information having similar attributes and meaning. An example would be an address table. Typical properties of an address are street number, street name, city, state, country and postal code. Such an address table has the ability to organize each of these properties as a column and allocate a single row for each address. A database cursor is useful because it has the ability to be positioned on a given row for the purposes of examining, modifying or deleting a particular address from the table. Given a position, such a cursor can also be used to traverse or navigate the addresses in the table, for example, in order of ascending postal code, or ascending city name.
In comparison, a graph is a more general-purpose data structure than a table, yet the same ease of use of navigating a database has not been available to graphs. In the previous example, the address table cannot be readily used to store addresses which do not share the typical structure (e.g., a P.O. Box) assumed by the address table. A graph is composed of nodes and links between those nodes. Nodes have the ability to contain a value. Links and nodes have one type. Using this common data structure, any piece of information may be represented, including the address table described above. In a graph however, the concept of a row indicating the xe2x80x9cboundariesxe2x80x9d of each address is not present. The cursor concept of having exactly one street number for each address row is no longer valid. In a graph, an address may well have zero or two or five thousand street numbers.
A graph is a set of nodes and links. Each node is an object with relations to other objects, such as pointers to other nodes, and properties, such as attributes and/or data. A tree is a special form of a graph in which there are no cycles, such as circular references. Graphs provide a generalized means of storing data in which the nodes of data are associated by links. Each node has a type and a value. However, to conventionally address a subset of a graph requires a node-by-node traversal with no abstraction of subsets of the graph. Navigation and traversal of the graph is time-consuming and keeping track of which part of the graph has been navigated is complex. Furthermore, the association of links is not very flexible and the low-degree of association limits the usefulness of the information. In a directed graph, the links have a direction, from one node to another. In a labeled graph, the links and/or the nodes bear a label that identifies them.
FIG. 1 is a block diagram illustrating a conventional exemplary graph 100. The graph includes a number of linked nodes. The links represent relationships.
More specifically, node of 110 contains no text data, but does contain a link to node 120. The relationship between the node 110 and node 120 is that node of 110 is of relationship1 to node 120, and node 110 contains a link to node 130. The relationship between the node 110 and node 130 is that node of 110 is of relationship2 to node 130. Furthermore, node 110 contains a link to node 140. The relationship between the two nodes 110 and 140 is that node of 110 has relationship3 node 140. Moreover, node 110 contains a link to node 150 in which the relationship between the two nodes is that node of 110 has relationship3 to node 150. In addition, node 110 contains a link to node 160 in which the relationship between the two nodes is that node of 110 has relationship3 to node 160. Continuing, node 110 contains a link to node 170 which the relationship between the two nodes is that node of 110 has relationship3 to node 170, and node 110 contains a link to another node 180, the relationship between the two nodes is that node of 110 has relationship3 to another node 180.
In object-oriented implementations of graph 100, each of the nodes is an object, of the class that the node is named for, such as xe2x80x9ctype1xe2x80x9d or xe2x80x9ctype2.xe2x80x9d
Continuing, node 140 is related by relatonship4 to node 185. In addition, node 150 has a xe2x80x9ctype 6xe2x80x9d node 190 and xe2x80x9ctype 3xe2x80x9d node 150 has a xe2x80x9ctype 7xe2x80x9d node 195.
In a graph, each piece of information is stored only once, in a node. A node that has a relationship with the information, will have a link to the node that contains the information. Therefore, there is no duplicated information. For example, xe2x80x9ctype1xe2x80x9d node 130 has a link to xe2x80x9ctype 5xe2x80x9d node 170, and xe2x80x9ctype1xe2x80x9d node 110 has relationship3 to link to xe2x80x9ctype 5xe2x80x9d node 170. Therefore, the xe2x80x9ctype 5xe2x80x9d node 170 information is stored in one location, node 170, and is referenced from all required nodes, which eliminates duplication of the node 170 information.
The present invention solves the problem of precisely navigating or querying graph 100 so that unwanted data that does not fit the criteria of the search is not retrieved, and that all wanted data is retrieved. The present invention enables a pattern to be identified or generated, and enable the pattern to be located within the graph 100. For example, if locating in the graph 100 all employees which have a specific first name and that which have a specific type 5 is desired, then a pattern such as the pattern in FIG. 3 is generated and used to generate a spider in FIG. 4. The spider in FIG. 4 navigates and/or queries the graph in FIG. 1.
A spider is used to reference a subset of data contained in a graph data structure. The spider may be described as having many legs, each touching exactly one node or link in the graph. Each leg has the ability to be raised to inspect what is underneath and the legs have the ability to be moved according to rules programmed into the spider. Collectively, these legs represent a position within the graph. To continue the example, to be xe2x80x9cpositionedxe2x80x9d on an address that has five thousand street numbers, the spider would have five thousand legs, each one touching one of the street number nodes in the graph. Based on the spider leg position, that complex address could then be examined, modified or deleted, just like a row indicated by a cursor in a conventional database table.
A conventional database row set cursor keeps track of the current position in a result set returned by a database query. In contrast, in the present invention, a cursor keeps track of the current position in a directed labeled graph (DLG). This is distinguished from keeping track of a position in a conventional database row set, because the graph over which a cursor operates can have an arbitrarily complex shape. Further, in the present invention, navigation can be on any of the spiders xe2x80x9clegsxe2x80x9d, whereas in a traditional cursor the navigation is limited to whole rows.
A cursor object is commonly created with a constraint graph that defines the subset of the graph over which the cursor object can traverse. A cursor object that has no constraint graph can access any schema or instance data in the spider.
In one aspect of the invention a DLG that consists of any one or all of conceptual schemas, metaschemas, and instance data is managed by generating a cursor of the directed labeled graph and navigating through the directed labeled graph using the cursor.
In another aspect of the invention a graph is managed by generating a cursor of the graph and navigating through the graph using the cursor. The cursor is associated with a constraint graph that identifies a subset of the collection. The constraint graph is created or generated by obtaining a handle to a required namespace node, which yields a namespace handle, obtaining a handle to a variable using the namespace handle, determining the identity of each of a number of nodes and links associated with the namespace handle, obtaining a handle to each of nodes and links using the namespace handle, generating a segment which includes creating a second cursor in which the variable is bound to one of the nodes and links, and storing the segment.
In yet another aspect of the invention, a data structure includes a linked list that has a number of nodes and links. The linked list is a graph that is a collection of variables bound to the nodes and links in using a restriction. The restriction is a constraint graph.
In still yet another aspect of the invention, a computerized system includes a directed labeled graph, an annotated schema runtime engine operably coupled to the directed labeled graph and a spider apparatus operably coupled to the annotated schema runtime engine, in which the spider interface includes at least one variable bound to a node or a link in the directed labeled graph, an annotated schema runtime engine, and interface to an annotated schema runtime engine cursor component.