1. Field of the Invention
This invention relates in general to storing hierarchical data structures in relational databases, and more particularly, to efficient indexing of hierarchical data structures in relational databases.
2. Background Art
Hierarchical data structures, such tree data structures, are commonly used to represent data. Tree data structures are made of nodes, which can contain both data to be stored and pointers to other nodes in the tree. The terms “hierarchical data structure”, “tree data structure” and “tree” are used interchangeably throughout this disclosure. Nodes in a tree structure are in an ancestor-child relationship. A root node is the topmost node that has no ancestor nodes. A tree structure also has leaf nodes and internal nodes. An internal node is a node that has one or more child nodes. A leaf node has no children. All children of the same node are siblings. Referring now FIG. 1, an exemplary tree data structure 160 is shown. In FIG. 1, node “Alpha” is a root node that has the following three children: “Bravo”, “Charlie”, and “Juliet”. Since Bravo, Charlie, and Juliet are children of the same parent, Alpha, these nodes are siblings (or ‘peers’). In the tree structure shown in FIG. 1, each node is identified by its unique ID (OID), the unique ID of its Parent_OID), and its sequential number among siblings (Peer_Sequence).
It has become commonplace to represent hierarchical data structures in relational databases. An example of a relational database representation of the tree structure illustrated in FIG. 1 is shown below in Table 1.
TABLE 1Database Representation of Tree Data Structure 160OIDParent_OIDPeer_SequenceLEVELDATA100NULL11Alpha20010012Bravo30010022Charlie40020013Delta60030013Echo50030023Foxtrot70050014Golf80050024Hotel90070015India100010032Juliet
Table 1 stores a plurality of records. Each record is associated with a node in the tree data structure and includes a plurality of fields. An exemplary record shown in Table 1 includes a DATA field that stores data associated with a particular node in the data structure, and OID, Parent_OID, Peer_Sequence, and Level fields.
The OID field stores a unique object ID of a given node. An object ID can be any value that uniquely identifies the node. In a preferred embodiment, OID is a numeric value.
The Parent_OID field stores a pointer or reference to the parent of the node. When the node is a root node, it has no parent. In one implementation, the value of Parent_OID for the root node is NULL or a null-like value, such as zero.
The Peer_Sequence field stores an index value identifying the sequential order of a node relative to its siblings. Peer_Sequence can be any value capable of being transformed into a constant-width string. In a preferred embodiment, Peer_Sequence can be any numeric value. The Peer_Sequence of the first node among siblings of the same parent can also indicate the interval by which Peer_Sequence of a subsequent sibling will be incremented. For example, if the numeric Peer_Sequence value of the first sibling is “10”, Peer_Sequence for each subsequent sibling node is incremented by “10”.
The LEVEL field stores the sequential number of the level at which the node is located in the hierarchy. A hierarchical data structure can include any number of levels. The value of LEVEL is derived from the position of the node in the hierarchy.
The DATA field stores the value of the data associated with a given node. It could actually represent a collection of fields associated with each node.
A database user often needs to examine relationships across multiple levels of the tree hierarchy or needs to identify the next node following a given node in the hierarchy. One of the largest drawbacks of a relational database is its inability to handle queries efficiently. For example, to answer a question like “Is India a descendant of Charlie?” in the tree data structure shown in FIG. 1, one would have to perform a pointer chain traversal by iteratively traversing nodes in the tree data structure using a reference to the parent node. This process involves issuing several queries or joins within one query to find out if the PARENT_OID of India is the OID of Charlie, or if the PARENT_OID of the PARENT_OID of India is the OID of Charlie, etc. until the top of the hierarchy is reached. Below are sample SQL queries that need to be executed to perform the pointer-chain traversal of a data structure.
Now = IndiaWhile (up.PARENT_OID is not null) and(up.PARENT_OID <> Charlie.OID)Do{select up.PARENT_OID from A_TREE now,A_TREE up where up.OID =now.PARENT_OID;up = now}If up.PARENT_OID is null, India is NOT adescendent of Charlie, otherwise it is.
This query is executed multiple times until a match is found or the root node in the tree structure is reached. To search more efficiently, one query that combines the previous queries can be issued. The problems with the above solution are that they either require an iterative series of queries of unknown length or a single complex and inefficient query.
Often, it is desirable to represent a tree data structure in a tree display similar to a computer directory tree structure, showing the hierarchy of the nodes, such as the one shown below:    Alpha    +Bravo    ++Delta    +Charlie    ++Echo    ++Foxtrot    +++Golf    ++++India    +++Hotel    +Juliet
In this example, the ‘+’ signs prefixing the DATA name indicate that a given node is an immediate child of the nearest preceding node in the data structure with fewer (or no) ‘+’ signs. The existing solutions for ordering a hierarchical data structure have been left for programmers since this type of ordering cannot be accomplished with a single SQL query.
Accordingly, there is a need for a mechanism that provides for more efficient traversal of hierarchical data structures and avoids the complexities of prior art techniques.