A node in a hierarchical data structure can be addressed by traversing a data tree from a top, or root, node down through the "branches" of the data tree until reaching the target node. The path from the root node to the target node can be used to uniquely identify the node. Thus, in a data tree, such as data tree 100 shown in FIG. 1, node D 109 is uniquely defined by the string A/B/C/D, referred to as a "hierarchical value." Each node above the node D 109 is a "parent" node for node D 109, i.e., node A 101, node b 103, and node c 105, and node D 109 itself is a "child" node for each of its parent nodes 101, 102 and 103. The node values of A, B, C and D in the hierarchical value can be replaced by other identifiers that define the parent nodes of the target node.
Traditional approaches to storing hierarchical values in standard relational databases create inefficiencies in performing database queries on the hierarchical values as can be seen in referring to the tables in FIG. 2 used to store the hierarchical values for the data tree 100 of FIG. 1. Hierarchical values for nodes are stored in multiple rows 201 in a relational database table 200. Each row 201 contains a hierarchical value identifier 203 (assigned when the node is stored and often the next number in a sequence of identifiers) for a node, a parent hierarchical value identifier 205 that defines the immediate parent node, and the node value 207. Comment column 209 contains the hierarchical value defining the node in the row and is shown in the table 200 to clarify the explanation of the table 200 but is not usually stored as part of the table 200.
The structure of the table 200 provides for very efficient updates, such as inserting a new parent node, deleting a node, or renaming a node. However, the structure of the table 200 requires traversing many rows of the table 200 to find all children nodes of a particular parent node or all children nodes N levels down in the hierarchy.
For example, suppose a user wants to know all the children of node A 101. First, the hierarchical value of "A" must be converted to the corresponding hierarchical value identifier by searching table 200 to find the row 201 with a node value 207 matching "A." Once found, the hierarchical value identifier 205 for node A, or 1000 in this example, is used to search the table 200 to find all rows 201 with a parent hierarchical value identifier equal to 1000. The node values 207 of all the matching rows 201 are cached, usually in memory, for return to the user. In the current example, only child node B for parent node A is found at this level. However, the query is not completely satisfied as deeper levels of children nodes for parent node A remain on the A/B branch of the data tree 100. Therefore, the hierarchical value identifier 205 for A/B (1001) is used to find the children nodes of A/B. In the table 200, two children nodes for A/B exist, A/B/C and A/B/E. Both the A/B/C and A/B/E branches of data tree 100 must be searched in the table 200 to find even deeper children nodes of parent node A. The search of the table 200 ends when the deepest node in all branches descending from parent node A have been found. The cached values are then returned to the user as the query result.
A similar traversing of the table 200 happens when a user wants to find all the children nodes of parent node A 101 that are four levels deep in the hierarchy of the data tree 100. Counting parent node A 101 as level one, such a query will return hierarchical values A/B/C/D and A/B/E/F. However, the query process must traverse the entire table 200 as described above to return these two hierarchical values. The query process also must also keep track of its current depth within the hierarchy in order to know when it has reached the fourth level. Additional queries such as "all children greater than two and less than five levels deep" require still more logic in the query processing software.
Furthermore, determining the hierarchical value identifier for an existing hierarchical value or determining the hierarchical value represented by a hierarchical value identifier also requires a similar traversing of the table 200.
The two tables shown in FIG. 3 present a normalized version of the table 200 which eliminates the duplication of node values within the table 200 by storing the node values in a separate table 300 and assigning a node identifier 303 to each node value 305. The node identifiers 303 are then used in the table 310 instead of the actual node values 207 which are shown in the table 200. Although the normalization decreases the amount of data stored in the table 300, it does not increase the efficiency of queries run against the tables 300 and 301.
An alternate approach to speed up queries by using a relational database table with a different structure to store the hierarchical values. The table 400 in FIG. 4 stores the hierarchical value 407 for a node as a full string in each row 401 in the table 400. Each row 401 also contains a depth value 405 which represents the child node's position below the highest parent node in terms of level depth. Querying for all children nodes of a particular parent node matches the hierarchical value for the parent node against the full string 407 in each row ("prefix string match query"). With this approach it is also possible to query on children nodes of a certain depth in the hierarchy without the special processing described above. However, prefix string match queries are highly inefficient for all but the smallest tables. Moreover, the database table structure shown in FIG. 4 requires more processing for update operations, and the size of the hierarchical value 407 can easily exceed the column limit for many relational database implementations.
Therefore, the is a need for a system that can quickly process complex queries on data stores containing hierarchical values with minimal impact on the processing time required to insert new hierarchical values in the data store.