1. Field of the Invention
The invention relates generally to computer and data processing systems and more specifically to the storage of tree data structures in the memory of a computer system.
2. Description of the Prior Art
Trees are frequently used in computer systems to represent relationships between items of data. For example, a compiler may have to process a source file for a program which includes the following:
    int a, b, product;
    . . .
    product := a * b;
    . . .
The statement product := a * b; instructs the compiler to generate executable code which assigns the product of the variables a and b to the variable product. While the compiler is working on the source code, it represents the statement as a tree. The tree is shown in FIG. 1. Tree 101 is made up of nodes, which are indicated by circles in FIG. 1. A given node may have descendant nodes, which are lower in the tree than the given node, and ancestor nodes, which are higher. Thus, the ancestors of node 111 are nodes 107 and 103, and the descendants of node 103 are nodes 105, 107, 109, and 111. The immediate ancestor of a node is its parent, and the node's immediate descendants are its children. Children of the same parent are siblings. Thus, node 107 has node 103 as its parent, nodes 109 and 111 as its children, and node 105 as its sibling. The topmost node in the tree, in this case node 103, is termed the root node of the tree, and nodes without children are termed leaf nodes. In tree 101, nodes 105, 109, and 111 are leaf nodes. A subtree is a node and all of its descendants. Thus, nodes 107, 109, and 111 are a subtree of tree 101, and node 109 is a subtree of both tree 101 and the subtree made up of nodes 107, 109, and 111. The subtree whose root is a given node is said to be rooted in that node. Accordingly, the subtree made up of nodes 107, 109, and 111 is rooted in node 107.
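The terminology above can be made concrete with a short sketch. The following Python fragment is illustrative only: the PARENT table and the function names are assumptions introduced here, and the node numbers are those given for tree 101 in the text.

```python
# Hypothetical encoding of tree 101: node number -> parent (None for the root).
# Assumption: entries are listed so that siblings appear in left-to-right order.
PARENT = {103: None, 105: 103, 107: 103, 109: 107, 111: 107}


def children(node):
    """Immediate descendants of a node, in the order they were listed."""
    return [n for n, p in PARENT.items() if p == node]


def ancestors(node):
    """Ancestors of a node, from its parent up to the root."""
    result = []
    p = PARENT[node]
    while p is not None:
        result.append(p)
        p = PARENT[p]
    return result


def descendants(node):
    """All nodes lower in the tree than the given node."""
    result = []
    for c in children(node):
        result.append(c)
        result.extend(descendants(c))
    return result


def siblings(node):
    """Other children of the node's parent (empty for the root)."""
    p = PARENT[node]
    if p is None:
        return []
    return [c for c in children(p) if c != node]


def leaves():
    """Nodes without children."""
    return [n for n in PARENT if not children(n)]


def subtree(node):
    """The subtree rooted in the given node: the node and all its descendants."""
    return [node] + descendants(node)
```

Under these assumptions, ancestors(111) yields nodes 107 and 103, and subtree(107) yields nodes 107, 109, and 111, in agreement with the description of FIG. 1.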
The portion of source code represented by tree 101 includes operators, which indicate operations to be performed on values, and operands, which indicate the values upon which the operations are to be performed. The operators are *, indicating multiplication, and :=, indicating assignment. As for the operands, a and b are the operands of *, and the operands of := are product and a * b. In tree 101, the subtrees representing an operator's operands are rooted in the children of the node representing the operator. Thus, node 103 representing the := operator has as its children node 105, in which the subtree representing the left-hand operand is rooted, and node 107, in which the subtree representing the right-hand operand is rooted.
Operations which programs perform on trees include traversal, in which, beginning at the root, each node of the tree is visited in some order; locating the parent of a given node; locating a specific child of a given node; and moving to the next node to be visited in the course of a traversal. Two important types of traversals are preorder traversals and postorder traversals. In a preorder traversal of tree 101, the nodes are visited in the order 103, 105, 107, 109, 111. In a postorder traversal, the nodes are visited in the order 105, 109, 111, 107, 103. As for the other operations, the operation of locating the parent of node 111 would locate node 107; the operation of locating the second child of node 103 would locate node 107; and in the case of a preorder traversal, the operation of advancing to the next node performed on node 107 would locate node 109.
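The operations just described may be sketched as follows. This Python fragment is a minimal illustration, not a description of any particular prior-art implementation; the CHILDREN table and all function names are assumptions introduced for the example.

```python
# Hypothetical encoding of tree 101: node number -> children, left to right.
CHILDREN = {103: [105, 107], 105: [], 107: [109, 111], 109: [], 111: []}


def preorder(node):
    """Visit the node first, then each child's subtree from left to right."""
    order = [node]
    for c in CHILDREN[node]:
        order.extend(preorder(c))
    return order


def postorder(node):
    """Visit each child's subtree from left to right, then the node itself."""
    order = []
    for c in CHILDREN[node]:
        order.extend(postorder(c))
    order.append(node)
    return order


def parent_of(node):
    """Locate the parent of a node, or None for the root."""
    for p, kids in CHILDREN.items():
        if node in kids:
            return p
    return None


def nth_child(node, n):
    """Locate the n-th child of a node; n is 1-based, as in 'second child'."""
    return CHILDREN[node][n - 1]


def next_in_preorder(root, node):
    """Advance to the node visited after `node` in a preorder traversal."""
    order = preorder(root)
    i = order.index(node)
    return order[i + 1] if i + 1 < len(order) else None
```

With this table, preorder(103) gives 103, 105, 107, 109, 111 and postorder(103) gives 105, 109, 111, 107, 103, matching the orders stated above. The sketch recomputes the whole traversal in next_in_preorder for brevity; a practical implementation would avoid that cost.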
In order to facilitate the above operations, a tree is usually represented in a computer system as a collection of node data structures. FIG. 2 shows such a representation 201 of tree 101. Each node of tree 101 has a corresponding node data structure 203; each node data structure 203 contains node data 205, which represents the node's data. For instance, node data 205 in node data structure 203 representing node 103 indicates at least that the node represents the := operator. Each node data structure 203 further has pointers 206. A pointer is a value which is an address in memory. Parent pointer (PP) 207 in a given node data structure 203 has as its value the address of (points to) node data structure 203 representing the parent node of the node represented by the given node data structure 203. Similarly, each of the child pointers (CP) 209 points to a node data structure 203 representing one of the children of the given node. The order of the child pointers is the same as the left-to-right order of the children of the node represented by node data structure 203. Thus, as seen in tree representation 201, parent pointer 207 of node data structure 203 representing node 107 points to node data structure 203 representing node 103, and the child pointers 209(1) and 209(2) point to node data structures 203 representing nodes 109 and 111 respectively. As is apparent from the foregoing, all of the operations which need to be performed on tree 101 can be easily performed by following the pointers in tree representation 201.
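A pointer-based representation of this kind might look like the following Python sketch, in which object references stand in for memory addresses. The class name Node, its field names, and the add_child helper are assumptions made for illustration; the assignment of a to node 109 and b to node 111 is inferred from the left-to-right order of the operands.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# eq=False keeps identity comparison, which avoids recursing through the
# parent/child cycle when two nodes are compared.
@dataclass(eq=False)
class Node:
    data: str                                   # node data 205
    parent: Optional["Node"] = None             # parent pointer (PP) 207
    children: List["Node"] = field(default_factory=list)  # child pointers (CP) 209

    def add_child(self, child):
        """Append a child on the right and set its parent pointer."""
        child.parent = self
        self.children.append(child)
        return child


# Build representation 201 of tree 101 (variable names follow node numbers).
n103 = Node(":=")
n105 = n103.add_child(Node("product"))
n107 = n103.add_child(Node("*"))
n109 = n107.add_child(Node("a"))   # assumed: left operand of *
n111 = n107.add_child(Node("b"))   # assumed: right operand of *
```

Locating a parent or a specific child then amounts to following one reference, for example n111.parent or n103.children[1].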
There are two drawbacks of tree representation 201. The first is that the trees required by many applications (such as programming environments for languages such as C++) are very large. Large trees pose a number of difficulties for a memory system. In simple computer systems, the physical memory system may not be large enough to accommodate large trees; in computer systems having virtual memory, there may be no problems with physical memory, but the virtual address space may be too small; even where that is not a problem, performing operations on a large tree may result in many page faults and a corresponding degradation of performance. The second is that the pointers 206 in the nodes have meaning only in the memory system in which tree representation 201 was created; it is consequently not possible to copy tree representation 201 from one computer system to another computer system.
The prior art has attempted to deal with both drawbacks. One approach has been to reduce the size of node data 205 to a minimum. Generally speaking, the data represented by node data 205 appears over and over again in a program. For example, most programs will use operators such as := and * repeatedly, and programs also generally use variables such as a, b, and product repeatedly. That being the case, the prior art has made an entry in a data dictionary in memory for each unique item of node data and has replaced node data 205 with the index of the entry for the item of node data in the data dictionary, thereby reducing node data structure 203 to the necessary parent and child pointers and the index into the data dictionary. However, the trees in some applications have grown so large that even such reduced node data structures 203 still take up too much memory. Further, these reduced data structures 203 have grown steadily larger as the size of address spaces, and therefore of the pointers used in them, has increased.
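The data dictionary technique can be illustrated by the following Python sketch. The class DataDictionary, its method names, and the sample sequence of node data are all hypothetical and are introduced only to show how repeated items collapse to small indices.

```python
class DataDictionary:
    """Stores each unique item of node data once and hands out its index."""

    def __init__(self):
        self._items = []    # index -> item
        self._index = {}    # item -> index

    def intern(self, item):
        """Return the index of the item, adding an entry on first sight."""
        if item not in self._index:
            self._index[item] = len(self._items)
            self._items.append(item)
        return self._index[item]

    def lookup(self, index):
        """Recover the item of node data from its index."""
        return self._items[index]

    def __len__(self):
        return len(self._items)


dictionary = DataDictionary()

# Hypothetical node data from two statements that reuse operators and variables.
node_data = [":=", "product", "*", "a", "b", ":=", "product", "*", "product", "a"]

# Each node now carries only a small integer in place of its node data.
indices = [dictionary.intern(item) for item in node_data]
```

In this example ten items of node data are represented by five dictionary entries, and each node stores an index rather than the item itself. The parent and child pointers are unaffected, which is why this approach alone does not remove the size problem noted above.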
Another approach has been to develop representations of trees which require fewer pointers. Such representations store the node data structures sequentially and use the order in which the node data structures are stored to replace some or all of the pointers. Examples of such representations may be found in D. E. Knuth, Fundamental Algorithms, Section 2.3.3, p. 347. One of the representations in this section, termed by Knuth "postorder with degrees" (page 350, bottom), eliminates pointers completely. Instead, the sequential order in which the node data structures are stored is that of a postorder traversal of the tree, and each node data structure includes the degree of the node, that is, the number of children which the node has. Other examples of such representations may be found in R. C. Read, Graph Theory and Computing.
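A postorder-with-degrees sequence for tree 101 may be produced as in the following Python sketch. The CHILDREN and DATA tables and the function name are assumptions made for illustration, and the assignment of a and b to nodes 109 and 111 is inferred from operand order; the sketch follows the general idea described above rather than reproducing Knuth's presentation.

```python
# Hypothetical encoding of tree 101: children left to right, and node data.
CHILDREN = {103: [105, 107], 105: [], 107: [109, 111], 109: [], 111: []}
DATA = {103: ":=", 105: "product", 107: "*", 109: "a", 111: "b"}


def postorder_with_degrees(node):
    """Return (node data, degree) pairs in postorder; no pointers are stored."""
    sequence = []
    for c in CHILDREN[node]:
        sequence.extend(postorder_with_degrees(c))
    # The degree is the number of children the node has.
    sequence.append((DATA[node], len(CHILDREN[node])))
    return sequence


encoded = postorder_with_degrees(103)
```

For tree 101 the resulting sequence is (product, 0), (a, 0), (b, 0), (*, 2), (:=, 2). The shape of the tree is carried entirely by the storage order and the degrees.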
The usefulness of these "pointerless" representations of trees has been greatly limited by the fact that the art has not known how to do the kinds of general navigation operations previously described in such representations. For this reason, the art has used these representations only for long-term storage of trees and has reconstituted them into representations using pointers whenever navigation was required. Reconstituting the trees requires time, and of course the reconstituted representations have all of the size problems alluded to above. Consequently, the pointerless representations have been useful only to reduce the amount of disk space required for storage of trees and to make it possible to move trees from one computer system to another. It is an object of the present invention to overcome these disadvantages and provide pointerless representations of trees which can be navigated and techniques for navigating them.
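The reconstitution step referred to above can be sketched with a stack, since each degree indicates how many already-completed subtrees belong to the node being read. The following Python fragment is one straightforward way to do this under that observation; the Node class, the reconstitute function, and the sample sequence are assumptions introduced for illustration.

```python
class Node:
    """Pointer-based node: references stand in for parent and child pointers."""

    def __init__(self, data):
        self.data = data
        self.parent = None
        self.children = []


def reconstitute(sequence):
    """Rebuild a pointer-based tree from (node data, degree) pairs in postorder."""
    stack = []
    for data, degree in sequence:
        if degree > len(stack):
            raise ValueError("degree exceeds the number of available subtrees")
        node = Node(data)
        if degree:
            # The last `degree` subtrees on the stack are this node's children,
            # already in left-to-right order.
            node.children = stack[-degree:]
            del stack[-degree:]
            for child in node.children:
                child.parent = node
        stack.append(node)
    if len(stack) != 1:
        raise ValueError("sequence does not describe a single tree")
    return stack[0]


# Hypothetical stored form of tree 101 in postorder with degrees.
stored = [("product", 0), ("a", 0), ("b", 0), ("*", 2), (":=", 2)]
root = reconstitute(stored)
```

The rebuilt structure again contains one parent reference and one reference per child in every node, so it carries the same memory cost as the pointer-based representation discussed earlier.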