A. Field of the Invention
This invention relates generally to data storage and retrieval, and more particularly to data structures for storing and retrieving dynamic hierarchical strings and portions thereof.
B. Copyright Notice/Permission
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawings hereto: Copyright © 2000–2001, Unisys Corporation.
C. Description of the Related Art
A general tree is a data structure used to store and represent data accessed by a computer program. Initially, a few definitions and concepts are in order:
Node: an element of a tree;
Root Node: a single node at level 0;
Child Node: a node at level 1 or greater descending from a parent node;
Parent Node: a node associated with a lower level node. Every child node has exactly 1 parent;
Sibling Nodes: nodes that are at the same level and share the same parent node;
Leaf Node: a node with 0 children;
Empty Tree: a tree consisting of 0 nodes;
General Tree: a set of nodes consisting of an empty tree or a root node and 0 or more subsets each of which is a tree;
Abstract Data Type Tree: a general tree;
Forest: a set of 0 or more trees;
Binary Tree: a special type of general tree that consists of an empty tree or a root node and exactly 2 subsets each of which is a tree;
N-ary Tree: a special type of general tree that consists of an empty tree or a root node and exactly N subsets each of which is a tree;
Lineage: the path of nodes from the current node up to the root node (i.e., the node, its parent, its parent's parent, etc., up to the root node);
Level: the level of the root node is 0; the level of any other node is 1 + the level of its parent (i.e., the level is the number of nodes in the lineage, minus 1);
Traverse: an orderly way to visit each node in the tree exactly once in order to perform some operation on that node;
Preorder Traversal of a general tree: starting at the root node, 1) visit node, and 2) traverse each of the children subtrees;
Postorder Traversal of a general tree: starting at the root node, 1) traverse each of the children subtrees, and 2) visit node;
Preorder Traversal of a binary tree: starting at the root node, 1) visit node, 2) traverse left subtree, and 3) traverse right subtree;
Inorder Traversal of a binary tree: starting at the root node, 1) traverse left subtree, 2) visit node, and 3) traverse right subtree;
Postorder Traversal of a binary tree: starting at the root node, 1) traverse left subtree, 2) traverse right subtree, and 3) visit node.
A tree is a fundamentally hierarchical structure. As such, a tree may be used to represent any model that exhibits hierarchy, including but not limited to: process family structure; disk file directory structure; process priority scheduling queues; genealogical trees including family relationships among individuals, tribes, languages, and the like; classification systems including the Dewey decimal system, taxonomic classification of plants and animals, and the like; program structure including main program, procedures, nested procedures, and the like; and breakdown of a manufactured product or service.
FIGS. 1–3 will be used to illustrate an exemplary general tree, traversal of a general tree, implementation of a general tree as a binary tree using a prior art general data structure, and traversal of a binary tree using the prior art general data structure. Referring to FIG. 1, an exemplary general tree 100 is depicted. General tree 100 is comprised of root node A 101, children nodes B, C, D 102–04, grandchildren nodes E, F, G, H, I, J 105–10, and great-grandchild node K 111. Root node A 101 is at level 0, children nodes B, C, D 102–04 are at level 1, grandchildren nodes E, F, G, H, I, J 105–10 are at level 2, and great-grandchild node K 111 is at level 3. A preorder traversal of general tree 100 would comprise visiting the foregoing nodes as follows: A 101, B 102, E 105, F 106, C 103, D 104, G 107, H 108, I 109, K 111, J 110. A postorder traversal of general tree 100 would comprise visiting the foregoing nodes as follows: E 105, F 106, B 102, C 103, G 107, H 108, K 111, I 109, J 110, D 104, A 101.
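By way of illustration only, the two general-tree traversals described for FIG. 1 may be sketched in Python as follows. The `CHILDREN` mapping and the function names are assumptions introduced for this sketch; they are not part of the disclosure or of any prior art implementation.

```python
# Hypothetical encoding of general tree 100 (FIG. 1):
# each node name maps to the ordered list of its children.
CHILDREN = {
    "A": ["B", "C", "D"],
    "B": ["E", "F"],
    "C": [],
    "D": ["G", "H", "I", "J"],
    "E": [], "F": [], "G": [], "H": [],
    "I": ["K"],
    "J": [], "K": [],
}

def preorder(node, children=CHILDREN):
    """Visit the node, then traverse each child subtree in order."""
    order = [node]
    for child in children[node]:
        order.extend(preorder(child, children))
    return order

def postorder(node, children=CHILDREN):
    """Traverse each child subtree in order, then visit the node."""
    order = []
    for child in children[node]:
        order.extend(postorder(child, children))
    order.append(node)
    return order

print(" ".join(preorder("A")))   # A B E F C D G H I K J
print(" ".join(postorder("A")))  # E F B C G H K I J D A
```

The printed orders agree with the preorder and postorder sequences listed above for general tree 100.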
General tree 100 may be used to represent a directory structure where a file exists for each leaf node E 105, F 106, C 103, G 107, H 108, K 111, and J 110. Preorder traversal of general tree 100 to retrieve all file names would yield: A\B\E; A\B\F; A\C; A\D\G; A\D\H; A\D\I\K; and A\D\J.
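The directory listing above may be reproduced by a preorder walk that emits a path only at leaf nodes. The following Python sketch assumes a hypothetical `CHILDREN` mapping for general tree 100 and a hypothetical helper `leaf_paths`; neither name comes from the disclosure.

```python
# Hypothetical encoding of general tree 100 as a directory structure.
CHILDREN = {
    "A": ["B", "C", "D"],
    "B": ["E", "F"],
    "C": [],
    "D": ["G", "H", "I", "J"],
    "E": [], "F": [], "G": [], "H": [],
    "I": ["K"],
    "J": [], "K": [],
}

def leaf_paths(node, children, prefix=""):
    """Preorder walk that returns the full path of every leaf under `node`."""
    path = prefix + node
    if not children[node]:          # a leaf node stands for a file
        return [path]
    paths = []
    for child in children[node]:    # interior nodes only extend the prefix
        paths.extend(leaf_paths(child, children, path + "\\"))
    return paths

print("; ".join(leaf_paths("A", CHILDREN)))
# A\B\E; A\B\F; A\C; A\D\G; A\D\H; A\D\I\K; A\D\J
```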
A binary tree can be used to represent a general tree. A binary tree is easily represented in programming languages as a uniform structure consisting of a plurality of general data structures, each general data structure comprising a node in the tree. FIG. 2 illustrates one such prior art general data structure 200 consisting of a data value field 201, a pointer reference to the first child node (the next generation) 202, and a pointer reference to the next sibling node (the current generation) 203.
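A minimal Python sketch of a node shaped like general data structure 200 is given below for illustration. The class name `Node`, the field names, and the helpers `build` and `children_of` are assumptions made for this sketch; the disclosure specifies only the three fields 201, 202, and 203.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: str                             # data value field (201)
    first_child: Optional["Node"] = None   # reference to first child (202)
    next_sibling: Optional["Node"] = None  # reference to next sibling (203)

def build(value, children):
    """Build the first-child/next-sibling form from a {name: [child names]} mapping."""
    node = Node(value)
    previous = None
    for name in children.get(value, []):
        child = build(name, children)
        if previous is None:
            node.first_child = child       # first child hangs off the parent
        else:
            previous.next_sibling = child  # later children chain off the previous sibling
        previous = child
    return node

def children_of(node):
    """Recover a node's children by following first_child, then next_sibling links."""
    child = node.first_child
    while child is not None:
        yield child
        child = child.next_sibling

# General tree 100, with leaf nodes omitted from the mapping for brevity.
root = build("A", {"A": ["B", "C", "D"], "B": ["E", "F"],
                   "D": ["G", "H", "I", "J"], "I": ["K"]})
print([c.value for c in children_of(root)])  # ['B', 'C', 'D']
```

Because every node carries exactly two references regardless of how many children it has, the structure is uniform, which is why a general tree stored this way is also a binary tree.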
FIG. 3 illustrates a binary tree 300 representation of the general tree 100 utilizing general data structure 200. Binary tree 300 is comprised of root node A 301, children nodes B, C, D 302–04, grandchildren nodes E, F, G, H, I, J 305–10, and great-grandchild node K 311. Nodes A–K 301–11 of binary tree 300 correspond directly to nodes A–K 101–11 of general tree 100. A preorder traversal of binary tree 300 would comprise visiting the foregoing nodes as follows: A 301, B 302, E 305, F 306, C 303, D 304, G 307, H 308, I 309, K 311, J 310. A postorder traversal of binary tree 300 would comprise visiting the foregoing nodes as follows: F 306, E 305, K 311, J 310, I 309, H 308, G 307, D 304, C 303, B 302, A 301. An inorder traversal of binary tree 300 would comprise visiting the foregoing nodes as follows: E 305, F 306, B 302, C 303, G 307, H 308, K 311, I 309, J 310, D 304, A 301.
Note that when comparing tree traversal for general tree 100 versus traversal of binary tree 300, which represents general tree 100, the following characteristics can be observed: preorder traversal of a general tree is the same as preorder traversal of the binary tree that represents the general tree; postorder traversal of a general tree is the same as inorder traversal of the binary tree that represents the general tree; and there is no traversal of a general tree that corresponds to postorder traversal of the binary tree that represents the general tree.
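The correspondence noted above can be checked with a short Python sketch. The `LEFT` and `RIGHT` tables below are a hand-written, hypothetical encoding of binary tree 300, in which the left link is the first-child reference and the right link is the next-sibling reference; the function name `walk` is likewise an assumption.

```python
# Hypothetical link tables for binary tree 300 (FIG. 3).
LEFT = {"A": "B", "B": "E", "D": "G", "I": "K"}           # first-child links
RIGHT = {"B": "C", "C": "D", "E": "F",
         "G": "H", "H": "I", "I": "J"}                    # next-sibling links

def walk(node, when):
    """Binary-tree traversal; `when` is 'pre', 'in', or 'post'."""
    if node is None:
        return []
    left = walk(LEFT.get(node), when)
    right = walk(RIGHT.get(node), when)
    if when == "pre":
        return [node] + left + right
    if when == "in":
        return left + [node] + right
    return left + right + [node]

print(" ".join(walk("A", "pre")))   # A B E F C D G H I K J
print(" ".join(walk("A", "in")))    # E F B C G H K I J D A
print(" ".join(walk("A", "post")))  # F E K J I H G D C B A
```

The first two printed orders are the preorder and postorder sequences of general tree 100, respectively, while the third matches neither traversal of the general tree.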
It is known in the art of computer programming to traverse tree structures as described above, for example to access or update data stored in the tree, or to add or delete tree nodes at some location in the tree. For example, in a multi-processing computer system, such as a mainframe, one process running in the system may build a tree containing data related to all other processes running concurrently in the system. Another process may need to traverse that tree to display the status of all or some of the running processes. For large trees, however, the data structures and algorithms used to build, maintain, and traverse trees today often do not scale well. This lack of scalability can be a major disadvantage as computer systems grow in size and complexity, and as more workload is imposed on the systems. In extreme cases, the processor time required to traverse the tree can exceed the time required to process the data extracted from the tree. Considering the above two-process example in the context of a large process tree, the traversal/status process itself may require a dedicated processor, thereby unnecessarily increasing the cost and complexity of the overall system.
As a specific instance of the general example discussed above, consider the scalability of the data structures and algorithms used to build, maintain, and traverse trees in the A-Series mainframe computer systems, available from Unisys Corporation, Blue Bell, Pa. When a user establishes a session or starts a job on the A-Series, this process spawns children processes. The children in turn may spawn further processes. The entire set of processes associated with a session or a job is known as a process family and the A-Series operating system organizes them in a parent-child hierarchy. The data structure used by the A-Series operating system to organize process families in this parent-child structure is a binary representation of a general tree. The following Table 1 is an exemplary output for the J operator command for a single process family:
TABLE 1
Mix-Pri     JOB ENTRIES
59641 50    Lib *SYSTEM/JAVASERVLETLIB ON JAVATEST
59646 50    SERVLET/API/PROCESS/NEW/REQUESTS
59645 50    *OBJECT/JAVA ON JAVATEST
59661 50    P59645/9/1/“TimerThread”
59660 50    P59645/8/1/“Worker2”
59659 50    P59645/7/1/AWAITING—REUSE
59657 50    P59645/6/1/“Thread-0”
59655 50    P59645/5/1/“SessionMgrThread”
59651 50    P59645/4/1/“Finalizer”
59650 50    P59645/3/1/“Reference Handler”
59649 50    P59645/2/1/“Signal dispatcher”
The exemplary J operator output shows that the library job process, *SYSTEM/JAVASERVLETLIB ON JAVATEST, spawned 2 child processes—SERVLET/API/PROCESS/NEW/REQUESTS and *OBJECT/JAVA ON JAVATEST. The first child process had no offspring while the second child process spawned 8 child processes (grandchildren of the library job process *SYSTEM/JAVASERVLETLIB ON JAVATEST).
On early A-Series systems, the maximum number of active processes contending for processor resources was 4095. With the development of the Clearpath® NX5820 and NX6820 A-Series mainframe computer systems, also available from Unisys, the limit on the number of processes contending for the processor increased to 32,767. This increase brought to light an algorithmic performance problem related to the operating system CONTROLLER process monitoring the state of so many processes. Information returned by the CONTROLLER process is displayed on A-Series operator display terminals (ODTs). Affected operator commands used to display process status included J (job structure view of processes), A (active processes), W (processes waiting for operator action), S (processes scheduled for execution), DBS (database processes), and LIBS (library processes).
With an A-Series system, the CONTROLLER process is responsible for periodically updating and displaying process state system-wide. Each periodic update displays status for up to 22 processes. The next periodic update picks up where the previous update left off. The A-Series CONTROLLER process did not scale for a very large number of active processes because each display of updated process status information required a status picture of the entire process family tree (a general tree) to be built. For a system that has only 50 active processes, the effort to build the status picture of the process family tree is rather insignificant when displaying status information for only 22 of these processes. However, for a system that has 32,000 active processes, the effort to build the status picture of the entire process family tree is very significant when, for example, only 22 of these items will be displayed.
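The disproportion described above can be made concrete with a rough Python calculation. The sketch assumes, purely for illustration, that building the status picture costs one unit of work per active process; the actual cost model of the CONTROLLER process is not given here, and the name `refresh_cost` is hypothetical.

```python
PAGE_SIZE = 22  # processes displayed per periodic update, per the text above

def refresh_cost(active_processes, page_size=PAGE_SIZE):
    """Return (entries built, entries shown, entries built per entry shown)
    for one update, assuming one unit of work per active process."""
    shown = min(page_size, active_processes)
    return active_processes, shown, active_processes / shown

for n in (50, 32_000):
    built, shown, ratio = refresh_cost(n)
    print(f"{n} active: build {built}, show {shown}, "
          f"about {ratio:.0f} built per entry shown")
```

Under this assumption the work per displayed entry grows linearly with the number of active processes, from roughly 2 entries built per entry shown at 50 processes to roughly 1,455 at 32,000 processes.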
Previously, when the total number of concurrently active processes was low (less than 500), the CONTROLLER process required less than 6% of the processor power of a single processor to display process state at the ODTs. When the total number of concurrently active processes increased to 8,000, the CONTROLLER process consumed an entire processor just to display process state. When the total number of concurrently active processes increased to 32,000, the CONTROLLER process consumed an entire processor to display process state and could not keep up with the requested display refresh rates. When the CONTROLLER process was asked to sort active processes by CPU rate, the CONTROLLER process could only display process state information in intervals of several minutes.
Accordingly, a clear need exists in the art for a method of traversing extremely large trees efficiently while providing the scalability demanded by continuing advances in computer systems technology.