1. Technical Field
The present invention relates in general to computer programs. More specifically, the present invention relates to data structures in computer systems.
2. Background Art
The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely sophisticated devices that may be found in many different settings. Computer systems typically include a combination of hardware (e.g., semiconductors, circuit boards, etc.) and software (e.g., computer programs). As advances in semiconductor processing and computer architecture push the performance of the computer hardware higher, more sophisticated computer software has evolved to take advantage of the higher performance of the hardware, resulting in computer systems today that are much more powerful than just a few years ago.
One of the fundamental issues faced by computer programmers is the selection of appropriate data structures. In many applications, the choice of the appropriate data structure is the most important decision in shaping the application. Several types of data structures are commonly used in computer programming, such as arrays, linked lists, stacks, and trees. Each of these data structures has certain advantages and limitations. Typically, the most important aspect of a data structure is the speed at which desired data can be located and retrieved. Naturally, different types of data structures excel at different types of searches. Often, the data structure for a particular application is selected because of its ability to perform a needed type of search quickly and efficiently.
Two of the most commonly used data structures are arrays and trees. An array is typically defined as a fixed number of data items that are stored contiguously and that are accessible by an index. The array data structure defines a plurality of elements, with each element contained in a portion of the storage space.
Arrays generally excel at searches that require all data fields to be examined. This type of searching, called a sequential array search, generally involves selecting a portion of data storage to be searched, analyzing all the data in that portion, and moving to the next portion until all the data in the array has been searched. Thus, sequential array searches search the elements in the array in the order in which they are in storage. Arrays excel at this type of searching because multiple contiguous elements in the array can be examined at once. Multiple contiguous elements can be searched at once because the order of the search is immaterial as all data needs to be searched. This allows the search to progress quickly until all the data has been searched. The performance of the array search is even more impressive when efforts are made to avoid fragmentation of the storage space. When the array is so maintained, large portions of the data can be examined at once and hardware optimization techniques which "look ahead" at the next block of data can further improve search time.
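By way of illustration only, the sequential array search described above might be sketched as follows. This is a minimal sketch, not a definitive implementation; the function name `sequential_search`, the sample values, and the use of Python's standard `array` module for contiguous storage are assumptions made for the example.

```python
from array import array


def sequential_search(items, predicate):
    """Examine every element in storage order and return the matching indices."""
    return [index for index, value in enumerate(items) if predicate(value)]


# Hypothetical data: a contiguous array of double-precision amounts.
amounts = array("d", [12.5, 99.0, 3.25, 99.0, 47.0])

# Every element is examined, so the order of examination is immaterial to the result.
matches = sequential_search(amounts, lambda value: value == 99.0)
print(matches)  # [1, 3]
```

Because the elements are stored contiguously, the loop walks storage from start to end with no pointer following, which is the access pattern that read-ahead hardware techniques favor.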
Tree data structures are another commonly used data structure. A tree is generally defined as a finite set of elements, called nodes, linked together from a root node to a plurality of leaf nodes (with leaf nodes generally residing at the bottom of the tree and having no child nodes). Data is stored in the nodes and can be referenced using the links from root node to leaf nodes. A binary tree is a tree in which each node except the root has one parent node and all nodes have at most two child nodes. An example of a binary tree is shown as tree 900 in FIG. 9. Tree 900 includes a plurality of nodes A, B, C, D, E, F, G, H, and I. A is the root node, with B and C being the children of A. Likewise, B is the parent node of D, and H is the child node of D. Binary trees are especially useful when two-way decisions must be made at each point in a process. A "balanced" binary tree is a binary tree in which the heights of the two subtrees of every node (the height of a subtree being the maximum level of its leaves) never differ by more than one.
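The definition of a balanced binary tree given above can be sketched in code as follows. The `Node` class, the `height` and `is_balanced` helpers, and the two small sample trees are hypothetical and are offered only as an illustration; in particular, the sample trees do not reproduce tree 900 of FIG. 9.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def height(node):
    """Number of levels in the subtree rooted at node (0 for an empty subtree)."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def is_balanced(node):
    """True if, at every node, the two subtree heights differ by at most one."""
    if node is None:
        return True
    if abs(height(node.left) - height(node.right)) > 1:
        return False
    return is_balanced(node.left) and is_balanced(node.right)


# Hypothetical trees: the first is balanced, the second is a left-leaning chain.
balanced_tree = Node("A", Node("B", Node("D")), Node("C"))
chain_tree = Node("A", Node("B", Node("D", Node("H"))))

print(is_balanced(balanced_tree))  # True
print(is_balanced(chain_tree))     # False
```

This version recomputes heights at every node for clarity; a production implementation would more likely compute height and balance in a single pass.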
Searching through binary trees can be done simply and efficiently using a technique called "key searching." Each node in the binary tree is assigned a key value, with the tree arranged such that all nodes with smaller keys are in the left subtree of a node and all nodes with larger (or equal) key values are in the right subtree of the node. With the tree so arranged, a search for a particular key value can be performed extremely efficiently. For example, to find a node with a given value, first compare the value to the key value at the root. If the value is equal, the current tree node contains the data being searched for and the search is over. If the value is smaller, go to the left subtree; if it is larger, go to the right subtree. When this method is continued recursively on a balanced tree, each comparison step cuts the number of nodes remaining to be searched roughly in half. This results in a key value search that is highly efficient.
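The key search just described might be sketched as follows. This is a minimal illustration under stated assumptions: the `Node` class, the `insert` and `search` functions, and the sample keys are hypothetical, and the tree is not rebalanced on insertion.

```python
class Node:
    def __init__(self, key, data):
        self.key = key
        self.data = data
        self.left = None
        self.right = None


def insert(root, key, data):
    """Place smaller keys in the left subtree and larger or equal keys in the right."""
    if root is None:
        return Node(key, data)
    if key < root.key:
        root.left = insert(root.left, key, data)
    else:
        root.right = insert(root.right, key, data)
    return root


def search(root, key):
    """Compare against the current node, then descend left or right until found."""
    node = root
    while node is not None:
        if key == node.key:
            return node.data
        node = node.left if key < node.key else node.right
    return None


# Hypothetical keys, inserted in an order that happens to yield a balanced tree.
root = None
for k in (50, 30, 70, 20, 40, 60, 80):
    root = insert(root, k, f"record-{k}")

print(search(root, 40))  # record-40
print(search(root, 45))  # None
```

Each loop iteration discards one subtree, so on a balanced tree of seven nodes no search examines more than three nodes.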
An even more efficient version of the tree is called the Patricia tree. A Patricia tree has several key properties. First, in a Patricia tree a leaf node is any child node that receives an upward link from a parent node residing on the same level as, or a lower level than, the child node. Additionally, every node in the tree is some other node's leaf or its own leaf. A Patricia tree uses key bit comparison to facilitate searching among N keys, however long, in a tree of just N nodes, while requiring only one full key comparison per search. In particular, in a Patricia search only one bit in the searched key is examined at each node: if the bit is a 1, the search goes right; if the bit is a 0, the search goes left. This is continued until an upward link is encountered. The upward link points back to a leaf node whose tree key will match the one being searched for if the search is successful. If the tree key does not match, the search is unsuccessful. Thus, in a Patricia tree only one full key comparison is required to determine whether the search is successful, which results in a very fast and efficient search.
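A Patricia search of the kind described above might be sketched as follows. This is a minimal sketch modeled on the textbook formulation of the structure, not a definitive implementation. It assumes fixed-width integer keys of `KEY_BITS` bits, bits numbered from the least significant bit, and a sentinel head node that reserves key 0; the names `PatriciaTree`, `Node`, `bit`, and `_descend` are hypothetical. An upward link is detected when a child's bit index is not smaller than its parent's.

```python
KEY_BITS = 8  # assumed key width for this sketch


def bit(key, index):
    """Return the bit of key at the given index (0 is the least significant bit)."""
    return (key >> index) & 1


class Node:
    def __init__(self, key, data, bit_index):
        self.key, self.data, self.bit = key, data, bit_index
        self.left = self   # links initially point back at the node itself
        self.right = self


class PatriciaTree:
    def __init__(self):
        # Sentinel head: tests a bit above every valid key, so searches start left.
        self.head = Node(0, None, KEY_BITS)

    def _descend(self, key, stop_bit=-1):
        """Follow one bit per node until an upward link (or stop_bit) is reached."""
        parent, node = self.head, self.head.left
        while parent.bit > node.bit and node.bit > stop_bit:
            parent = node
            node = node.right if bit(key, node.bit) else node.left
        return parent, node

    def search(self, key):
        _, node = self._descend(key)
        return node.data if node.key == key else None  # the one full key comparison

    def insert(self, key, data):
        if not 0 < key < (1 << KEY_BITS):
            raise ValueError("key must be a nonzero KEY_BITS-wide integer")
        _, found = self._descend(key)
        if found.key == key:
            found.data = data
            return
        i = KEY_BITS - 1
        while bit(found.key, i) == bit(key, i):  # highest bit where the keys differ
            i -= 1
        parent, node = self._descend(key, stop_bit=i)
        new = Node(key, data, i)
        if bit(key, i):
            new.left, new.right = node, new   # right link is the node's own upward link
        else:
            new.left, new.right = new, node   # left link is the node's own upward link
        if bit(key, parent.bit):
            parent.right = new
        else:
            parent.left = new


tree = PatriciaTree()
for k, name in ((5, "five"), (3, "three"), (7, "seven")):
    tree.insert(k, name)

print(tree.search(5))  # five
print(tree.search(6))  # None
```

Note that the descent in `search` inspects single bits only; the full key is compared once, at the node reached through the upward link.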
Trees are thus known to provide the ability to perform very fast and efficient key searches. Accordingly, when a very fast search is needed, a tree data structure is set up with the appropriate keys to facilitate the desired search. There are several limitations to the tree data structure, however.
In particular, while trees provide for fast key searching, full data field searching through an entire tree is extremely inefficient. This is because the linked nature of the tree forces the search to examine one node in memory storage, then jump to a parent or child node, which may be stored in a completely different portion of memory storage, examine that node, and jump again until the entire tree is searched. Because the search must follow the pointers from memory storage portion to memory storage portion, hardware optimization routines are not as effective. Furthermore, the speed of the search is also limited because only a small portion of data (one node) is retrieved each time, while an array search can retrieve an entire contiguous memory portion to facilitate reading ahead. Thus, while the tree can provide very fast searching for a particular key, it cannot provide efficient full data searching. If fast key searching and relatively fast full data searching are both required, a tree data structure and an array data structure must both be built and maintained, with both the tree and the array having a complete copy of all the pertinent data. This duplication of data requires an excessive use of storage space and can also lead to synchronization problems.
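The cost of a full data field search through a tree can be seen in a short sketch. The `Node` class, its `date` field, and the `full_search` function are hypothetical and are chosen only to illustrate the point; the tree is keyed on a dollar amount, and the search is on a field that is not the key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int    # hypothetical dollar-amount key the tree is ordered by
    date: str   # hypothetical non-key data field
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def full_search(root, predicate):
    """Visit every node by following links; the key ordering cannot prune anything."""
    matches, stack = [], [root]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        if predicate(node):
            matches.append(node.key)
        # Each child may reside in an unrelated portion of storage.
        stack.append(node.right)
        stack.append(node.left)
    return matches


root = Node(50, "1998-03-01",
            left=Node(30, "1998-01-15"),
            right=Node(70, "1998-03-01"))

print(full_search(root, lambda node: node.date == "1998-03-01"))  # [50, 70]
```

Every node is reached through a link from another node, one node at a time, in contrast to the contiguous pass that an array permits.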
A second limitation exists because a different tree is required for each type of key to be searched. A tree designed and built to search under a dollar amount key cannot use a key search to find data based on dates. Thus, if two types of key searches are required, two tree data structures must be implemented, with each data structure having an entire copy of all the data. Again, this duplication requires an excessive use of storage space and can also lead to synchronization problems.
Thus, without an improved mechanism for storing data, the efficient storage and retrieval of data will continue to be hampered.