1. Field of the Invention
The present invention is related to the field of data structures stored in a memory of a computer system. More specifically, the present invention is related to a method for efficiently storing a table of keys in a memory of a computer system through the use of an improved radix search tree.
2. Description of the Related Art
There are numerous prior art methods for searching for data in a data structure stored in a memory of a computer system to find a particular item of information. It is desirable to organize and search the data in the data structure in a way that reduces the amount of memory required to store the data and allows the search to be performed more efficiently.
Before discussing the prior art methods, a brief mention of terms commonly used in the description of data structures and search techniques performed thereon is in order.
A table or a file is a group of data elements, each of which may be called an entry or a record in the table. Generally, a key is associated with each record. The key is used to differentiate among different records. The key associated with a particular record may or may not need to be unique, depending on the search method utilized in accessing the table. Furthermore, the key may or may not be embedded within the record itself.
A search method accepts a key value as input and attempts to locate a record within a table stored in the memory of a computer system whose associated key is the key value. The search method may return a record, or a pointer to the record. The contents of the record may be data, program code, or a pointer to either data or program code. If the search of a table is unsuccessful in finding the key, then there is no record in the table associated with the key value. Typically, if the search is unsuccessful, an insertion is performed to add a new record with the key value as its key.
A table is stored in a data structure in the memory or an external storage, e.g., magnetic disk, of a computer system. The form of the data structure may be an array of records, a tree, a linked list, etc. Certain search methods are generally more applicable to one form and location of a data structure than another. Thus, the data structure in which a table is stored is, in part, selected according to the search method to be used to access information within the table. The present invention is related to search operations on a file or table that is organized as a tree structure.
A prior art search method utilizes a tree to facilitate searching a table stored in the memory of a computer system. The prior art search method forms a tree based on symbols of which the keys are comprised. This is generally referred to as a radix search tree. For example, if the key is comprised of the hexadecimal characters 0 through F, each successive hexadecimal digit position in the key determines 1 of 16 possible sons of a given node in the tree.
A table 100 comprising a set of keys is illustrated in FIG. 1A. For purposes of example, the set of keys in the table are comprised of from two to four hexadecimal digits. However, it is understood by those of ordinary skill in the related arts that the keys could conceivably be of any length, or all the same length. Moreover, the table typically has substantially more keys than presented in this example.
The tree illustrated in FIG. 1B, referred to generally as 111, represents a radix search tree. The tree 111 organizes the set of keys listed in the table of keys illustrated in FIG. 1A to facilitate the radix search method. Taking, for example, the first key in the table at 101, i.e., key 14(h), a root node 110 in the tree 111 points to a son node 120 at which is stored the first hexadecimal symbol 1 in the key. Node 120, in turn, points to the hexadecimal symbol 4 at node 121. Since the hexadecimal symbol 4 is the last symbol in the key 14(h), the node 121 points to a son node 150 indicating the end of a key (eok) has been reached. Node 120 also points to another son node 122, in which is stored the hexadecimal value 6 corresponding with the second symbol value in the keys 160(h) at location 102, 16E(h) at location 103, and 16E9(h) at location 104 in the table 100. Node 122, in turn, points to son nodes 123 and 124. The symbol stored in node 123 corresponds to the third symbol having a value of 0(h) in key 160(h) at location 102 in table 100. The symbol value E(h) stored in node 124 corresponds to the third symbol in keys 16E(h) and 16E9(h) at locations 103 and 104, respectively, in table 100. Finally, node 124 points to son node 125, in which is stored the last symbol value of 9(h) in the key 16E9(h) at location 104 in table 100. The end of the string of symbols 1(h), 6(h) and 0(h) representing the key 160(h) is indicated by the end of key (eok) stored in node 151, which is the son of node 123. The string of symbols representing the key 16E9(h) is likewise terminated by an eok at node 152. The other subtrees illustrated at 130 and 140 are organized in a similar fashion to the subtree illustrated at 120.
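By way of illustration only, the organization described above may be sketched in Python as follows. The sketch assumes that each node is a dictionary mapping a symbol to its son node and that the end of a key is marked by a hypothetical `EOK` entry. Only the seven keys named in this description are used, since table 100 of FIG. 1A may contain others. The names `EOK`, `build_symbol_tree` and `contains` are introduced for this example and do not appear in the figures.

```python
EOK = "eok"  # hypothetical marker standing in for the end-of-key son node


def build_symbol_tree(keys):
    """Return a nested-dict tree in which each dict maps a symbol to a son node."""
    root = {}
    for key in keys:
        node = root
        for symbol in key:
            # Follow the existing son for this symbol, or create it.
            node = node.setdefault(symbol, {})
        # Mark the end of the key; the value stands in for a record pointer.
        node[EOK] = key
    return root


def contains(root, key):
    """Return True only if the full key, terminated by an eok, is in the tree."""
    node = root
    for symbol in key:
        if symbol not in node:
            return False
        node = node[symbol]
    return EOK in node


# Keys named in the description; FIG. 1A may list more.
keys = ["14", "160", "16E", "16E9", "2A", "214", "2BF5"]
tree = build_symbol_tree(keys)
```

In this sketch the node reached through symbols 1 and 6 has two sons, for symbols 0 and E, in the manner of nodes 123 and 124. The prefix 16 is not itself reported as a key because no eok follows it.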
As mentioned above, the leaf nodes in FIG. 1B, e.g., nodes 150, 152, 153, etc., represent the end of a key. The leaf nodes generally contain a pointer to a record or entry that is being stored in the memory of a computer system. However, it is conceivable that the leaf node may store the pointer to a program code segment. A software program controlling the computer system to perform the search could then cause the computer system to jump to the program code segment pointed to by the pointer for further program execution.
Note that while the keys illustrated in FIG. 1A consist of the 16 hexadecimal characters 0-F, the keys could also be represented by some other set of characters. For example, if the keys consist of the English language alphabetic characters A-Z, each letter of the alphabet determines a branch in a tree. In other words, each node in the tree 111 can contain m pointers, corresponding to m possible symbol values in each position of the key. Thus, if the keys were alphabetic, there would be 26 pointers in each node, each pointing to a son node, where each son node corresponds to one of 26 possible symbol values.
FIG. 2A illustrates the partial memory layout 200 for the nodes in a radix search tree data structure in which is stored the table 100 in a memory of a computer system. Since the keys in table 100 are hexadecimal, there are 16 memory locations required for each node in order to provide 16 pointers to 16 possible son nodes in the tree 111. For example, symbol value 1(h) in node 201 can be followed by any one of 16 hexadecimal symbol values, each represented by a different son node. Thus, 16 memory locations are reserved at memory block 210 for pointers to the 16 possible different son nodes.
Symbol value 2(h) in node 201 represents a different branch in the tree, and can also be followed by any one of 16 different hexadecimal symbol values. Hence, 16 memory locations are allocated at memory block 240 for storing a pointer to a potential son node which, in turn, stores a symbol following the symbol value 2(h). As illustrated in FIG. 2A, three of the memory locations at memory block 240 contain pointers to the next symbols in the keys 2A(h), 214(h) and 2BF5(h) in table 100.
Moreover, in addition to the 16 memory locations reserved for pointers at each node in the tree, there may be an extra pointer corresponding to an end of key, or a flag with each pointer indicating that the pointer points to a record or program code segment rather than another node in the tree.
Of particular importance is the fact that a pointer in a node is associated with a particular symbol value based on the location of the pointer, i.e., based on the offset of the pointer relative to the first pointer at the first memory location in the node. In other words, the first pointer corresponds to the first possible symbol value, in this case, 0(h), while the second pointer corresponds to the second possible symbol value, i.e., 1(h), etc. Thus, it is unnecessary to store the actual symbol values in the nodes of the tree. Rather, only a pointer to a son node corresponding to the symbol value is required. The symbol value corresponding to the son node pointed at by a pointer can be determined from the location of that pointer. However, if the symbol values are not stored in the nodes of a tree, it is essential that a memory location for each possible pointer, whether or not that pointer is ever utilized, be reserved in each node.
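The positional association between pointer and symbol value may be illustrated, by way of example only, with the following Python sketch. It assumes a hypothetical `Node` class holding a fixed list of 16 slots and a separate `record` field standing in for the extra end-of-key pointer. The names `RADIX`, `Node`, `insert` and `search` are introduced for this illustration and are not taken from FIG. 2A.

```python
RADIX = 16  # number of possible symbol values in each key position


class Node:
    """Hypothetical node: one pointer slot per possible symbol value."""

    def __init__(self):
        self.sons = [None] * RADIX  # every slot is reserved, used or not
        self.record = None          # stands in for the extra end-of-key pointer


def insert(root, key, record):
    node = root
    for symbol in key:
        offset = int(symbol, 16)    # the symbol value is the slot offset
        if node.sons[offset] is None:
            node.sons[offset] = Node()
        node = node.sons[offset]
    node.record = record


def search(root, key):
    """Return the record stored for key, or None if the search fails."""
    node = root
    for symbol in key:
        node = node.sons[int(symbol, 16)]
        if node is None:
            return None
    return node.record


root = Node()
# Keys named in the description; the record values are placeholders.
for k in ["14", "160", "16E", "16E9", "2A", "214", "2BF5"]:
    insert(root, k, "record for " + k)
```

No symbol value is stored in any node of this sketch. The search recovers each symbol's son purely from the offset into `sons`, which is why all 16 slots must exist in every node even when most of them hold no pointer.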
Given this requirement, it is clear that when the set of keys in a table is sparse, as in the case of the set of keys in table 100, the prior art method of storing a table of keys in a tree for later radix searching wastes a large amount of memory space. What is needed, therefore, is a method for storing information in a tree structure in the memory of a computer system and for subsequently searching the tree such that the amount of memory required to store a sparse table of keys is minimized.
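The extent of the waste may be estimated with the following Python sketch, offered as a rough illustration under stated assumptions. It assumes one node per distinct key prefix, including the root, with 16 reserved pointer locations per node. It ignores any extra end-of-key pointer or flag and uses only the seven keys named in this description, so the figures for the full table 100 may differ.

```python
RADIX = 16  # pointer locations reserved in every node

# Keys named in the description; FIG. 1A may list more.
keys = ["14", "160", "16E", "16E9", "2A", "214", "2BF5"]

# One node per distinct prefix; the empty prefix is the root node.
prefixes = {key[:i] for key in keys for i in range(len(key) + 1)}

node_count = len(prefixes)
reserved = node_count * RADIX   # locations reserved across all nodes
used = node_count - 1           # one pointer leads to each non-root node
unused = reserved - used

print(node_count, reserved, used, unused)  # prints: 14 224 13 211
```

Under these assumptions, 211 of the 224 reserved pointer locations, or roughly 94 percent, are never used.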
Moreover, what is needed is an apparatus for carrying out the method for searching the tree in the memory of a computer system in such a way that the method operates in a fast, efficient manner.