The ability of computer systems to process data at high speeds results partly from their capacity to access particular items of information within file allocation tables and databases randomly and very rapidly. Quick access to specific data is facilitated by the organization imparted to tables and databases arranged on a storage medium or in memory. In the case of a database, the organization usually includes partitioning the data into records, and each record into fields. For example, each record may correspond to a person, and within the record for that person, one field may be the person's name, and another field may be the person's address. Such organization speeds up searches for particular items of data, because when fields are located in the same part of each record, only the fields of interest, and not the entire record, need be searched.
A particular field associated with a group of records and singled out for searching is sometimes called a key field, although a key may be any item by which a group of records is searched or sorted. For example, if a database of names and addresses is searched by zip code, then the zip code is the key for each record. Each key usually has some link to the record with which it is associated, so that if the key is found, the associated record may be found. Key searches may be performed relatively quickly by loading the keys into the memory of a computer, organizing them, and searching them electronically.
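The link between a key and its record can be illustrated with a minimal sketch. The records, the field names, and the helper functions `build_key_index` and `find_by_key` are hypothetical and chosen only for illustration. The in-memory organization used here is a Python dictionary (a hash table), which is just one of many ways the keys might be organized and is not the tree arrangement discussed later.

```python
# Hypothetical records; each record is partitioned into named fields.
records = [
    {"name": "Ada Smith", "address": "1 Elm St", "zip": "10001"},
    {"name": "Bo Jones", "address": "2 Oak Ave", "zip": "94105"},
    {"name": "Cy Brown", "address": "3 Pine Rd", "zip": "10001"},
]


def build_key_index(records, key_field):
    """Load the keys into memory, linking each key to its record positions."""
    index = {}
    for position, record in enumerate(records):
        index.setdefault(record[key_field], []).append(position)
    return index


def find_by_key(records, index, key):
    """Follow the links from a key to the records associated with it."""
    return [records[position] for position in index.get(key, [])]


# Search by zip code, so the zip code is the key for each record.
zip_index = build_key_index(records, "zip")
matches = find_by_key(records, zip_index, "10001")
```

Only the key field is examined during the search; the remaining fields are touched only once a matching key has been found and its link followed.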
Keys loaded into the memory of a computer system may be searched sequentially. However, there are many schemes for finding a desired key quickly and efficiently without examining each key in sequential fashion. One scheme to streamline a key search is batching the keys in groups and using a string function to find which batch contains the correct key. The correct key is then extracted from its batch. Another search scheme is to first sort the keys, so that a key may then be found relative to the other keys based on a single quality, such as a first letter of the key, or the key's numerical value. If the number of keys to be sorted is small, any sort algorithm implemented in a short executable program is adequate. However, as the number of keys becomes large, it is important to use a faster sort technique, even if the technique is more complicated. One of the fastest and most efficient methods to sort keys is to arrange the keys in a tree, so that a search starting at a root position need only find the correct branches, sub-branches, and leaf to find the correct key. There are numerous other known sorting schemes, for example the quicksort, the bubble sort, the insertion sort, the merge sort, the selection sort, the shell sort, and the radix sort.
Radix sorting logically arranges keys as a multiway radix tree (“MRT”) so that a computer can traverse the branches of the tree, and therefore the items being searched, very quickly. The term “radix” indicates the base of a numbering system; for example, binary numbers have a radix of 2, and decimal numbers have a radix of 10. The radix usually defines the number of symbols that may be used to construct keys. In general, a radix sort arranges data by classifying each item of data based on some attribute it possesses, rather than by comparing it to other data items to find its relative place in an arrangement. For example, if sorting words, all the words beginning with the letter “a” might be put into one group, all the words beginning with “b” into another group, and so on. The members of each group could then be sorted in the same way using the second letter of each word. If the groups were arranged in a tree, then to find a key comprising the sequence “abc,” a link would be traversed from a root of the tree to the “a” branch, then from the “a” branch to the “b” sub-branch, and from the “b” sub-branch to the “c” leaf. When the “c” leaf is reached, it may point to or contain an address identifying the location of an entry associated with the key. The entry may contain anything, for example, a record number, a pointer to a subroutine in a program, a symbol, or even the key itself.
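The traversal for the key “abc” described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the symbol set is assumed to be the twenty-six lowercase English letters, the names `Node`, `insert`, and `lookup` are hypothetical, and the entries stored at the last node of each key are arbitrary record numbers.

```python
import string

ALPHABET = string.ascii_lowercase  # assumed symbol set, so the radix is 26


class Node:
    def __init__(self):
        # One link slot per possible symbol; None marks an unused slot.
        self.links = [None] * len(ALPHABET)
        # Entry associated with a key that ends at this node, if any.
        self.entry = None


def insert(root, key, entry):
    """Add a key by creating one node per symbol along its path."""
    node = root
    for symbol in key:
        slot = ALPHABET.index(symbol)  # raises ValueError outside ALPHABET
        if node.links[slot] is None:
            node.links[slot] = Node()
        node = node.links[slot]
    node.entry = entry


def lookup(root, key):
    """Classify by each symbol in turn; no key-to-key comparison is made."""
    node = root
    for symbol in key:
        slot = ALPHABET.find(symbol)
        if slot < 0 or node.links[slot] is None:
            return None
        node = node.links[slot]
    return node.entry


root = Node()
insert(root, "abc", 7)  # 7 and 9 are hypothetical record numbers
insert(root, "abd", 9)
found = lookup(root, "abc")
```

The keys “abc” and “abd” share the path through the “a” branch and the “b” sub-branch and diverge only at the last symbol, so a lookup visits exactly one node per symbol of the key.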
The individual symbols of a key arranged in an MRT are usually called nodes, or may be stored in memory spaces called nodes. Each key in an index file arranged as an MRT usually has several added bytes that are used as pointers, called links. Links join a node to one or more other nodes, such as a group of child nodes linked to the root node, which is the starting point for traversing the tree. Other child nodes may be linked to the child nodes. A node between a child node and the root node is, in turn, a parent node to the child node.
An MRT, then, is a table of keys organized as a tree, usually to index a data file, wherein the keys are sorted according to the symbols they contain. Although each node in an MRT corresponds to an individual symbol from the set of symbols used to construct the keys, each node also logically represents the sequence of symbols embodied in its parent nodes along a path from the root node to itself. Thus, a unique path between nodes represents each key in the tree. The unique paths allow short, direct access to keys from the root node. In known MRTs, the number of nodes in a path from the root node to the last node of the key equals the number of symbols in the key.
In known MRT schemes, memory space is reserved not only for actual key entries, but also for potential key entries that could be added to the tree at a later time using numerous combinations of the symbols allowed in each scheme. In other words, known MRT schemes build more of the MRT structure than is needed to add a given key by requiring that every symbol in a key be represented at a different level. The number of wasted memory spaces in known MRT schemes can be calculated. The number of memory spaces allotted for one level typically equals the radix “m,” that is, the number of possible symbols that may be used to construct keys. Thus, the value of m depends on the “alphabet” of symbols selected to construct keys in a particular scheme. For example, keys using the English alphabet have twenty-six possible symbols. Numeric keys have an m value equal to the base of the numbering system used. For example, m equals 2 for binary keys, m equals 10 for decimal keys, and m equals 16 for hexadecimal keys. Since known MRTs create a tree by assigning each symbol in the key to a level containing m nodes, a key having n symbols is allotted n levels of m nodes. As an example, when the word “apples” is used as a key, a level containing twenty-six nodes would typically be allotted for each of the six symbols in the key, resulting in a memory allotment of 156 nodes. Key entries containing similar symbols may use some of the empty nodes. But at least one new level of m nodes must be added to the tree whenever a key to be added to the tree possesses more symbols than any key already present in the tree. Allocating memory by assigning each symbol in a key to a unique level may result in unnecessarily large, relatively empty MRTs that consist largely of wasted memory space and are awkward and slow to traverse. This is especially true when the table of keys is relatively sparse.
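The arithmetic behind the “apples” example can be checked with a short sketch. The function names `allotted_nodes` and `occupied_nodes` are hypothetical, and the count of unused nodes assumes a tree holding only this one key, so that exactly one node per level is occupied.

```python
def allotted_nodes(key, radix):
    """Nodes reserved for one key: n levels of m nodes each."""
    return len(key) * radix


def occupied_nodes(key):
    """Nodes the key itself uses: one per symbol, one per level."""
    return len(key)


m = 26          # radix for keys built from the English alphabet
key = "apples"  # n = 6 symbols

total = allotted_nodes(key, m)        # 6 levels * 26 nodes
unused = total - occupied_nodes(key)  # nodes left empty by this key alone
```

Under these assumptions the single key occupies 6 of the 156 allotted nodes and leaves 150 empty, which illustrates why a sparse table of keys produces a largely empty tree.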