A variety of computer-aided applications, such as text editors, compilers, database search processes, and data compression processes, require data search processes. For example, with the Ziv-Lempel ("LZ") data compression techniques, an input data stream is compressed by searching a history buffer to detect whether each current data string matches a data string already stored in the history buffer. If a matching data string is detected in a memory location of the history buffer, a pointer indicating that memory location and a length of the match are stored into the output data stream, rather than the entire matched data string. If a matching data string is not detected, the current data string is stored into the history buffer. To maximize data compression performance, it is thus desirable to quickly search for and detect the longest matching data string already encoded in the history buffer during data compression processing.
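The history buffer search described above can be illustrated with a minimal sketch. It assumes the history buffer and the current input are plain strings and uses a brute-force scan in place of any particular LZ variant; the function name longest_match and its return convention are hypothetical and chosen only for illustration.

```python
def longest_match(history: str, lookahead: str) -> tuple[int, int]:
    """Return (position, length) of the longest prefix of `lookahead`
    that already occurs in `history`, or (-1, 0) if nothing matches.

    Brute-force illustration only: every starting position in the
    history buffer is tried in turn.
    """
    best_pos, best_len = -1, 0
    for start in range(len(history)):
        length = 0
        # Extend the match while characters agree and both strings last.
        while (length < len(lookahead)
               and start + length < len(history)
               and history[start + length] == lookahead[length]):
            length += 1
        if length > best_len:
            best_pos, best_len = start, length
    return best_pos, best_len
```

In this sketch, a returned pair with a nonzero length plays the role of the pointer and length written to the output data stream, while a zero length corresponds to the case in which no matching data string is detected. The cost of the scan grows with the size of the history buffer, which is why structures such as tries are used instead.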
FIG. 1 illustrates a typical trie, a multipath digital search tree often used in the LZ family of data compression methods to identify the longest matching data string already stored in the history buffer. In FIG. 1, the longest matching data string is detected by searching downward from a root 21 through the different levels of trie 20, such as families of nodes 16-18. Root 21 is linked to an immediately lower level family of nodes, nodes 30, 31, and 32. The relationship between a higher level node, such as root 21, and an immediately lower level linked node, e.g., node 30, is commonly described as a parent-child relationship. In FIG. 1, family of nodes 16, comprising nodes 30-32, forms the children of root 21. Family of nodes 17, comprising nodes 40-41, forms the children of node 30. Family of nodes 18, comprising nodes 50-51, forms the children of node 40. Thus, each family of nodes consists only of children, i.e., nodes sharing a common parent node.
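The parent-child structure described above can be sketched as follows. The class TrieNode, the helper functions, and the sample strings are hypothetical; FIG. 1 does not specify which characters the nodes hold, so the strings are chosen only so that the resulting trie has the same shape as trie 20 (three children under the root, two under the first of them, and two under the first of those).

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    char: str = ""  # character held by this node (empty for the root)
    children: list["TrieNode"] = field(default_factory=list)  # one family


def add_string(root: TrieNode, s: str) -> None:
    """Insert `s`, creating a new child wherever the path runs out."""
    node = root
    for ch in s:
        for child in node.children:
            if child.char == ch:
                node = child
                break
        else:
            new_node = TrieNode(ch)
            node.children.append(new_node)  # newest child goes last
            node = new_node


def longest_match_length(root: TrieNode, s: str) -> int:
    """Walk downward from the root; return how many characters matched."""
    node, depth = root, 0
    for ch in s:
        nxt = next((c for c in node.children if c.char == ch), None)
        if nxt is None:
            break
        node, depth = nxt, depth + 1
    return depth


# Assumed sample data giving the same shape as trie 20 in FIG. 1.
root = TrieNode()
for text in ("aaa", "aab", "ab", "b", "c"):
    add_string(root, text)
```

Each children list in this sketch corresponds to one family of nodes, and the depth reached by the downward walk is the length of the longest matching data string.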
Tries are well known tools used in the electronic data compression field. Additional background information regarding tries may be found in texts, such as "Text Compression," by T. Bell, J. Cleary, and I. Witten, pp. 140-166, and 238-239, (Prentice-Hall, Inc., 1990).
FIG. 2 illustrates the typical trie search process implemented to detect matching data strings in trie 20 shown in FIG. 1. The typical data search process searches for a matching string by searching sequentially through linked families of nodes 16-18 by means of a linked list, such as one or more sequences of pointers 60-62. Each node in such a sequence of pointers comprises a pointer that directs the CPU to the next node, or memory location, to be searched in the corresponding family of children.
FIG. 3 illustrates a more detailed embodiment of the typical trie search process. With the typical trie search process, the sequences of pointers generated for each family of nodes in a trie are organized such that each sequence of pointers instructs the CPU to search from the earliest added node of that family, e.g., node 30 in family of nodes 16, to the most recently added node in that family, e.g., node 32.
For example, as shown in FIG. 3, root 21 comprises a pointer to the oldest node, node 30, in sequence of pointers 60. Sequence of pointers 60 controls the search of family of nodes 16 by controlling the order in which the CPU searches the memory locations associated with the remaining nodes in that family. If a matching data character is not detected at the first node, the search process continues through the remaining nodes in that family in the order given by sequence of pointers 60. Thus, the resulting search according to sequence of pointers 60 starts at the oldest node, node 30, proceeds to node 31, and ends at the most recently added node, node 32. If the current character is not detected anywhere in a search of sequence 60, the current data character is stored into a new memory location. A new node (not shown) corresponding to that memory location is appended to the end of sequence of pointers 60, and node 32 is updated to point to the newly added node. Sequence of pointers 60 is thus adjusted to include a new pointer to the newly added node, while all other pointers in sequence of pointers 60 remain unaltered.
Thus, in the typical trie search process, no change is made to sequence of pointers 60 when a matching character is detected. If a matching character is not detected, only the node that was previously the most recently added is updated, with a pointer to the newly added node. The resulting typical sequence of pointers 60 thus always directs the CPU to search sequentially from the oldest added node to the most recently added node in each particular family of the trie being searched.
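The oldest-to-newest search just described can be sketched as follows, assuming each family is held as a singly linked list of siblings. The class Node, its fields first_child and next_sibling, and the function search_or_append are hypothetical names introduced only for illustration; the comparison count is included to make the cost of each search visible.

```python
class Node:
    def __init__(self, char):
        self.char = char
        self.first_child = None   # pointer to the oldest child, if any
        self.next_sibling = None  # pointer to the next-younger sibling


def search_or_append(parent, ch):
    """Search the family of `parent` from oldest to newest child.

    Returns (node, comparisons, found). On a match, no pointer is
    changed. On a miss, a new node is appended after the previously
    newest child, and no other pointer is changed.
    """
    comparisons = 0
    prev, cur = None, parent.first_child
    while cur is not None:
        comparisons += 1
        if cur.char == ch:
            return cur, comparisons, True
        prev, cur = cur, cur.next_sibling
    new_node = Node(ch)
    if prev is None:
        parent.first_child = new_node  # family was empty
    else:
        prev.next_sibling = new_node   # only the old tail is updated
    return new_node, comparisons, False


# Assumed sample data: three children added in the order a, b, c.
root = Node("")
for ch in "abc":
    search_or_append(root, ch)
```

Under these assumptions, a search for the oldest child takes one comparison, while a search for the most recently added child takes as many comparisons as there are children in the family.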
Typically, the search for matching data strings is the most time consuming step in electronic data manipulation applications, such as data compression, compilers, and editors. In the prior art trie search methods, as the input data stream is processed, many new children will be added in sequence from the oldest added node to the most recently added node, or the youngest node. Requiring the search process to search sequentially through a family, always beginning with the oldest child and ending with the youngest child, is not the most efficient method of searching a variety of data strings. There is therefore a need for a more efficient data search process which minimizes the time required to detect a matching data string in a history buffer.