1. Field of the Invention
The present invention relates to methods for storing a searchable set of keywords. More specifically, the invention relates to data structures that store a set of keywords and permit the set to be searched.
2. Background Information
A trie (tree retrieval) is a well-known data structure used to store a searchable set of keywords. Tries solve many diverse and important computational problems, including dynamic hashing for database systems, dictionary management, approximate string matching (e.g., handwriting recognition [8]) and inverted files for text retrieval. Recently, tries and their variants, Level Compression tries (LC-tries) and two-tries, have been used in routing, in particular for IP address lookup.
FIG. 1 illustrates a typical trie representing the set of keywords cityu, hkbu, hku, hkust and polyu. Each keyword is represented as a path from the root of the tree, where the edges of the path are labeled with the individual characters of the keyword. A keyword node is a node for which the path from the root to that node spells one of the keywords in the set. Hence, all leaf nodes of the trie are keyword nodes.
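The structure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the representation used by the invention: the names `TrieNode`, `insert`, `leaves` and the `is_keyword` flag are assumptions introduced here. It builds a trie for the five example keywords and checks that every leaf is a keyword node.

```python
class TrieNode:
    """One trie node (hypothetical layout, chosen for clarity, not compactness)."""

    def __init__(self):
        self.children = {}       # edge label (one character) -> child node
        self.is_keyword = False  # True if the path from the root spells a keyword


def insert(root, keyword):
    """Add a keyword by walking or creating one edge per character."""
    node = root
    for ch in keyword:
        node = node.children.setdefault(ch, TrieNode())
    node.is_keyword = True


def leaves(node):
    """Yield every node below (and including) `node` that has no children."""
    if not node.children:
        yield node
    for child in node.children.values():
        yield from leaves(child)


root = TrieNode()
for kw in ["cityu", "hkbu", "hku", "hkust", "polyu"]:
    insert(root, kw)

# Every leaf ends a keyword path, so every leaf is a keyword node.
all_leaves_are_keywords = all(leaf.is_keyword for leaf in leaves(root))

# The node reached by h -> k -> u is a keyword node (hku) but not a leaf,
# because hkust continues below it.
hku_node = root.children["h"].children["k"].children["u"]
```

Note that the converse of the leaf property does not hold in this example: the node for hku is a keyword node with children, since hku is a prefix of hkust.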
One major advantage of tries is their access speed: search time is proportional to the length of the search string and independent of the number of keywords. Another major advantage of tries is their prefix range property, which enables the keywords in the keyword set K that share a common prefix with an incoming keyword to be found efficiently, in constant time.
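Both properties can be illustrated with a small sketch. It assumes a nested-dictionary trie and a hypothetical end marker `END`; the function names are introduced here for illustration and are not taken from the invention, and keywords are assumed not to contain the marker character. An exact lookup follows one edge per character of the search string, whatever the number of stored keywords. A prefix query first locates the node for the prefix in the same way and then collects the keywords stored in the subtree below it.

```python
END = "$"  # hypothetical marker key: the path to this node spells a keyword


def build_trie(keywords):
    """Build a nested-dict trie; each dict maps a character to a child dict."""
    root = {}
    for kw in keywords:
        node = root
        for ch in kw:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def find_node(root, s):
    """Follow one edge per character of s; the cost depends only on len(s)."""
    node = root
    for ch in s:
        node = node.get(ch)
        if node is None:
            return None
    return node


def contains(root, keyword):
    """Exact-match lookup."""
    node = find_node(root, keyword)
    return node is not None and END in node


def with_prefix(root, prefix):
    """Return all stored keywords that start with prefix, in sorted order."""
    start = find_node(root, prefix)
    if start is None:
        return []
    found = []
    stack = [(start, prefix)]
    while stack:
        node, path = stack.pop()
        for key, child in node.items():
            if key == END:
                found.append(path)
            else:
                stack.append((child, path + key))
    return sorted(found)


K = build_trie(["cityu", "hkbu", "hku", "hkust", "polyu"])
print(contains(K, "hku"))     # True
print(contains(K, "hk"))      # False: "hk" is only a prefix, not a keyword
print(with_prefix(K, "hk"))   # ['hkbu', 'hku', 'hkust']
```

In this sketch, reaching the node for the prefix takes one step per prefix character; listing the matching keywords then requires visiting the subtree under that node.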
Due to their wide scope of application, tries can be applied to many large-size (database) problems and lean applications. However, one problem with tries is that they have a very high cost of storage, i.e., they take up a lot of memory.
One specific example of the use of tries is in search engines, to look up the postings of query terms. If both the postings and the tries are searched by disk access, then the number of file seeks increases significantly. It would be advantageous to load the tries into main memory and to load only the postings of the query terms from disk or from disk caches. However, the large size of tries increases the likelihood of page faults.
With the advent of wireless communications, many mobile applications may find tries useful, for instance word completion algorithms that assist users in entering text messages and formulating queries. Tries can also be used for approximate string matching to support on-line handwritten character recognition for Personal Digital Assistants (PDAs), and for string searching in pocket-size electronic dictionaries and spelling checkers. Again, the problem with the use of tries in these situations is their high cost of storage.
The storage cost of a trie is typically between 4 and 5 times the original storage cost of the keywords it contains. Although the price of RAM is falling, tries are still not space-efficient enough to be deployed widely for large-size problems and for lean applications, in particular those running on mobile devices, even though there are many mobile applications for them.