The present invention relates to methods of incorporating non-traditional objects in database systems and in particular to a method for employing a Trie data structure to implement a database of handwritten symbols.
The maintenance and indexing of databases having traditional objects such as letters, numbers and words is well known. U.S. Pat. No. 5,202,986, entitled PREFIX SEARCH TREE PARTIAL KEY BRANCHING, describes a specialized tree structure for database searching. The technique disclosed in this patent is based on prefixes of letters or numbers. This patent also describes a Trie-based system which compares individual characters of an input word to match the input word to a word in the database. Even for the traditional objects described in this patent, the Trie-based system is not preferred because it requires all possible characters for the search key to be partitioned into individual disjoint classes, where each class has a first-level branch. In addition, the Trie data structure is described as containing a number of levels corresponding to the number of characters in the longest expected search key.
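By way of illustration only, the conventional Trie organization discussed above may be sketched as follows. The sketch is in Python and is not taken from the cited patent; the names `TrieNode`, `Trie`, `insert`, `contains` and `depth` are hypothetical. It assumes that each character forms its own class, so that every distinct character has its own branch, and it shows that the number of levels equals the length of the longest stored key.

```python
class TrieNode:
    """One node of the Trie; each child branch corresponds to one character."""

    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if a stored key ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Store a key, creating one level per character as needed."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word):
        """Match the input word one character at a time, one level per character."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def depth(self):
        """Number of levels below the root (length of the longest stored key)."""
        def _depth(node):
            if not node.children:
                return 0
            return 1 + max(_depth(child) for child in node.children.values())
        return _depth(self.root)
```

In this sketch a lookup visits at most one node per character of the input word, regardless of how many words are stored.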
While it is relatively easy to index databases of traditional objects, it is more difficult to index databases of non-traditional objects such as handwritten text or symbols. These difficulties arise mainly from problems in matching similar handwritten words. It is difficult, for example, for one person to write a word the same way twice. It is even more difficult for one person to write a word in the same way that another person has written it. This inconsistency in the representation of non-traditional objects makes it difficult to match and retrieve handwritten information.
U.S. Pat. No. 5,151,950, entitled METHOD FOR RECOGNIZING HANDWRITTEN CHARACTERS USING SHAPE AND CONTEXT ANALYSIS, describes a system in which a Trie data structure is used to hold a dictionary of words that may be recognized by a handwritten character recognition system. This system includes two parts, a shape recognizer and a set of deterministic finite automata (DFA). In this application the Trie data structure is used as the DFA. At each level of the Trie data structure, the shape recognizer is passed an input string and returns a number of possible matching characters. The Trie data structure is then used to determine if any of the recognized characters is proper at this level for a sequence of characters (i.e., a word) that is stored in the database. This method, however, requires extensive operations by the shape recognizer. This component of the system must apply each letter model in the alphabet to the input string at each level of the Trie. The technique described in this patent only works for manuscript (hand-printed) text, i.e., non-cursive text. Manuscript text is more easily segmented than cursive text.
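The interaction between the shape recognizer and the Trie described above may be sketched, purely as an illustration, in Python. The sketch makes several assumptions that are not drawn from the cited patent: the input is taken to be already segmented into character positions, the output of the shape recognizer is simulated as a precomputed set of candidate characters for each position, and the names `build_trie`, `recognize` and `END` are hypothetical. The Trie acts as the automaton that discards candidate characters which do not continue any stored word.

```python
END = object()  # sentinel key marking the end of a stored word


def build_trie(words):
    """Build a nested-dictionary Trie holding the dictionary of words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def recognize(candidates_per_position, trie):
    """Return the stored words consistent with the candidates at every level.

    candidates_per_position: one set of candidate characters per level,
    standing in for what a shape recognizer would return (an assumption).
    """
    frontier = [("", trie)]  # (prefix accepted so far, Trie node reached)
    for candidates in candidates_per_position:
        next_frontier = []
        for prefix, node in frontier:
            for ch in candidates:
                child = node.get(ch)
                if child is not None:  # character is proper at this level
                    next_frontier.append((prefix + ch, child))
        frontier = next_frontier
    # Keep only prefixes that are complete words in the dictionary.
    return sorted(prefix for prefix, node in frontier if END in node)
```

For example, with the dictionary "cat", "cut", "oat" and "car", and candidate sets {c, o}, {a, u} and {t} for the three positions, the sketch retains "cat", "cut" and "oat" and rejects "car". The cost noted above lies outside this sketch, in producing the candidate sets, since the recognizer must evaluate every letter model at every level.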
Ideally, the database of words which can be recognized should only hold one representation of a word and the system which uses the database should be able to recognize similar words without the need to store each different version. Hidden Markov models (HMMs) have been proposed as an alternative representation for handwritten words. In the HMM approach, each handwritten word in a database is represented by a statistical model, the HMM. Each HMM is trained so that it accepts the specific word with a high probability relative to other words in the database. In systems which use HMMs to recognize handwritten words, a separate HMM is stored for each word in the database. In order to recognize a given input word, each HMM in the database is executed and the one which accepts the input word with the highest probability is selected as the matching HMM. Because each HMM in the underlying handwritten database has to be tested against the input word, the search time grows linearly with the number of words in the database, and the speed of execution becomes a formidable obstacle. In an article by Lopresti et al. entitled "Pictographic Naming," Interchi '93 Adjunct Proceedings, pages 77-78, 1993, a search of this type through a database of 60 words is described as taking approximately 20 seconds to execute on a NeXT Station running at 40 MHz.
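The linear search described above may be illustrated by the following Python sketch. It is offered only as an example under stated assumptions and is not the implementation of any cited system: each word model is assumed to be a discrete HMM given as a dictionary with hypothetical keys `start`, `trans` and `emit`, the input is assumed to be a non-empty sequence of integer observation symbols, and the acceptance probability is computed with the standard forward algorithm. The function names `forward_probability` and `best_match` are likewise hypothetical.

```python
def forward_probability(hmm, observations):
    """P(observations | hmm) for a discrete HMM, by the forward algorithm.

    hmm["start"][s]    : probability of starting in state s
    hmm["trans"][p][s] : probability of moving from state p to state s
    hmm["emit"][s][o]  : probability of emitting symbol o in state s
    Assumes observations is non-empty.
    """
    start, trans, emit = hmm["start"], hmm["trans"], hmm["emit"]
    n = len(start)
    alpha = [start[s] * emit[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs]
            for s in range(n)
        ]
    return sum(alpha)


def best_match(database, observations):
    """Evaluate every stored HMM against the input and keep the most probable.

    Returns the best-matching word and the number of HMMs evaluated,
    which always equals the number of words in the database.
    """
    best_word, best_prob = None, -1.0
    evaluations = 0
    for word, hmm in database.items():
        prob = forward_probability(hmm, observations)
        evaluations += 1
        if prob > best_prob:
            best_word, best_prob = word, prob
    return best_word, evaluations
```

Because `best_match` executes one full HMM evaluation per stored word, doubling the size of the database doubles the work required for each query.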