1. Field of the Invention
The present invention relates to recognizing a line of cursive text. More particularly, the invention concerns a method, apparatus, and article of manufacture employing lexicon reduction using key characters for recognizing a line of cursive text.
2. Description of the Related Art
Recognition of cursive text is a challenging problem encountered in many applications, such as postal mail sorting, bank check recognition, and automatic data entry from business forms. Typically the cursive text is handwritten words and numbers. Herein the words "cursive text" refer to cursive words, letters, numbers, and/or other symbols, which are handwritten and/or machine printed. In practice, cursive text is usually handwritten, and is only infrequently machine printed. In order to recognize cursive text, commonly an electronic representation of an image, typically a bit map, is produced. An image containing a cursive word or a line of cursive text may be referred to as a word image or as a line image. After being processed by a text recognition system, the image may be identified as a text line.
A prevalent technique for cursive word recognition is based on over-segmentation followed by dynamic programming. With this approach, a set of split points 105 on word strokes is chosen based on heuristics to divide a word (illustrated in FIG. 1A) into a sequence of graphemes 110 (primitive structures of characters), as shown in FIG. 1B. With this technique, characters, which can be letters, numbers, and/or any other symbols, may consist of one, two, or three graphemes. As can be seen in FIG. 2, graphemes 110 are typically identified by boxes drawn around portions of the cursive text.
FIG. 1B includes a segmentation graph 115, which is used for identifying the characters. Each internal node in the graph represents a split point in the word. The left-most node and right-most node indicate the word boundary. The word recognition problem can be viewed as a problem of finding the best path in a segmentation graph To accomplish this, typically a character classifier is used to assign a cost to each edge 120 in the segmentation graph. Each edge represents the segment between the two split points connected by the edge. After a cost is assigned to each edge, the dynamic programming technique is used for finding the best, and hopefully the desired, path from the left-most node to the right-most node. A sequence of characters, shown in FIG. 2, can then be obtained from the sequence of segments on the desired path. This technique outperforms techniques such as segmentation-free Hidden Markov Models using a sliding window. Over-segmentation based on Hidden Markov Models can also be implemented.
Many paths in a segmentation graph can contain segments that are good looking characters. However, the sequence of characters resulting from a given path may not form a valid word (or string) in a dictionary. Very often, the desired path does not appear in the top three paths selected by the dynamic programming algorithm. Consequently, in situations where a lexicon of limited size can be derived, a lexicon-driven matching is more desirable. A lexicon of limited size can be derived in applications such as postal address recognition, where a list of city-state names can be retrieved from a database once the zip code (zip) candidates are known. For each entry in the limited size lexicon, the dynamic programming technique is used to choose which path in the segmentation graph best matches with the entry, and a matching score is then assigned to the entry. The entry with the highest matching score is chosen as the recognition hypothesis.
Given a sequence of N graphemes and a lexicon entry string of length W, the dynamic programming technique can be used for obtaining the best grouping of the N graphemes into W segments. Typically, a dynamic table of size (N*(W+1)) is constructed to obtain the best path. With a technique that takes advantage of the oversegmentation assumption (no undersegmentation), the dynamic table size is reduced to (N-W+1)*(W+1). This technique is described by J. Mao et al. in "A System for Cursive Handwritten Address Recognition," International. Conference, on Pattern Recognition, Brisbane,. Australia, August 1998. The reduction is significant when W is large, for example, for city-state-zip matching.
Given a lexicon of L entries, the complexity of the lexicon driven matching is O(L*(N-W+1)*(W+1)). Thus, the speed of a lexicon driven system decreases linearly with the lexicon size. Recognition accuracy also decreases when the lexicon size becomes larger. Consequently, lexicon reduction is very important in a lexicon driven recognition system.
There are several known techniques for lexicon reduction. One commonly used technique is based on holistic word features. Holistic word features are, for example, word length, and the presence of ascenders, descenders, t-crossings, and i-dots. In this approach, holistic word features of the input image are matched against the holistic features of every exemplar of each of the lexicon entries. Lexicon entries which do not match well with the holistic features of the input image are discarded. Due to variations in writing styles, typically more than one exemplar must be stored or synthesized for each lexicon entry. Although lexicon reduction based on holistic word features improves the overall speed of a system by reducing the lexicon size for final recognition, its efficiency is limited by the computational overhead required for extracting holistic word features and feature matching with more than one exemplar for each lexicon entry.
An alternative approach to lexicon reduction is proposed by M. Shridar et al., in "Lexicon Driven Segmentation--Recognition Procedure for Unconstrained Handwritten Words," Proceedings of International Workshop on Frontier in Handwriting Recognition, pp. 122-131, Buffalo, New York, 1993. With this approach, the input image is first segmented into segments based on a set of heuristics, and an ASCII string is created based on the recognition results on these segments. A dynamic program is then used to match the ASCH string with each lexicon entry. Lexicon entries with high matching costs are eliminated from further consideration. A shortcoming of this approach is that performance is heavily dependent on the initial segmentation of words into characters, which is problematic for cursive words. Other shortcomings of this approach are that it requires segmenting the words twice, and that it uses unreliable segments when matching with lexicon entries.
Consequently, there is a need for a method for accomplishing lexicon reduction that decreases the time required to recognize a line of cursive text without reducing accuracy.