The present invention relates to techniques for recognizing characters and phonemes. More specifically, the invention relates to techniques that use a set of probable character or phoneme identities to recognize an unknown input character or phoneme.
Kundu, A., and Bahl, P., "Recognition of Handwritten Script: A Hidden Markov Model Based Approach," International Conference on Acoustics, Speech, and Signal Processing, New York, April 1988, pp. 928-931, describe a letter-based word recognizer whose output is either the correctly recognized word or a small set of words which includes the correct word as one of its hypotheses. Page 929 describes a set of features for which, with sample letters, optimum symbols are generated using a vector quantizer algorithm and the unweighted Euclidean distance as the distance measure. The feature vectors of each letter are then classified as one of the symbols according to a minimum distance criterion, the nearest neighbor rule, and a probability is then determined in relation to the symbol. These symbols are then used for recognition in conjunction with a hidden Markov model.
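The minimum distance classification step described by Kundu and Bahl can be sketched as follows. This is only an illustrative sketch: the codebook values, the two-dimensional feature vectors, and the function name `nearest_symbol` are hypothetical and are not taken from the cited paper; only the unweighted Euclidean distance and the nearest neighbor rule come from the description above.

```python
import math

def nearest_symbol(feature_vector, codebook):
    """Return the index of the codebook symbol closest to feature_vector.

    Uses the unweighted Euclidean distance and the nearest neighbor rule.
    """
    best_index, best_distance = None, math.inf
    for index, codeword in enumerate(codebook):
        distance = math.dist(feature_vector, codeword)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index

# Hypothetical codebook of three symbols in a two-dimensional feature space.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]

# Hypothetical feature vectors, each mapped to its nearest symbol.
vectors = [(0.2, 0.1), (3.5, 0.4), (0.9, 1.2)]
symbols = [nearest_symbol(v, codebook) for v in vectors]
```

The resulting symbol sequence is the kind of discrete observation sequence that a hidden Markov model could then consume.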
Goshtasby, A., and Ehrich, R. W., "Contextual Word Recognition Using Probabilistic Relaxation Labeling," Pattern Recognition, Vol. 21, No. 5, 1988, pp. 455-462, describe a contextual word recognition technique that uses probabilistic relaxation labeling. As shown and described in relation to FIG. 1, a contextual word recognition system includes a character recognizer module that assigns to each input character 26 numbers showing the confidences that the character in the input has labels from a to z. The confidences are then transformed to probabilities. The output of the character recognizer is actually a sequence of sets called substitution sets, each of which contains the alternatives for a particular character with nonzero probability. All possible words would be obtained by selecting one character from each of the substitution sets, but only one of the words that can be formed from the substitution sets is the correct word. A postprocessor identifies the correct word from the sequence of substitution sets using contextual information from the language. Section 2, beginning on page 456, reviews the major postprocessing techniques. Sections 3 and 4 introduce and describe results produced by a proposed postprocessor that uses transition probabilities of characters to refine the label probabilities in a word iteratively until the probabilities converge and determine a unique word. FIG. 5 shows how the postprocessor works on an input word's similarity measures, setting low similarity measures to zero and transforming the remaining similarities to probability values on which a relaxation process is applied iteratively until the most consistent labeling is obtained.
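The way candidate words arise from a sequence of substitution sets can be sketched as follows. The example probabilities and the function name `candidate_words` are hypothetical. Scoring a word by the product of its character probabilities assumes the characters are independent, which is a simplification for illustration and is not the relaxation labeling process of the cited paper.

```python
from itertools import product

def candidate_words(substitution_sets):
    """Enumerate every word formed by picking one character per set.

    Each substitution set maps a character to its nonzero probability.
    Words are scored by the product of their character probabilities
    (an independence assumption made only for this sketch).
    """
    results = []
    for letters in product(*(s.items() for s in substitution_sets)):
        word = "".join(ch for ch, _ in letters)
        score = 1.0
        for _, probability in letters:
            score *= probability
        results.append((word, score))
    return sorted(results, key=lambda item: -item[1])

# Hypothetical substitution sets for a three-character input word.
substitution_sets = [
    {"c": 0.6, "e": 0.4},
    {"a": 0.7, "o": 0.3},
    {"t": 1.0},
]
candidates = candidate_words(substitution_sets)
```

Two sets of two alternatives and one set with a single alternative give four candidate words, which shows why a postprocessor is needed to select the single correct word.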
Bokser, U.S. Pat. No. 4,773,099, describes pattern classification techniques that classify unknown input characters. During a preprocessing phase, reference data are analyzed to form "ringed clusters" for each class of input data. If the input data are characters, a set of ringed clusters is associated with each character class. These ringed clusters are formed so as to be used later during the classification of an unknown input character. As shown and described in relation to FIGS. 12-13 and the subsequent figures, the classification module produces a possibility set, which is a list of the characters that the unknown character might be, with associated confidences. As described beginning at col. 23, line 32, a possibility set that includes no character candidates can be sent for postprocessing to a spelling corrector module that uses contextual information to replace it with a single character candidate. A possibility set that includes more than one character candidate can be sent on to other modules, such as a subline checker and context module, so that only one character candidate remains in the possibility set after this postprocessing is complete. The confidences can be used to flag characters that were not recognized with certainty so that they can be examined by a word processing operator. The confidence values can also be used by the postprocessing modules to assist in choosing one of the character candidates.
Bollinger et al., U.S. Pat. No. 3,969,698, describe a cluster storage apparatus for post processing error correction in character and phoneme recognition. As shown and described in relation to FIG. 6, a cluster storage apparatus outputs groups of valid alpha words as potential candidates for the correct form of a misrecognized word. A transfer function is measured to determine the propensity for misread, and is expressed as a series of equations representing each character's probability of being confused into a false output character. As shown and described in relation to FIG. 7, the cluster storage apparatus provides a group of correct words that have some probability of having been confused with an invalid word to a regional context apparatus. The regional context apparatus executes a conditional probability analysis to determine which of the correct words most closely corresponds to the invalid word.
Kahan, S., Pavlidis, T., and Baird, H. S., "On the Recognition of Printed Characters of Any Font and Size," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-9, No. 2, March 1987, pp. 274-288, describe omnifont, variable size character recognition. Section III describes a primary classifier that recognizes a character from a structural description with a statistical Bayesian classifier that uses binary features. Feature-defining regions in a parameter space are selected by an automatic clustering algorithm, producing clusters as illustrated in FIG. 5. The clusters are pruned to about 100 distinct stroke-clusters, 30 of which are shown in FIG. 8. The output of the primary classifier is a short list of classifications, in decreasing order of estimated a posteriori probability. Section IV describes how contour analysis is also used if the result of classification falls into one of a few suspect confusion groups. Section VI describes an array of structures, each consisting of a bounding box and the first k choices of the classifier, each choice consisting of a name and a figure of merit proportional to the logarithm of the posterior probability. Layout context and linguistic context are then used to disambiguate, with the linguistic context including spelling, grammar, and punctuation rules. Section VII describes spelling correction of misspelled words.
Nagy, G., "Optical Character Recognition--Theory and Practice," in Krishnaiah, P. R., and Kanal, L. N., eds., Handbook of Statistics, Vol. 2, North-Holland, 1982, pp. 621 and 633-643, presents an overview of character recognition techniques. Pages 634-639 describe several approaches to character classification and discuss the conditional probability functions P(v|a_k) of observing the signal v when the class of the pattern under consideration is a_k. Page 634 mentions the possibility of rejecting a character, i.e. not assigning it to any class, shown in FIG. 5 as a "reject" decision; page 634 also mentions that the optimal decision consists of selecting the class a_k for which the a posteriori probability P(a_k|v) is the largest, and provides Bayes' formula for computation of the a posteriori class probabilities. Pages 639-643 describe recognition techniques that use contextual information.
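The maximum a posteriori decision with a reject option can be sketched as follows. The class labels, priors, likelihood values, and the `reject_threshold` parameter are hypothetical; the cited overview mentions a reject decision but the thresholding rule shown here is an assumption made for illustration. The sketch also assumes that at least one class has a nonzero likelihood.

```python
def posteriors(likelihoods, priors):
    """Apply Bayes' formula: P(a_k|v) = P(v|a_k) P(a_k) / sum_j P(v|a_j) P(a_j)."""
    joint = {k: likelihoods[k] * priors[k] for k in priors}
    evidence = sum(joint.values())  # assumed nonzero in this sketch
    return {k: value / evidence for k, value in joint.items()}

def decide(likelihoods, priors, reject_threshold=0.0):
    """Return the class with the largest posterior, or None to reject."""
    post = posteriors(likelihoods, priors)
    best = max(post, key=post.get)
    # Hypothetical reject rule: refuse to classify when even the best
    # class falls below the threshold.
    return best if post[best] >= reject_threshold else None

# Hypothetical priors P(a_k) and likelihoods P(v|a_k) for one observed signal v.
priors = {"O": 0.5, "0": 0.3, "Q": 0.2}
likelihoods = {"O": 0.4, "0": 0.5, "Q": 0.1}

accepted = decide(likelihoods, priors)
rejected = decide(likelihoods, priors, reject_threshold=0.6)
```

Here the class "O" wins even though "0" has the larger likelihood, because the prior shifts the posterior; raising the threshold above the best posterior turns the same input into a reject.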
Bozinovic, R., and Srihari, S. N., "Knowledge-based Cursive Script Interpretation," Seventh International Conference on Pattern Recognition, Montreal, Canada, July 30-Aug. 2, 1984, Proceedings, Vol. 2, pp. 774-776, describe a knowledge-based approach to word-level off-line cursive script recognition. As shown and described in section I in relation to FIG. 1, the overall process includes presegmentation, lexicon lookup, and letter hypothesizing, and results in an ASCII word. Section III describes presegmentation, letter hypothesization, and lexical representation, with the lexicon being organized in the form of a trie.
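A lexicon organized as a trie can be sketched as follows. The class names, method names, and example words are hypothetical and are not drawn from the cited paper; the sketch only illustrates why a trie suits lexicon lookup during letter hypothesizing, since a partial letter sequence can be abandoned as soon as it is not a prefix of any word.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a letter to the next TrieNode
        self.is_word = False  # True if a lexicon word ends at this node

class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for letter in word:
            node = node.children.setdefault(letter, TrieNode())
        node.is_word = True

    def _find(self, letters):
        """Follow letters from the root; return the final node or None."""
        node = self.root
        for letter in letters:
            node = node.children.get(letter)
            if node is None:
                return None
        return node

    def has_prefix(self, prefix):
        """True if some lexicon word starts with prefix."""
        return self._find(prefix) is not None

    def contains(self, word):
        """True if word is a complete lexicon entry."""
        node = self._find(word)
        return node is not None and node.is_word

# Hypothetical lexicon.
lexicon = Trie(["cat", "car", "cart", "dog"])
```

With this lexicon, a hypothesized letter sequence beginning "cx" can be discarded after two letters, without comparing it against every word.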
Ingham et al., U.S. Pat. No. 3,643,215, describe a pattern recognition device in which a pattern presented for classification is initially searched and descriptors are produced, as shown and described in relation to FIGS. 1 and 3. Descriptors are used to obtain a list of feature names, and, in turn, to obtain a class name prediction. The class name is then used to predict a feature. A confidence level is varied in accordance with the success of the predictions until it exceeds an acceptance threshold, in which case the class name is provided, as shown and described in relation to FIGS. 2 and 3.
Burton, D. K., Shore, J. E., and Buck, J. T., "Isolated-Word Speech Recognition Using Multisection Vector Quantization Codebooks," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-33, No. 4, August 1985, pp. 837-849, describe an approach to isolated-word speech recognition using vector quantization (VQ). Page 837 describes a previous approach in which a VQ codebook is generated for each word in the recognition vocabulary by applying an iterative clustering technique to a training sequence containing several repetitions of the vocabulary word. The clustering process represents each vocabulary word as a set of independent spectra. The new method described in the article incorporates time-sequence information by means of a sequence of VQ codebooks referred to as multisection codebooks, described in more detail at page 839. As described at page 838, new words are classified by performing VQ and finding the multisection codebook that achieves the smallest average distortion.
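Classification by smallest average distortion over multisection codebooks can be sketched as follows. Several details are assumptions made for illustration and are not taken from the cited article: the squared Euclidean distortion measure, the division of the input frames into equal-length sections, the one-dimensional frames, and all names and values.

```python
def squared_distance(a, b):
    """Squared Euclidean distortion (an assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def average_distortion(frames, section_codebooks):
    """Average per-frame distortion of frames against a multisection codebook.

    The frames are split into as many roughly equal sections as there are
    codebooks, and each frame is quantized with its own section's codebook.
    """
    n_sections = len(section_codebooks)
    total = 0.0
    for i, frame in enumerate(frames):
        section = i * n_sections // len(frames)
        total += min(squared_distance(frame, c) for c in section_codebooks[section])
    return total / len(frames)

def classify(frames, vocabulary):
    """Return the vocabulary word whose codebooks give the smallest distortion."""
    return min(vocabulary, key=lambda w: average_distortion(frames, vocabulary[w]))

# Hypothetical two-section codebooks for a two-word vocabulary.
vocabulary = {
    "up": [[(0.0,), (0.1,)], [(1.0,)]],
    "down": [[(1.0,)], [(0.0,)]],
}

rising = [(0.05,), (0.1,), (0.9,), (1.0,)]
falling = list(reversed(rising))
```

The two inputs contain the same frames in opposite order, so a single codebook per word could not tell them apart; the section ordering is what carries the time-sequence information.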
Juang et al., U.S. Pat. No. 4,783,804, describe the use of Markov model speech pattern templates in speech recognition. FIGS. 4-6 show steps in the formation of Markov model templates, and FIG. 5 shows steps in separating frame feature signals into clusters.