In the field of automated character recognition processing, individual input pieces comprising an input stream undergo processing in order to identify characters or character strings contained within the input pieces. Characters can be alphabetic, numeric, symbolic, punctuation marks, etc., and they may be handwritten or machine printed. Typical input pieces include data forms, mail envelopes, bank checks, and other types of documents or items that bear characters for recognition.
Depending on the particular type of input stream, a single character may be the subject of the recognition procedures, or several characters may be combined together into a character string that is to be recognized. The recognition process may occur using various well-known technologies. For example, with optical character recognition technology, a scanner is used to scan the light and dark areas of a character on the input piece and generate a corresponding digital representation of that character. In magnetic character recognition, a magnetic reader or sensor is used to create a digital representation of characters printed with magnetic ink.
In typical practice, character recognition processing generates result strings (strings of recognized characters) which are generally quite close to what is actually on the input piece. However, it is not unusual for character recognition processes to have uncertainty about some characters. A typical cause of error in a character recognition engine result string is poor quality or lack of clarity in the original input piece. Poor printing, sloppy handwriting, smearing, stray marks or lines, or printing atop graphics, form background, or colored or shaded areas can all cause errors in the recognition process. One common problem is that of being unable to determine which of two or more very similar characters is correct.
Manufacturers of character recognition engines have adopted various techniques to improve character recognition results. Existing techniques, however, have significant limitations. For example, one known technique is to generate multiple character possibilities for each potentially ambiguous character being recognized. A probability or confidence indication is then assigned to each result possibility. The character with the highest confidence is then selected for the result output. While this technique can improve results in some circumstances, it is typically not helpful in situations that require distinguishing between very similar characters (such as the uppercase letter “O” and the digit “0,” or the uppercase letter “I,” the digit “1,” and the lowercase letter “l”). Each of these similar characters may have very similar, if not identical, confidence indications. Simply picking the highest-confidence character therefore does not always yield a correct result string.
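By way of illustration only, the highest-confidence selection technique described above can be sketched as follows. The function name, the data layout (a list of candidate characters with confidences for each character position), and the confidence values are hypothetical and are not taken from any particular recognition engine.

```python
from typing import List, Tuple

# One candidate is a (character, confidence) pair; both the layout and
# the numbers below are illustrative assumptions, not engine output.
Candidate = Tuple[str, float]


def pick_highest_confidence(positions: List[List[Candidate]]) -> str:
    """Build a result string by taking the top-confidence candidate
    at each character position."""
    return "".join(max(cands, key=lambda c: c[1])[0] for cands in positions)


# Hypothetical candidates for a four-character field.
positions = [
    [("R", 0.97)],                            # unambiguous
    [("O", 0.50), ("0", 0.49)],               # letter O vs. digit 0
    [("0", 0.51), ("O", 0.49)],               # digit 0 vs. letter O
    [("1", 0.34), ("l", 0.33), ("I", 0.33)],  # 1 vs. l vs. I
]

result = pick_highest_confidence(positions)
print(result)
```

In this sketch, the choice at the last three positions turns on confidence differences of 0.01 or 0.02, so a slight change in the input image could change the selected character. This is the limitation noted above for very similar characters.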
Another known technique is to obtain a result string (such as a word) through recognition processing and then validate the result string against a database of known or acceptable result strings (such as a word dictionary or other type of “look-up” dictionary) to determine whether the result is valid. While this technique provides some measure of objective validation, it can be used only when such a dictionary exists. In applications for which there is no look-up dictionary or other objective reference available, other solutions must be provided to improve the accuracy of result strings.
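The dictionary validation technique described above may be sketched, again purely as an illustration, as a set-membership test. The function name, the use of `None` to represent an application with no look-up dictionary, and the sample dictionary contents are assumptions made for this example.

```python
from typing import Optional, Set


def validate_against_dictionary(
    result: str, dictionary: Optional[Set[str]]
) -> Optional[bool]:
    """Return True or False when a look-up dictionary exists, and None
    when the application has no dictionary to validate against."""
    if dictionary is None:
        return None  # no objective reference is available
    return result in dictionary


# Hypothetical look-up dictionary of acceptable result strings.
known_words = {"ROAD", "STREET", "AVENUE"}

print(validate_against_dictionary("ROAD", known_words))   # valid entry
print(validate_against_dictionary("R0AD", known_words))   # digit 0 misread
print(validate_against_dictionary("X7Q0l", None))         # nothing to check against
```

The third call reflects the case identified above: for fields such as arbitrary identifiers or codes, no dictionary exists, and this form of validation provides no information.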
What is needed is a system and procedure for character recognition that generates result strings with increased accuracy in applications for which the prior techniques are either unavailable or unhelpful. The present invention fulfils this need.