Many forms, such as census forms, require an individual to answer questions by hand printing a response on the form. Most responses include words that are chosen from a list that is either implicitly or explicitly defined for the person filling out the form. For example, an implicitly defined list includes the list of Indian tribes that appears on a United States Government census form. An explicitly defined list includes, for example, lists of various diseases that are stated on an insurance application form.
Traditionally, the response is entered into a computer manually using a keyboard. Optical character recognition (OCR) systems automatically convert these hand-printed responses into a computer-readable format. Identifying hand-printed words read by the OCR systems may be difficult because the words may contain a number of spelling errors. Spelling errors include errors made by the persons filling out the forms (e.g., insertion, deletion, substitution, and transposition of letters), as well as the character recognition errors of the OCR system (e.g., letter substitution and segmentation errors). At present, most errors in using state-of-the-art OCR techniques are attributable to OCR recognition errors and not to human errors in hand-printed responses.
For each application of the OCR techniques, there is a maximum tolerable error rate corresponding to the number of words that are either unidentifiable or incorrectly identified. If this maximum tolerable error rate is exceeded, then OCR cannot replace manual keyboard entry. Currently, state-of-the-art OCR techniques have an error rate that is too high for most applications. Thus, there is a need to develop better OCR methods or to develop word identification methods that are more tolerant of the various types of errors encountered in the OCR of hand-written forms.
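By way of illustration only, error-tolerant identification of a word against a defined list can be sketched with a generic edit-distance comparison that counts the error types named above (insertion, deletion, substitution, and transposition of adjacent letters). The following sketch is not the method of the referenced patent or of any particular OCR system; the function names `osa_distance` and `identify`, the threshold `max_distance`, and the sample list are all assumptions introduced for the example.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: the minimum number of single-letter
    insertions, deletions, substitutions, and adjacent transpositions needed
    to turn a into b (each letter edited at most once)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i letters of a
    for j in range(cols):
        d[0][j] = j  # insert all j letters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # transposition of two adjacent letters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def identify(word, lexicon, max_distance=2):
    """Return the single closest lexicon entry, or None when the word is
    unidentifiable (nothing within max_distance, or a tie for closest).
    max_distance is a hypothetical tuning parameter."""
    scored = sorted((osa_distance(word.upper(), entry.upper()), entry)
                    for entry in lexicon)
    if not scored or scored[0][0] > max_distance:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two entries are equally close
    return scored[0][1]


# Hypothetical list of allowed responses, for illustration only.
LEXICON = ["CHEROKEE", "CHOCTAW", "NAVAJO"]
```

Under these assumptions, a response read as `CHER0KEE` (a letter substitution typical of character recognition) or `CHEORKEE` (a transposition typical of a writer) would both resolve to `CHEROKEE`, while a response that is not close to any entry is reported as unidentifiable rather than guessed. Rejecting ties trades a higher count of unidentifiable words for fewer incorrect identifications.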
Several methods for identifying words in OCR of hand-written forms are disclosed in U.S. patent application Ser. No. 07/911,698, now U.S. Pat. No. 5,329,598, filed Jul. 10, for "METHOD AND APPARATUS FOR ANALYZING CHARACTER STRINGS", incorporated herein by reference. This patent application discloses a general purpose parallel computer for implementing methods for analyzing character strings. However, there is still a need for a simple, low cost architecture that is optimized for a particular character string analysis method.