There are many applications for data input from a hard copy to a computer system that use automated Optical Character Recognition (OCR), followed by manual verification of the OCR results. Often, the computer that performs the OCR also generates a confidence rating for its reading of each character or group of characters. Human operators perform the verification step either by reviewing all the fields in the original document and correcting errors and rejects discovered in the OCR results, or by viewing and correcting only the characters or fields that have a low OCR confidence level.
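The second verification mode described above, in which the operator sees only low-confidence readings, can be illustrated with a minimal sketch. The `OcrChar` record, the `needs_review` helper, the 0-to-1 confidence scale, and the 0.9 threshold are all assumptions made for illustration; they are not taken from any particular OCR system.

```python
from dataclasses import dataclass


@dataclass
class OcrChar:
    code: str          # character code assigned by the OCR engine
    confidence: float  # hypothetical confidence score in [0.0, 1.0]


def needs_review(chars, threshold=0.9):
    """Return the characters whose confidence falls below the threshold."""
    return [c for c in chars if c.confidence < threshold]


# Example field: only the middle character is routed to the operator.
field = [OcrChar("4", 0.99), OcrChar("a", 0.62), OcrChar("7", 0.95)]
low_confidence = needs_review(field)
```

In the first verification mode, by contrast, every character in `field` would be shown to the operator regardless of its confidence.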
There are methods known in the art for improving the reliability of the verification step. For example, U.S. Pat. No. 5,455,875, to Chevion et al., whose disclosure is incorporated herein by reference, describes a method for organizing data on a computer screen so as to improve productivity of human operators in verifying OCR results. The method is implemented in document processing systems produced by IBM Corporation (Armonk, N.Y.), in which the method is referred to as “SmartKey.”
SmartKey works by presenting to the human operator a “carpet” of character images on the screen of a verification terminal. The character images are taken by segmenting the original document images that were processed by OCR. Segmented characters from multiple documents are sorted according to the codes assigned to them by the OCR. The character images are then grouped and presented in the carpet according to their assigned code. Thus, for example, the operator might be presented with a carpet of characters that the OCR has identified as representing the letter “a.” Under these conditions, it is relatively easy for the operator to visually identify OCR errors, such as a handwritten “o” that was erroneously identified as an “a.” The operator marks erroneous characters by clicking on them with a mouse.
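The grouping step described above can be sketched roughly as follows. This is not the SmartKey implementation; the `build_carpets` function and the `(doc_id, position, ocr_code)` record layout are hypothetical, chosen only to show how segmented characters from multiple documents might be collected by their OCR-assigned code.

```python
from collections import defaultdict


def build_carpets(segments):
    """Group (doc_id, position, ocr_code) records by their OCR-assigned code."""
    carpets = defaultdict(list)
    for doc_id, position, ocr_code in segments:
        # Each carpet holds references to the character images sharing one code.
        carpets[ocr_code].append((doc_id, position))
    return dict(carpets)


# Segmented characters from two documents (illustrative data only).
segments = [
    ("doc1", 0, "a"),
    ("doc1", 1, "b"),
    ("doc2", 0, "a"),
    ("doc2", 3, "a"),
]
carpets = build_carpets(segments)
```

Here `carpets["a"]` refers to every character image that the OCR read as "a", across both documents, which is the set the operator would scan for misreadings.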
Displaying composite “carpet” images to the operator, each made up entirely of characters that the OCR logic has recognized as being of the same type, enables errors to be rapidly recognized and marked on an exception basis. Once recognized, these errors can either be corrected immediately or be sent to another operator for correction, along with characters rejected by the OCR logic. The remaining, unmarked characters in the carpet are considered to have been verified.
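The exception-based handling of a carpet can be sketched as a simple partition. The `resolve_carpet` function and the `(doc_id, position)` identifiers below are assumptions made for illustration only: items the operator marked become exceptions to be corrected, and everything left unmarked is treated as verified.

```python
def resolve_carpet(carpet, marked):
    """Split a carpet into verified items and operator-marked exceptions."""
    marked = set(marked)
    verified = [item for item in carpet if item not in marked]
    exceptions = [item for item in carpet if item in marked]
    return verified, exceptions


# A carpet of characters read as "a"; the operator clicked on one of them.
carpet_a = [("doc1", 0), ("doc2", 0), ("doc2", 3)]
verified, exceptions = resolve_carpet(carpet_a, marked=[("doc2", 0)])
```

The `exceptions` list would then be corrected immediately or queued for another operator, together with any characters the OCR logic rejected outright.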
Even in productivity-enhancing verification systems, such as SmartKey, there are still cases in which the operator may be uncertain about whether to verify a given OCR reading. This may be the case particularly in verifying handwritten characters. The operator is supposed to pass only unambiguous characters in the verification stage, while marking all erroneous or even ambiguous characters as incorrect (or at least uncertain). However, when the character is ambiguous, the operator may attempt to guess whether a certain reading is correct, thus reducing the reliability of even the verified results. There is therefore a need to improve the quality and reliability of the verification step of such data input methods.