Many applications for data input from hard copy to a computer system use automated Optical Character Recognition (OCR), followed by manual verification of the OCR results. Often, the computer that performs the OCR also generates a confidence rating for its reading of each character or group of characters. For example, PrimeOCR™ software (produced by Prime Recognition Inc., Woodinville, Wash.) gives each character that it recognizes a confidence rating between 1 and 9. According to the manufacturer, results with confidence levels above 6 can usually be considered accurate. Human operators perform the verification step, either by reviewing all the fields in the original document and correcting any errors and rejects found in the OCR results, or by viewing and correcting only those characters or fields that have a low OCR confidence level.
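By way of illustration only, the second verification mode described above (reviewing only low-confidence characters) might be sketched as follows. The `OcrChar` record, the `needs_review` function and the default threshold are hypothetical names chosen for this sketch; they are not part of PrimeOCR or of any actual OCR interface. The sketch assumes the 1 to 9 confidence scale mentioned above and treats any rating not above 6 as requiring review.

```python
from dataclasses import dataclass


@dataclass
class OcrChar:
    """Hypothetical record for one recognized character."""
    code: str        # character code assigned by the OCR engine
    confidence: int  # assumed scale: 1 (lowest) to 9 (highest)


def needs_review(chars, threshold=6):
    """Return the indices of characters whose confidence is not above threshold."""
    return [i for i, c in enumerate(chars) if c.confidence <= threshold]


# Example field of four recognized characters.
field = [OcrChar("4", 9), OcrChar("0", 5), OcrChar("7", 8), OcrChar("1", 6)]
print(needs_review(field))  # [1, 3]
```

Only the characters at the returned positions would be shown to the operator; the others would be accepted without manual review.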
There are methods known in the art for improving the reliability of the verification step. For example, U.S. Pat. No. 5,455,875, to Chevion et al., whose disclosure is incorporated herein by reference, describes a method for organizing data on a computer screen so as to improve productivity of human operators in verifying OCR results. The method is implemented in document processing systems produced by IBM Corporation (Armonk, N.Y.), in which the method is referred to as “SmartKey.”
SmartKey works by presenting to the human operator a “carpet” of character images on the screen of a verification terminal. The character images are taken by segmenting the original document images that were processed by OCR. Segmented characters from multiple documents are sorted according to the codes assigned to them by the OCR. The character images are then grouped and presented in the carpet according to their assigned code. Thus, for example, the operator might be presented with a carpet of characters that the OCR has identified as representing the letter “a.” Under these conditions, it is relatively easy for the operator to visually identify OCR errors, such as a handwritten “o” that was erroneously identified as an “a.” The operator marks erroneous characters by clicking on them with a mouse, and then typically presses a “done” or “enter” button.
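The grouping step described above may be illustrated by a minimal sketch, given by way of example only and not as the SmartKey implementation. The `Segment` record and the `build_carpets` function are hypothetical; a real segment would also carry the cropped character image, which is omitted here.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical record for one segmented character image."""
    doc_id: str    # document the character was segmented from
    position: int  # position of the character within that document
    code: str      # character code assigned by the OCR


def build_carpets(segments):
    """Group segments from any number of documents by their OCR-assigned code."""
    carpets = defaultdict(list)
    for seg in segments:
        carpets[seg.code].append(seg)
    return dict(carpets)


segments = [
    Segment("doc1", 0, "a"),
    Segment("doc1", 1, "o"),
    Segment("doc2", 0, "a"),
]
carpets = build_carpets(segments)
# carpets["a"] now holds the two segments, from different documents,
# that the OCR identified as the letter "a".
```

Each value in the resulting mapping corresponds to one carpet that would be displayed to the operator.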
Displaying composite “carpet” images to the operator, made up entirely of characters that the OCR logic has recognized as being of the same type, enables errors to be rapidly recognized and marked on an exception basis. Once recognized, these errors can either be corrected immediately or sent to another operator for correction, along with characters rejected by the OCR logic. The remaining, unmarked characters in the carpet are considered to have been verified.
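The exception-based handling described above might be sketched as follows, again by way of example only. The function `resolve_carpet` and the string identifiers are hypothetical. The sketch assumes that each character image in a carpet has a unique identifier and that the operator's mouse clicks are available as a collection of marked identifiers.

```python
def resolve_carpet(carpet_ids, marked_ids, rejected_ids=()):
    """Split one carpet into verified characters and characters needing correction.

    carpet_ids   -- identifiers of the character images shown in the carpet
    marked_ids   -- identifiers the operator marked as erroneous
    rejected_ids -- identifiers of characters rejected by the OCR logic
    """
    marked = set(marked_ids)
    # Unmarked characters are considered verified.
    verified = [cid for cid in carpet_ids if cid not in marked]
    # Marked characters go to correction, along with the OCR rejects.
    to_correct = [cid for cid in carpet_ids if cid in marked] + list(rejected_ids)
    return verified, to_correct


verified, to_correct = resolve_carpet(["c1", "c2", "c3"], ["c2"], ["r1"])
print(verified)    # ['c1', 'c3']
print(to_correct)  # ['c2', 'r1']
```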
U.S. Pat. No. 5,852,685, to Shepard, whose disclosure is incorporated herein by reference, similarly describes a method for enhanced batched character image processing. Character images are presented to an operator for verification in groups, sorted so that all the images in a group have the same recognized identity. To check the accuracy of the results, a number of character images that have been deliberately misidentified (bogus errors) are included in the groups of character images to be verified. At the end of processing a batch of documents, the results of verification are examined to determine how many bogus errors were caught by the operator.
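The bogus-error check described above may be illustrated by a minimal sketch, which is an assumption-laden example and not the method of the cited patent. The functions `seed_bogus` and `catch_rate` and the identifiers are hypothetical. The sketch assumes that the bogus images are mixed at random into a group and that accuracy is summarized as the fraction of bogus errors the operator marked.

```python
import random


def seed_bogus(carpet_ids, bogus_ids, rng):
    """Mix deliberately misidentified images into a group at random positions."""
    mixed = list(carpet_ids) + list(bogus_ids)
    rng.shuffle(mixed)
    return mixed


def catch_rate(bogus_ids, marked_ids):
    """Return the fraction of bogus errors the operator marked, or None if none were seeded."""
    bogus = set(bogus_ids)
    if not bogus:
        return None
    return len(bogus & set(marked_ids)) / len(bogus)


bogus = ["b1", "b2", "b3", "b4"]
shown = seed_bogus(["c5", "c6", "c7"], bogus, random.Random(0))

# Suppose the operator marked three of the bogus images and one genuine error.
marked = ["b1", "b3", "b4", "c7"]
print(catch_rate(bogus, marked))  # 0.75
```

A low catch rate at the end of a batch would suggest that genuine errors in the same batch may also have been missed.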