With the advance of computerized data processing, it has become common practice to generate information in electronic form by automatically recognizing information from scanned documents. In other words, data acquisition proceeds as follows. First, documents are scanned; then predetermined fields are extracted and passed to recognition modules, which translate each image into a coded form, such as ASCII.
However, although recognition can now be performed on both printed and hand-written data, the recognition process almost always results in a certain number of errors and rejects.
Hence, there remains a need for manual checking and correction of recognized data. This is generally done by displaying the scanned images and the associated optical character recognition (OCR) results to a human operator, who compares the two and makes corrections where necessary.
Conventionally, there are two approaches to this process. Either the operator is asked to review all the fields and key in corrections for the errors and rejects found in the OCR results, or the operator is asked to view and correct only those fields in which at least one error or reject has been discovered.
In both cases, the process is as follows. A problematic field is displayed on the screen, with the recognition results displayed in the vicinity of the image. Problematic characters are generally emphasized in some way; for instance, the cursor may stop under the first problematic character, or reverse video may be used to emphasize the problematic characters. The operator checks and, if necessary, corrects the emphasized characters, and the process is repeated until all the problematic characters are resolved.
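The review loop described above can be sketched as follows. This is a minimal illustration only, not an implementation from any particular system: the `RecognizedField` structure, the `emphasize` and `correct_field` functions, and the `ask_operator` callback are all hypothetical names, and square brackets stand in for reverse video.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RecognizedField:
    """Hypothetical container for one extracted field and its OCR result."""
    image_id: str            # reference to the scanned field image
    chars: List[str]         # OCR output, one entry per character
    problematic: List[bool]  # True where the OCR flagged an error or reject


def emphasize(field: RecognizedField) -> str:
    """Render the OCR result with flagged characters in brackets
    (a stand-in for reverse video on a real display)."""
    return "".join(
        f"[{c}]" if bad else c
        for c, bad in zip(field.chars, field.problematic)
    )


def correct_field(
    field: RecognizedField,
    ask_operator: Callable[[RecognizedField, int], str],
) -> str:
    """Visit each flagged character in turn, replace it with the operator's
    answer, and return the corrected text once no flags remain."""
    for pos, bad in enumerate(field.problematic):
        if not bad:
            continue
        field.chars[pos] = ask_operator(field, pos)
        field.problematic[pos] = False
    return "".join(field.chars)


# Example: the third character was rejected by the OCR.
demo = RecognizedField("img-001", list("12?4"), [False, False, True, False])
print(emphasize(demo))                          # 12[?]4
print(correct_field(demo, lambda f, pos: "3"))  # 1234
```

In a real data-entry station the `ask_operator` callback would display the field image and wait for a keystroke; here it is a plain function so that the loop itself can be exercised without a display.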
A major disadvantage of the prior art is that the fields are viewed by the operator in the context of the original document image. Assume that the optical character recognition rejects 5% of the characters and that each field has an average of 20 characters. In such a case about 35% of the fields will have at least one rejected character.
It follows that, in the prior art, the operator would be required in at least 35% of cases to display the field image, focus on it mentally, identify the problem, and correct it. Thus prior art OCR-assisted data-entry methods improve the productivity of data entry by at most a factor of 3 over a purely manual data-entry process.
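The arithmetic behind these figures can be checked with a short calculation. The sketch below rests on two assumptions the text does not state: that each character is rejected independently with the same probability p, and that only fields containing a reject cost the operator any time. Under the first assumption, the fraction of n-character fields with at least one reject is 1 - (1 - p)^n. For n = 20, a figure of about 35% corresponds to a per-character reject rate near 2%, whereas p = 5% would give roughly 64%. The factor of 3 follows from 1/0.35. The function names are illustrative only.

```python
def fraction_of_fields_with_reject(reject_rate: float, chars_per_field: int) -> float:
    """P(at least one reject in a field) = 1 - (1 - p)^n,
    assuming rejects are independent across characters."""
    return 1.0 - (1.0 - reject_rate) ** chars_per_field


def max_speedup(fraction_needing_operator: float) -> float:
    """Upper bound on the productivity gain over fully manual entry,
    assuming fields without rejects cost the operator no time at all."""
    return 1.0 / fraction_needing_operator


for rate in (0.02, 0.05):
    frac = fraction_of_fields_with_reject(rate, 20)
    print(f"reject rate {rate:.0%}: {frac:.1%} of fields need attention, "
          f"speedup at most {max_speedup(frac):.2f}x")

print(f"35% of fields: speedup at most {max_speedup(0.35):.2f}x")
```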