1. Field of the Invention
The present invention relates to a method and apparatus for determining the veracity of data that previously required manual input. More particularly, the present invention relates to a method that seeks to determine the veracity of data by comparing data read by an OCR (Optical Character Reader) with data keyed into the system by an operator.
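As a rough illustration of the comparison described above, the Python sketch below checks one field of OCR output against the same field as keyed by an operator and reports the character positions where the two disagree. The function names `fields_agree` and `mismatched_positions`, and the choice to compare whole fields position by position, are illustrative assumptions and not the claimed method.

```python
def fields_agree(ocr_text: str, keyed_text: str) -> bool:
    # Hypothetical rule: a field is treated as verified only when the
    # OCR reading and the operator's keyed entry match exactly.
    return ocr_text == keyed_text


def mismatched_positions(ocr_text: str, keyed_text: str) -> list[int]:
    # Character positions where the two sources disagree. When the
    # strings differ in length, every position past the shorter one
    # is counted as a mismatch.
    longer = max(len(ocr_text), len(keyed_text))
    return [
        i
        for i in range(longer)
        if i >= len(ocr_text)
        or i >= len(keyed_text)
        or ocr_text[i] != keyed_text[i]
    ]


if __name__ == "__main__":
    print(fields_agree("SMITH", "SMYTH"))          # False
    print(mismatched_positions("SMITH", "SMYTH"))  # [2]
```

A position-by-position check is the simplest possible comparison; it treats a single dropped character as a mismatch at every later position, which is one reason a real system might prefer an alignment-based comparison.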
2. Related Art
Traditionally, optical character recognition has been used to read and process large amounts of data, such as that collected in a census. For each character image read and processed, the OCR program generates a "classification", which is its guess as to what the character is, and a "confidence", which is the OCR's estimate of how likely it is that the character has been read correctly. It has been normal practice to re-key low-confidence data. Such re-keying is performed by workers at manual keying workstations, where the character image is redisplayed to an operator, who presses the appropriate character key. The OCR's own classification of low-confidence data has typically been discarded, but examination of such raw low-confidence OCR output has shown that the OCR often gets the classification correct, albeit at a low confidence level.
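The classification/confidence pair and the re-keying practice can be sketched as follows. This minimal Python sketch assumes a confidence score between 0.0 and 1.0 and a hypothetical cut-off of 0.90; both the `CharResult` type and the threshold are assumptions, since OCR engines report confidence on differing scales. It flags characters for re-keying and measures how often the low-confidence classification nevertheless agrees with what the operator keyed, which is the observation noted above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CharResult:
    classification: str  # the OCR's guess at the character
    confidence: float    # assumed 0.0-1.0 estimate that the guess is correct


# Hypothetical cut-off; a real system would tune this per engine and field.
CONFIDENCE_THRESHOLD = 0.90


def needs_rekeying(result: CharResult, threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    # Low-confidence characters are routed to a manual keying workstation.
    return result.confidence < threshold


def low_confidence_agreement(
    results: Sequence[CharResult],
    keyed_chars: Sequence[str],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Optional[float]:
    # Fraction of low-confidence OCR guesses that match the operator's key.
    # Returns None when no character fell below the threshold.
    pairs = [
        (r, k) for r, k in zip(results, keyed_chars) if needs_rekeying(r, threshold)
    ]
    if not pairs:
        return None
    matches = sum(1 for r, k in pairs if r.classification == k)
    return matches / len(pairs)


if __name__ == "__main__":
    ocr = [CharResult("A", 0.99), CharResult("8", 0.42), CharResult("O", 0.55)]
    keyed = ["A", "8", "0"]
    print(low_confidence_agreement(ocr, keyed))  # 0.5
```

In the usage example, two characters fall below the threshold; the OCR's "8" matches the keyed "8" while its "O" does not match the keyed "0", giving an agreement rate of 0.5.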
The accepted industry method of measuring the accuracy of OCR software is either to process known test data through the system or to use a set of "trusted keyers", i.e., operators proven to be reliable and accurate, though not 100% accurate, in the entry of data. The latter typically requires that the best keyboard operators be assigned to re-entering data rather than performing "real" work. It is therefore desirable to maximize the speed of processing incoming data without wasting the time of the best keyboard operators.
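Measuring OCR accuracy against known test data amounts to comparing OCR output with reference text. The sketch below scores accuracy as one minus the total Levenshtein edit distance divided by the total reference length, summed over all fields. This scoring rule is an illustrative assumption, not an industry-mandated formula; the reference text could equally come from trusted keyers, with the caveat noted above that their entries are not 100% accurate.

```python
from typing import Iterable, Tuple


def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance: minimum insertions, deletions and substitutions
    # needed to turn a into b, computed one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    # pairs: (ocr_text, reference_text) for each field in the test set.
    total_chars = 0
    total_errors = 0
    for ocr_text, reference in pairs:
        total_chars += len(reference)
        total_errors += edit_distance(ocr_text, reference)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    # Clamp at zero, since heavy insertions can exceed the reference length.
    return max(0.0, 1.0 - total_errors / total_chars)


if __name__ == "__main__":
    test_set = [("SMYTH", "SMITH"), ("10234", "10234")]
    print(character_accuracy(test_set))  # 0.9
```

Using edit distance rather than a position-by-position comparison keeps a single dropped or extra character from being counted as an error at every later position.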
U.S. Pat. No. 5,282,267 describes a basic OCR system having an operator correction system. A dictionary is made available to the operator for looking up correct data, while errors are deliberately introduced to provide incorrect data against which operator efficiency is measured and fed back. This method of quality assurance differs from that of the present invention, which seeks to overcome the time and cost inefficiencies of the previously known art.