This invention relates generally to image recognition and, more particularly, to optical character recognition.
In practical optical character recognition (OCR) systems, each recognition result (e.g., for the interpretation of a field) is assigned a score representing the “goodness” of the result; often these scores are intended to reflect the probability that the recognition result is correct. The scores are used to determine which results are to be reviewed by a human and which are accepted without review.
A typical approach computes a field recognition score as the product of recognition scores for all the individual characters of the field (as if they were independent probabilities). Traditional recognition techniques for numerical entities, such as zip codes, serial numbers, and account numbers, treat each digit of the number as equally important: an error in the leftmost digit of the number is as significant as an error in the rightmost digit.
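The traditional scoring described above can be sketched in a few lines. This is a minimal illustration, assuming each character score is a probability in [0, 1]; the function names and the review `threshold` are hypothetical, not taken from any particular OCR system. Because multiplication is commutative, the field score cannot tell which position holds the doubtful character.

```python
from math import prod


def field_score(char_scores):
    """Traditional field score: the product of per-character scores,
    treated as if they were independent probabilities."""
    return prod(char_scores)


def needs_review(char_scores, threshold):
    """Send a field to human review when its score falls below a
    hypothetical tuning threshold."""
    return field_score(char_scores) < threshold


# Hypothetical scores for a three-digit field.
leftmost_doubtful = [0.5, 0.75, 0.75]   # uncertain leftmost digit
rightmost_doubtful = [0.75, 0.75, 0.5]  # same scores, uncertain rightmost digit
print(field_score(leftmost_doubtful) == field_score(rightmost_doubtful))  # True
```

Whichever digit is in doubt, both fields receive the same score and therefore the same review decision.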
As noted above, the application of traditional recognition techniques implies that errors in, e.g., the dollar digits (or higher-order digits) are treated as roughly equal in importance to errors in the cents digits. As such, a possible error in the hundred-dollars digit of a financial amount is no more likely to receive human review (at additional economic cost) than a similar possible error in the one-cent digit. However, we have observed that in the recognition of economic amounts an error in the leftmost digit is “worth” much more than an error in the rightmost digit. Therefore, in accordance with the inventive concept, we describe a technique for producing field recognition scores as a function of the importance of each character position in the image.
In an embodiment of the invention, an OCR system for scanning financial amounts is adapted to produce field recognition scores that reflect economic value. In particular, a value-based score for a recognition result is determined from both the individual character recognition scores and a “value” that is associated with each character position of the recognition result. As used herein, this “value” is referred to as an “error value.” In general, the value-based score is computed by adding the error value for each character position of the recognition result. Such an approach is potentially of significant economic value to banks, remittance houses, and similar companies for which the costs associated with correction of error are correlated with the value of the error (e.g., customers are less likely to dispute cent errors than dollar errors).
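One way to read the value-based score is as an expected error value. The sketch below makes two assumptions of its own: each character score is the probability that the character is correct, and the error value of a position is that digit's place value in cents. Under those assumptions, each position contributes (1 - score) × error value, and the contributions are summed. The helper names are hypothetical.

```python
def place_values_in_cents(num_digits):
    """Hypothetical error values: the place value, in cents, of each digit
    of an amount written without a decimal point (e.g. "12345" = $123.45)."""
    return [10 ** (num_digits - 1 - i) for i in range(num_digits)]


def value_based_score(char_scores, error_values):
    """Sum, over character positions, of (probability the character is wrong)
    times (error value of that position). Assumes each score is the
    probability that the character is correct."""
    if len(char_scores) != len(error_values):
        raise ValueError("need one error value per character position")
    return sum((1.0 - p) * v for p, v in zip(char_scores, error_values))


values = place_values_in_cents(5)  # [10000, 1000, 100, 10, 1]
# Both fields have the same product score, but very different value at risk.
print(value_based_score([0.5, 0.75, 0.75, 0.75, 0.75], values))  # 5277.75
print(value_based_score([0.75, 0.75, 0.75, 0.75, 0.5], values))  # 2778.0
```

Under this reading, a higher score means more money is at risk. A review policy would therefore send fields whose score exceeds some cost threshold to a human, rather than fields whose probability score falls below one.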