Automatic document processors have been marketed by various manufacturers for use with, for example, cheques and other kinds of forms in the finance industry, tax offices and other areas where large numbers of documents must be handled. Typically, these machines read indicia on a document using magnetic or optical techniques. For a variety of reasons, the reading of the indicia may be in error. For example, the document may be moving rapidly, the printing or handwriting on the document may be imperfect, or the document may be damaged.
Various methods have been proposed to identify and/or correct errors in reading. For example, in systems typified by U.S. Pat. No. 3,764,978, where characters are recognized both magnetically and optically, readings are rejected when the two systems indicate different symbols. This adds hardware and software to the system and can lead to a substantial number of rejected readings.
Other techniques, such as keying or manual verification of the amount of each item, have also been proposed. These require a substantial amount of additional manual labor.
In many cases, contextual information can be used to improve the quality of an OCR result. For example, in the case of words a dictionary lookup operation is sometimes used to ensure that a recognized word is valid.
In the case of numbers, there may be an arithmetic relation between a set of numbers which might be used both to indicate that an error has been made and to correct the error.
For example, EP-A-446633 discloses apparatus for automatic processing of cheques in which, in the event of a discrepancy between a cheque amount and a summary document, such as a deposit slip or cash letter, an expert system uses logic, stored rules, error probabilities and alternate numeric values to try to correct the errors.
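By way of illustration only, the arithmetic relation between item amounts and a summary document can be checked as in the following minimal sketch. The function name `find_discrepancy` and the sample amounts are hypothetical and are not taken from any of the cited documents; the sketch only detects an inconsistency and makes no attempt to correct it.

```python
from decimal import Decimal


def find_discrepancy(item_amounts, declared_total):
    """Return declared_total minus the sum of the item amounts.

    A result of zero means the readings are arithmetically consistent;
    any other value indicates that at least one reading is in error.
    """
    return declared_total - sum(item_amounts, Decimal("0"))


# Hypothetical readings: three cheque amounts and a deposit slip total.
correct_readings = [Decimal("120.50"), Decimal("75.00"), Decimal("310.25")]
slip_total = Decimal("505.75")

# Same items, but the third amount has been misread (a "1" read as a "7").
misread_readings = [Decimal("120.50"), Decimal("75.00"), Decimal("370.25")]

consistent = find_discrepancy(correct_readings, slip_total)
inconsistent = find_discrepancy(misread_readings, slip_total)
```

A non-zero difference shows that an error exists but not, by itself, which reading is wrong, which is why the cited apparatus relies on further rules and alternate values.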
Of course, the straightforward way to deal with inconsistent arithmetic would be to try combinations of alternate values produced by an OCR process until consistency is achieved. However, the computing resources required by this procedure increase exponentially with the number of figures involved, and it can therefore only practically be applied to relatively small bodies of data and with a small number of combinations.
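The exhaustive procedure just described can be sketched as follows, purely as an illustration. The function name `brute_force_correct`, the representation of amounts as integer cents, and the sample candidate lists are assumptions made for this example. With n figures and k alternate values for each, up to k**n combinations may have to be tried, which is the exponential growth referred to above.

```python
from itertools import product


def brute_force_correct(candidates, declared_total):
    """Try every combination of alternate values until the sum matches.

    candidates: one list of alternate values (integer cents) per figure,
                assumed to be ordered with the most likely reading first.
    Returns (matching_combination_or_None, number_of_combinations_tried).
    """
    tried = 0
    for combo in product(*candidates):
        tried += 1
        if sum(combo) == declared_total:
            return combo, tried
    return None, tried


# Hypothetical OCR output: two alternate readings for each of three amounts.
candidates = [
    [12050, 12850],
    [7500, 1500],
    [37025, 31025],
]

result = brute_force_correct(candidates, 50575)

# With no consistent combination, all 2**3 combinations are examined.
no_match = brute_force_correct(candidates, 1)
```

Adding a fourth figure with two alternates doubles the worst-case search to 16 combinations, and each further figure doubles it again.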