Optical Character Recognition (OCR) methods, which convert text present in an image into machine-readable code, are known.
U.S. Pat. No. 5,519,786 describes a method for implementing a weighted voting scheme for reading and accurately recognizing characters in a scanned image. A plurality of OCR processors scan the image and read the same image characters. Each OCR processor outputs a reported character corresponding to each character read. For a particular character read, the characters reported by the OCR processors are grouped into a set of character candidates. For each character candidate, a weight is generated in accordance with a confusion matrix, which stores the probabilities that a particular OCR processor identifies characters accurately. The weights are then compared to determine which character candidate to output.
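The voting scheme described above can be illustrated with a minimal sketch. It is not the implementation of the cited patent: the function name `weighted_vote`, the processor names, the probability values, and the choice to sum per-processor probabilities into a weight are all assumptions made for illustration.

```python
from collections import defaultdict

def weighted_vote(reported, confusion):
    """Pick one character among those reported by several OCR processors.

    reported:  {processor_name: reported_char}
    confusion: {processor_name: {(true_char, reported_char): probability}}
    Assumption: a candidate's weight is the sum, over all processors, of the
    probability that the true character is the candidate given what that
    processor reported.
    """
    # The candidate set contains only characters that some processor reported.
    candidates = sorted(set(reported.values()))
    weights = defaultdict(float)
    for cand in candidates:
        for proc, rep in reported.items():
            weights[cand] += confusion[proc].get((cand, rep), 0.0)
    best = max(candidates, key=lambda c: weights[c])
    return best, dict(weights)

# Hypothetical confusion matrices for three hypothetical processors.
confusion = {
    "ocr_a": {("O", "O"): 0.9, ("0", "O"): 0.1, ("0", "0"): 0.8, ("O", "0"): 0.2},
    "ocr_b": {("O", "O"): 0.6, ("0", "O"): 0.4, ("0", "0"): 0.95, ("O", "0"): 0.05},
    "ocr_c": {("O", "O"): 0.7, ("0", "O"): 0.3, ("0", "0"): 0.7, ("O", "0"): 0.3},
}
reported = {"ocr_a": "O", "ocr_b": "0", "ocr_c": "0"}

best, weights = weighted_vote(reported, confusion)
print(best, weights)
```

With these made-up values the candidate "0" receives the larger weight and is output. Note that the result is always drawn from the reported characters, so a character that no processor reports can never be selected.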
Such a method has several limitations. First, since the OCR processors are run in parallel and the output is selected among the characters they report, a character that would not be recognized by any of the OCR processors taken independently cannot be recognized by the method as a whole.
Second, a preprocessor has to quantify the strengths and weaknesses of the OCR processors in order to generate the cells of the confusion matrix, which contain the probabilities that the character read by an OCR processor is the character reported by that OCR processor. This step can be time-consuming. Moreover, if this step is performed poorly, for example because the training set used for it is not suitable for a given type of character, the probability can be low for a character that is actually well recognized, and the OCR method may provide worse results than the OCR processors taken independently.
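As an illustration of the preprocessing step, the sketch below estimates the cells of a confusion matrix for one OCR processor from a labelled training set. The function name `estimate_confusion`, the sample data, and the choice to condition on the reported character are assumptions for illustration only; the cited patent may compute its probabilities differently.

```python
from collections import Counter

def estimate_confusion(pairs):
    """Estimate P(true_char | reported_char) from labelled samples.

    pairs: iterable of (true_char, reported_char) for a single OCR processor.
    Returns {(true_char, reported_char): probability}.
    """
    joint = Counter(pairs)
    # Count how many times each character was reported, whatever the truth.
    reported_totals = Counter()
    for (_, rep), n in joint.items():
        reported_totals[rep] += n
    return {(true, rep): n / reported_totals[rep]
            for (true, rep), n in joint.items()}

# Hypothetical training data: 20 labelled characters.
pairs = ([("A", "A")] * 9 + [("B", "A")] * 1 +
         [("B", "B")] * 8 + [("A", "B")] * 2)

matrix = estimate_confusion(pairs)
print(matrix)
```

The estimates depend entirely on the training samples: if the samples are not representative of the characters later encountered, a cell can hold a low probability for a character that the processor in fact recognizes well.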