Optical Character Recognition (OCR) is a technology by which scanned or photographed images of typewritten or printed text are transformed into machine-encoded, computer-readable text. In a typical procedure, the computer receives an image of text and matches portions of the image to example character shapes or patterns. However, current OCR technologies still make mistakes in character recognition: they confuse similarly shaped characters and therefore return text with errors throughout. Such error-ridden text is not acceptable for most applications. Hence, after the text is recognized, the errors must be removed from it.
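The matching step described above, and the way similar shapes become confused, can be illustrated with a minimal sketch. It assumes a simple pixel-agreement comparison against stored templates; the 3x5 bitmaps and the names `TEMPLATES`, `score`, and `recognize` are hypothetical and chosen only for illustration. Practical OCR engines use considerably more elaborate methods.

```python
# Hypothetical 3x5 character templates: '#' is ink, '.' is background.
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "l": (".#.", ".#.", ".#.", ".#.", ".#."),
    "1": ("##.", ".#.", ".#.", ".#.", "###"),
}


def score(glyph, template):
    """Count the pixels on which the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the label of the template that agrees with the glyph on the most pixels."""
    return max(TEMPLATES, key=lambda label: score(glyph, TEMPLATES[label]))


# A scanned "1" that lost two ink pixels (top-left and bottom-right).
degraded_one = (".#.", ".#.", ".#.", ".#.", "##.")

print(recognize(TEMPLATES["1"]))  # clean glyph: prints 1
print(recognize(degraded_one))    # degraded glyph: prints l
```

In this sketch the degraded digit agrees with the "l" template on 14 of 15 pixels but with the "1" template on only 13, so the wrong character is returned. This is the kind of substitution error that then has to be removed from the recognized text.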
The common process for removing such errors is for the user to read through the entire recognized text and correct the errors manually. However, this is a time-consuming and laborious task. Correcting the errors is made more difficult because users may be unable to determine which letter in a word is wrong, even when they recognize that the word is not correct.