1. Field of Disclosure
The disclosure generally relates to the field of optical character recognition (OCR), and in particular to displaying text extracted using OCR together with the original images from which the text was extracted.
2. Description of the Related Art
As more and more printed documents are scanned and converted to editable text using Optical Character Recognition (OCR) technology, people increasingly read such documents on computers. When reading a document on a computer screen, users typically prefer the OCR'ed version over the image version. Compared to the document image, the OCR'ed text is smaller in size and thus can be transmitted over a computer network more efficiently. The OCR'ed text is also editable (e.g., supports copy and paste) and searchable, and can be displayed clearly (e.g., using a locally available font) and flexibly (e.g., using a layout adjusted to the computer screen), providing a better reading experience. These advantages are especially beneficial to users who prefer to read on mobile devices such as mobile phones and music players.
However, errors often exist in the OCR'ed text. Such errors may be due to imperfections in the documents, artifacts introduced during the scanning process, and shortcomings of OCR engines. These errors can interfere with use and enjoyment of the OCR'ed text and detract from the advantages of such text. Therefore, there is a need for a way to realize the benefits of using OCR'ed text while minimizing the impact of errors introduced by the OCR process.