Many enterprise processes today involve paper digitization, and there is a significant shift towards digitizing paper archives. A common existing digitization process consists of scanning and applying optical character recognition (OCR), followed by manual verification and/or key-in, and then saving the data in a database. However, this process lacks a fast and robust verification methodology to ensure that all of the important data on the paper is saved.
Existing approaches include a side-by-side approach in which the original scan is shown on one side and the recognized content on the other. However, such an approach is laborious, because the operator is forced to view the entire page being entered rather than focusing on the specific word in question. Accordingly, many systems have been proposed in which, at any given moment, an operator sees only the word being corrected, or even a few smaller snippets of information (characters). Such approaches enhance operator productivity but have an undesirable side effect: the operator may miss information that was omitted by the OCR process (for example, handwritten remarks added in the margins of a book).