The present disclosure relates to techniques for identifying and correcting errors that can occur when performing character-recognition operations on a document.
Character-recognition techniques are widely used to extract information from documents by converting data from an initial format (such as a bitmap image) into another format (such as ASCII text). For example, optical character recognition (OCR) is often used to convert printed text to corresponding digital values, and intelligent character recognition (ICR) is often used to convert handwritten text to corresponding digital values.
However, the conversion performed by most character-recognition techniques is imperfect, so there is always a nonzero probability of error. These errors can significantly complicate subsequent processing of the extracted information and increase its cost. In addition, in applications where users provide the documents, errors often force the users to review the converted information to identify and correct them. This review is time-consuming, which degrades the user experience and reduces users' confidence in products and services that use character recognition.