Computer processing of a paper document may entail creation of a digital image of the document and conversion of the digital image to machine-readable text. A printer or scanner may generate the digital image of the paper document.
Conventionally, optical character recognition (OCR) software is used to convert the digital image into machine-readable text. However, conventional OCR is typically unreliable with regard to irregular character forms. For example, conventional OCR typically produces suboptimal results when deciphering business logos, handwritten text, blurred text, unusual or script-type fonts, or mathematical formulas.
Conventional scanners and OCR software may rely on feature detection and typically do not incorporate pattern recognition. Further, conventional processing of a digital image typically does not include machine learning that incorporates user feedback regarding individual auto-corrections.
It would therefore be desirable to provide machine-learning algorithms for transforming both standard and non-standard digital images into auto-corrected machine-readable text. It would further be desirable to package these algorithms in a portable device that is compatible with an existing scanner and that coordinates with a user interface to receive ongoing feedback.