Document imaging is a process of scanning a paper document, converting the document into a digital image, and storing that image on a magnetic storage device. Such document imaging processes make it possible to perform optical character recognition (OCR), which translates images of text, such as scanned documents, into actual text characters. Classification is an important feature of document image processing and is often a preliminary step toward recognition, understanding, and information extraction.
The majority of prior art techniques for classifying documents are both time-consuming and labor-intensive. Typically, the documents are processed manually, and classifying the document images requires training on representative sample images in order to perform complex mathematical analyses that cluster or classify documents that are similar to one another. Such techniques require significant training and technical resources. Furthermore, such approaches may not cover every classification or extraction scenario and are particularly limited by the representative samples provided.
Based on the foregoing, it is believed that a need exists for an improved method for automatically training a document imaging classification and extraction system. A need also exists for automatically switching between a manual mode and an automatic mode based on constant monitoring, as described in greater detail herein.