In traditional classification environments, a “document” becomes classified or not according to a variety of schemes. Among them are schemes that define categories for document placement according to content or attributes of the document, e.g., subject matter, author, document type, size, etc. In automatic classification, a hard copy document becomes digitized for computing actions, such as electronic editing, searching, storing, displaying, etc. Digitization also launches routines, such as machine translation, data extraction, text mining, invoice processing, invoice payment, storage, displaying, sorting, and the like. Optical character recognition (OCR) and image feature detection/extraction are conventional methods used during these routines.
Unfortunately, OCR and feature detection are CPU-intensive and require extended periods of time during execution, thus limiting their effectiveness. Both are also known to regularly fail in their role of extracting data when two or more scanned documents have variations in their resolution, bit-depth, and/or rotation, especially between a trained set of reference documents and an unknown document being evaluated. As such, automated processes often seek manual assistance from a user, including help in recognizing and sorting documents by identifying one or more key features. However, the problem is compounded, and can become labor intensive, when training on complicated documents, multiple versions of the same document, closely matching documents, etc. Also, conventional processing of these documents places practical limits on how many documents can be processed in a given interval and often returns ambiguous results with unstructured documents or documents containing no ascertainable text that can be read with OCR.
Solutions to these problems are often obtained by additional and more complicated software routines, which only add to the burden of CPU consumption. For many users, this overcomplicates processing relative to their needs and slows it down, especially when their classification schemes are of a narrow or regular interest. For example, small businesses needing invoice payment and sorting for but a few vendors would enjoy faster and less intensive processing with coarse or gross document sorting, instead of slower, more intensive processing with more robust OCR and feature detection models. If such sorting also eliminated OCR entirely, business owners could achieve even cheaper and faster results, especially with poorly scanned documents, e.g., angled or distorted documents (smudged, wrinkled, etc.), where OCR techniques struggle. What is needed, then, are coarse classification schemes for documents. Further needs should also contemplate instructions or software executable on controller(s) for hardware, such as imaging devices. Additional benefits and alternatives are sought when devising solutions.