1. Field of the Invention
This invention relates generally to scanning a document into an image file with a scanner and converting the image file into a machine-readable file by applying optical character recognition (OCR) technologies. More particularly, this invention relates to improved optical character recognition technologies that give a user more control in generating final OCR output documents according to user-designated document formats, content arrangements, and user instructions for subsequent document processing.
2. Description of the Prior Art
Even though optical character recognition (OCR) technologies for recognizing characters from scanned images have made significant progress in recognition accuracy and processing speed, a user is still limited by a lack of flexibility in controlling the desired file format, document organization, and content arrangement of the output file. At present, many optical character recognition programs provide more intuitive controls that let a user mark the desired areas on a scanned image so that only the designated areas are optically recognized. However, after marking the areas to be scanned and the areas to be ignored, and marking some scanned areas for the OCR program to process as text or as graphic elements, the user has little more than an option to proofread the text recognized by the OCR program. The user still has very limited control over the data types, file formats, and file types, and over the content organization and structure, of the OCR outputs.
More specifically, the optical character recognition programs now available are still image oriented. Such a program performs merely primitive recognition operations on the raw image according to the locations and shapes of the image elements. Almost all of the intelligence and user control is directed to checking the accuracy of the character recognition results and correcting incorrect OCR output. Other than allowing a user to correct character recognition errors, these programs provide no further management or processing after the initial recognition operations. However, a user of OCR often has purposes other than making sure that all the image characters are correctly recognized. It is often required that the data and information included in a scanned document be further processed to produce a document that is further organized or tabulated, yielding a file in a specified format. Conventional OCR techniques and programs, however, do not provide such user control functions, and the practical usefulness of the OCR programs is therefore greatly limited.
Therefore, there is still a need in the art of optically scanning and recognizing documents for new and improved methods and file management functions and features such that the above-discussed limitations and difficulties can be overcome.