The present invention is generally related to document analysis and, more particularly, is related to a document analysis system and method to flexibly control the analysis of a scanned document or other digital representation of a document.
More and more documents are generated using word processors and the like and are stored on memory devices such as hard drives, floppy disks, compact disks and other mass storage media. Nonetheless, paper and other similar media will continue to be used far into the future. Consequently, there will continually be a need to scan the substance portrayed on such media so that such information may be manipulated on a computer or other like device.
However, the scanning of paper documents to make the content thereon available in a digital environment may be time consuming and costly. In particular, one problem is that the processing of the various regions of a scanned document may take a long time, requiring the user to wait for an analysis of the whole document. Oftentimes, a user may only want to access a portion of the text, artwork, or other region data types of the scanned document, rather than the entire document. For example, one may wish to obtain specific paragraphs of text from a document.
Current users, however, are often forced to wait while scan converter technology analyzes an entire document to determine the specific data types of the various regions, which are ultimately applied to processing pipelines such as optical character recognition pipelines, etc.
The present invention provides a document analysis system and method. In one embodiment, the document analysis system includes a software implementation on a processor circuit, although dedicated logic circuits may be employed as well. The document analysis system includes an interim analyzer configured to perform an interim document analysis to identify a number of interim regions on a document at an initial setting of pixels-per-inch (PPI). The document analysis system also includes a complete analyzer configured to perform a complete analysis on at least one of the interim regions at a second, higher PPI, thereby generating at least one complete region therefrom. The present invention gives the user significant flexibility, with a number of options for analyzing the regions of information of interest in a document and for limiting the analysis to such preferred regions.
The present invention can also be viewed as providing a method for controlling document region analysis. In this regard, the method can be broadly summarized by the following steps: performing an interim document analysis to identify a number of interim regions on a document at an initial pixels-per-inch (PPI); and performing a complete analysis on at least one of the interim regions at a second, higher PPI, thereby generating at least one complete region therefrom.
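By way of illustration only, the two steps summarized above may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the page is assumed to be a binary pixel grid (1 for ink, 0 for blank), the interim regions are assumed to be found by a simple search for horizontal bands of inked rows, and the complete analysis is assumed to report only an ink density. The names Region, downsample, interim_analysis, and complete_analysis, and the example values of 150 PPI and 300 PPI, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Bounding box in pixel units at a given PPI (bottom/right exclusive)."""
    top: int
    left: int
    bottom: int
    right: int
    ppi: int


def downsample(page, factor):
    """Low-resolution copy: a pixel is inked if any pixel in its block is inked."""
    rows, cols = len(page) // factor, len(page[0]) // factor
    return [
        [
            int(any(page[r * factor + dr][c * factor + dc]
                    for dr in range(factor) for dc in range(factor)))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def interim_analysis(page, full_ppi, interim_ppi):
    """Step 1 (assumed heuristic): find bands of inked rows at the lower PPI."""
    low = downsample(page, full_ppi // interim_ppi)
    width = len(low[0])
    regions, start = [], None
    # A blank sentinel row closes a band that reaches the bottom of the page.
    for r, row in enumerate(low + [[0] * width]):
        inked = any(row)
        if inked and start is None:
            start = r
        elif not inked and start is not None:
            band = low[start:r]
            cols = [c for c in range(width) if any(line[c] for line in band)]
            regions.append(Region(start, min(cols), r, max(cols) + 1, interim_ppi))
            start = None
    return regions


def complete_analysis(page, region, full_ppi):
    """Step 2: re-examine one selected interim region at the higher PPI."""
    scale = full_ppi // region.ppi
    top, left = region.top * scale, region.left * scale
    bottom, right = region.bottom * scale, region.right * scale
    pixels = [row[left:right] for row in page[top:bottom]]
    ink = sum(sum(row) for row in pixels)
    density = ink / ((bottom - top) * (right - left))
    # Placeholder result: a real analyzer would classify the region's data type.
    return Region(top, left, bottom, right, full_ppi), density


if __name__ == "__main__":
    # Hypothetical 8x8 page scanned at 300 PPI with two inked areas.
    page = [[0] * 8 for _ in range(8)]
    for r in (0, 1):
        for c in range(2, 6):
            page[r][c] = 1
    page[6][0] = page[7][1] = 1

    interim = interim_analysis(page, full_ppi=300, interim_ppi=150)
    # Only the region chosen by the user is analyzed at the higher PPI.
    print(complete_analysis(page, interim[0], full_ppi=300))
```

In this sketch the costlier high-resolution step runs only on the region that is selected, which is the point of separating the interim analysis from the complete analysis.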
The present invention has numerous advantages, a few of which are delineated hereafter merely as examples. Specifically, the present invention provides the user with a fast display of the various regions of information on a document and allows the user to control further analysis of these regions and identify the type of information contained therein before the regions are processed in an appropriate processing pipeline, which may use optical character recognition algorithms, etc. The present invention is also simple in design, user friendly, robust, reliable, and efficient in operation, and easily implemented for mass commercial production.