Processing of scanned documents typically benefits from the ability to handle different content types within a document differently. For example, a document containing both text and halftone images can be processed more effectively when text is treated differently from halftone images, or one color differently from another color.
A wide variety of imaging technologies benefit from processing differing content types differently. For example, printing technologies such as electrophotographic, electrostatic, electrostatographic, ionographic, acoustic, piezo, thermal, laser, and ink jet systems, as well as other types of image forming or reproducing systems that capture and/or store image data associated with a particular object, such as a document, and that reproduce, form, or produce an image, may provide improved results by altering processing depending on the content type. Furthermore, scanning of documents for electronic storage or other electronic processing, such as optical character recognition or digital photo manipulation or storage, can be improved by tailored processing of different content types.
“Auto-windowing” is a process of determining contiguous areas of a document, i.e., windows, each of a single content type. By way of example, auto-windowing can group an area of text into a window, areas of white space into multiple windows, and a halftone image into one or more windows, depending on the composition of the halftone image.
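The grouping described above can be illustrated with a minimal Python sketch. It assumes each pixel has already been classified with a content-type tag (here the hypothetical tags "T" for text, "H" for halftone, and "." for white space) and that the whole page is held in memory; the function name `auto_window` and the use of 4-connectivity are illustrative choices, not details taken from any particular system.

```python
from collections import deque

def auto_window(labels):
    """Group 4-connected cells of the same content type into windows.

    `labels` is a list of equal-length rows; each cell holds a
    content-type tag (hypothetical: "T", "H", "."). Returns a list of
    (content_type, set_of_(row, col)) pairs, one per window, in raster
    order of each window's first cell.
    """
    rows, cols = len(labels), len(labels[0])
    seen = [[False] * cols for _ in range(rows)]
    windows = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            kind = labels[r][c]
            cells = set()
            queue = deque([(r, c)])
            seen[r][c] = True
            # Breadth-first flood fill over neighbors of the same type.
            while queue:
                y, x = queue.popleft()
                cells.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and labels[ny][nx] == kind):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            windows.append((kind, cells))
    return windows

# Toy page: a text block flanked by white space, a halftone band,
# and a white band at the bottom.
page = [
    ".TT.",
    ".TT.",
    "HHHH",
    "....",
]
for kind, cells in auto_window(page):
    print(kind, len(cells))
```

On this toy page the text forms one window, the halftone band forms one window, and the white space falls into three separate windows because the white areas do not touch one another.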
Typically, the locations of differing content types are determined on a page-by-page basis, a process that has involved multiple stages of processing of each full page of the document after an initial scanning process. Therefore, a large memory capacity is required to process each full page. Some conventional methods have involved multiple full-page scans of each page. Because of the extensive processing and the multiple stages required, substantial amounts of time are typically needed, which limits the use of auto-windowing in high-speed document processing.
For many image-processing algorithms, such as filtering, the page is processed on a scan-line-by-scan-line basis. Ideally, the algorithm for grouping content types into windows would have available as many scan lines as required in order to determine where one region encounters (e.g., grows into) another region. Previously, this has required extensive processing time for pages of average size.
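To make the problem of one region growing into another concrete, the following Python sketch shows one generic way to group content types into windows while reading a page one scan line at a time. It is an illustration under stated assumptions, not the method of any particular system: pixels are assumed to be pre-classified with hypothetical tags ("T", "H", "."), and the names `runs`, `find`, and `window_scan_lines` are invented for this example. Only the runs of the previous scan line and a small table of provisional window identifiers are kept; when a run on the current line touches two provisional windows of the same type, the table records that they are one window.

```python
def runs(line):
    """Split one scan line into (start, end, kind) runs; end is exclusive."""
    out, start = [], 0
    for x in range(1, len(line) + 1):
        if x == len(line) or line[x] != line[start]:
            out.append((start, x, line[start]))
            start = x
    return out

def find(parent, i):
    """Return the representative id of provisional window i."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving
        i = parent[i]
    return i

def window_scan_lines(lines):
    """Return the content type of each final window, one scan line at a time."""
    parent = []  # union-find table over provisional window ids
    kinds = []   # content type of each provisional window
    prev = []    # runs of the previous line: (start, end, kind, window_id)
    for line in lines:
        cur = []
        for start, end, kind in runs(line):
            wid = None
            for ps, pe, pk, pid in prev:
                # Same type and horizontally overlapping: vertically adjacent.
                if pk == kind and ps < end and start < pe:
                    root = find(parent, pid)
                    if wid is None:
                        wid = root
                    else:
                        # Two regions have grown into each other: merge them.
                        parent[root] = find(parent, wid)
            if wid is None:
                wid = len(parent)  # start a new provisional window
                parent.append(wid)
                kinds.append(kind)
            cur.append((start, end, kind, wid))
        prev = cur
    roots = sorted({find(parent, i) for i in range(len(parent))})
    return [kinds[r] for r in roots]

# Two text columns that look separate on the first line
# turn out to be one region once the second line arrives.
print(window_scan_lines(["T.T", "TTT"]))
```

In the example, the two "T" runs on the first line receive different provisional identifiers, and only the second scan line reveals that they belong to the same window. This is why the number of scan lines available to the grouping step matters.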
As a result of the above-noted limitations of conventional methods, tailored processing of differing content types within a document has been difficult to incorporate into high-speed document processing machines. Such capabilities have also been difficult to implement inexpensively because of the substantial memory requirements.