One well-known technique for entering one or more pages of a document into a document processing system employs an optical scanner to sense differences in optical contrast that occur on a surface of a page. The differences in optical contrast are converted to binary information (pixels) at a predetermined or selected resolution, such as 200 dots per inch (dpi), and are output in a scan line format. The output data may subsequently be processed to identify the informational content of the scanned image. Optical character recognition (OCR) is one well-known processing technique that is used to convert the image pixels to recognized alphanumeric characters.
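By way of illustration only, the conversion of sensed contrast to binary pixels in a scan line format can be sketched as follows. The sketch assumes 8-bit grayscale samples and a fixed threshold of 128. Neither assumption comes from the text above, and the function names are hypothetical; an actual scanner may binarize in hardware or use an adaptive threshold.

```python
def binarize_scan_line(gray_line, threshold=128):
    """Convert one scan line of 8-bit grayscale samples to binary pixels.

    Assumed convention: 1 = dark (ink), 0 = light (paper).
    The fixed threshold is an illustrative assumption.
    """
    return [1 if sample < threshold else 0 for sample in gray_line]


def pixels_per_scan_line(page_width_inches, dpi=200):
    """Number of pixels in one scan line at the given resolution."""
    return round(page_width_inches * dpi)


def binarize_page(gray_page, threshold=128):
    """Binarize a page supplied as a list of grayscale scan lines."""
    return [binarize_scan_line(line, threshold) for line in gray_page]
```

Under these assumptions, a page 8.5 inches wide scanned at 200 dpi yields 1700 pixels per scan line.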
One problem that arises in systems that digitize document pages and other types of pages is the identification of blank pages and partially blank pages. For example, if a number of pages of a double-sided document are automatically fed through a document scanner, some of these pages may be blank (text on only one side) or partially blank. Because the page image data is typically compressed or otherwise processed prior to storage, it can be appreciated that inefficiencies occur when a totally or partially blank page is input to a data compression algorithm. That is, it would be desirable to rapidly identify a page as being blank so that the page image can be discarded without being further processed. In like manner, if a page is only partly covered by text or graphics, it would be desirable to input to a subsequent process, or to display, only that portion of the page image that contains information, and to ignore the remainder of the page image.
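The two desired outcomes stated above, discarding a blank page and passing on only the information-bearing portion of a partly covered page, can be illustrated with a minimal sketch. It is not the technique of any particular system. It assumes that a page is a list of equal-length scan lines of binary pixels (1 = dark), and that a scan line counts as containing information when it holds at least min_dark_pixels dark pixels. The function names and that parameter are hypothetical.

```python
def content_bounds(page, min_dark_pixels=1):
    """Return (top, bottom, left, right), inclusive, of the region that
    contains information, or None if the page is treated as blank.

    min_dark_pixels is an assumed noise tolerance: scan lines with fewer
    dark pixels are ignored as specks.
    """
    rows = [r for r, line in enumerate(page) if sum(line) >= min_dark_pixels]
    if not rows:
        return None
    width = len(page[0])
    # Only the scan lines judged to hold information decide the columns.
    cols = [c for c in range(width) if any(page[r][c] for r in rows)]
    if not cols:
        return None
    return rows[0], rows[-1], cols[0], cols[-1]


def is_blank(page, min_dark_pixels=1):
    """True if no scan line holds enough dark pixels to count as content."""
    return content_bounds(page, min_dark_pixels) is None


def crop_to_content(page, min_dark_pixels=1):
    """Return only the information-bearing portion, or [] for a blank page."""
    bounds = content_bounds(page, min_dark_pixels)
    if bounds is None:
        return []
    top, bottom, left, right = bounds
    return [line[left:right + 1] for line in page[top:bottom + 1]]
```

In such a sketch, a page for which is_blank returns True would be discarded before compression, and crop_to_content would supply the reduced image to the subsequent process or display. A full pass over every pixel is the simplest approach but not necessarily the fastest.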