1. Field of the Invention
The present invention relates to an image processing apparatus and a method of controlling the same, and more particularly, to an image processing apparatus and method for determining whether read image data is blank page image data.
2. Description of the Related Art
In an image reading apparatus such as a digital copying machine, if read image data is blank page data containing no image object, throughput, paper sheets, power, and the like are wasted on image processing and print processing. For this reason, various methods have been proposed for deleting image data determined to be a blank page, thereby suppressing wasteful printing and the consumption of paper sheets and toner.
For example, Japanese Patent No. 4251629 proposes a method in which the luminance signal obtained when reading an original is input to a blank page determination circuit, and the execution of subsequent print processing is controlled in accordance with the determination result obtained by the circuit.
This method, however, cannot detect the blank page data generated by image processing with respect to read image data.
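A luminance-based blank page determination of the kind described above can be sketched as follows. This is only an illustrative sketch, not the circuit of the cited patent: the function name `is_blank_page`, the 8-bit luminance scale, and the parameters `white_level` and `max_dark_ratio` are all assumptions introduced here.

```python
def is_blank_page(luminance, white_level=230, max_dark_ratio=0.001):
    """Return True when almost every pixel is at least as bright as white_level.

    luminance: rows of 8-bit luminance values (0 = black, 255 = white).
    white_level and max_dark_ratio are hypothetical tuning parameters.
    """
    total = 0
    dark = 0
    for row in luminance:
        for value in row:
            total += 1
            if value < white_level:
                dark += 1
    if total == 0:
        return True  # an empty scan carries no image object
    return dark / total <= max_dark_ratio


# A fully white 100 x 100 scan, and a copy with one dark stroke
# covering 60 of its 10,000 pixels.
blank_scan = [[255] * 100 for _ in range(100)]
text_scan = [row[:] for row in blank_scan]
for x in range(20, 80):
    text_scan[50][x] = 0

print(is_blank_page(blank_scan))  # True
print(is_blank_page(text_scan))   # False
```

Because such a determination looks only at the read luminance values, it says nothing about whether later image processing will leave any content on the page.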
Recently, when creating a document, users tend not only to type characters but also to use sophisticated functions such as elaborately decorating fonts, freely creating graphic patterns, and capturing photos and the like. However, the more sophisticated the contents of a document, the greater the effort required to create it from scratch. It is therefore desirable to reuse, as much as possible, part of a document created in the past, either without any change or after processing and editing.
Under the circumstances, techniques are conceivable for obtaining the contents of a document printed on a paper sheet or the like as reusable data. For example, Japanese Patent Laid-Open No. 1-129358 discloses a technique in which, when an apparatus electronically reads a document on a paper sheet, a document matching the contents of the read document is acquired by searching a database, and the acquired document can be used in place of the data acquired from the read sheet surface. In addition, when no identical document can be found in the database, the read document image is converted into electronic data (to be referred to as document data) that can be easily edited and reused. In this case as well, the contents of the document can be reused. Japanese Patent Laid-Open No. 1-129358 discloses a technique of identifying areas such as a character area, a line drawing area, a natural image area, and a table area in a document image, and constructing data expressing the relationship between the respective areas in the form of a tree structure. A document image is converted into an electronic document page, which can be edited by an application, by arranging character codes, vector data, image data, and the like in accordance with the above structure. This electronic data has a layout identical to that of the original document. As with an electronic document page newly created by a document creation application or the like, it is easy to change the positions and sizes of characters and graphic patterns and to perform editing such as geometrical deformation and coloring.
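The tree structure relating the identified areas can be pictured with the following minimal sketch. The `Area` class, the bounding-box containment rule, and the sample coordinates are assumptions made for illustration; the cited publication is not quoted here as using this particular rule.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Area:
    kind: str  # e.g. "page", "text", "line_drawing", "natural_image", "table"
    bbox: Tuple[int, int, int, int]  # x, y, width, height
    children: List["Area"] = field(default_factory=list)


def contains(outer, inner):
    """True when inner's bounding box lies entirely inside outer's."""
    ox, oy, ow, oh = outer.bbox
    ix, iy, iw, ih = inner.bbox
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh


def insert(root, area):
    """Attach area under the deepest node whose bounding box contains it."""
    for child in root.children:
        if contains(child, area):
            insert(child, area)
            return
    root.children.append(area)


def outline(node, depth=0):
    """Return the tree as indented lines, one per area."""
    lines = ["  " * depth + node.kind]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


page = Area("page", (0, 0, 1000, 1400))
areas = [
    Area("text", (120, 420, 200, 40)),           # characters inside the table
    Area("table", (100, 400, 800, 300)),
    Area("natural_image", (100, 800, 400, 300)),
    Area("text", (100, 100, 800, 200)),
]
# Insert larger areas first so that enclosing areas exist before their contents.
for area in sorted(areas, key=lambda a: a.bbox[2] * a.bbox[3], reverse=True):
    insert(page, area)

print("\n".join(outline(page)))
```

In this sketch the small character area ends up as a child of the table area, while the other areas hang directly under the page.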
In addition, a technique is available for recognizing the structure of a table area in a document image. For example, Japanese Patent Laid-Open No. 1-129385 discloses a technique of acquiring the matrix structure formed by the rectangular frame areas in a table. A table area in a document image can be converted into electronic data (document data) having a table structure by combining the matrix structure of the frame areas obtained by this technique with the OCR results obtained from the characters inside each frame.
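As a rough illustration of combining a frame matrix with OCR results, the sketch below places the recognized text of each frame into a row and column derived from the frame's top-left corner. The function `build_table`, the input format, and the assumption of a perfectly aligned grid without merged cells are all hypothetical simplifications.

```python
def build_table(frames):
    """Arrange OCR text into a matrix of rows and columns.

    frames: list of (x, y, text), where (x, y) is the top-left corner of a
    rectangular frame and text is the OCR result for its contents.
    Assumes an aligned grid with no merged cells (an illustrative assumption).
    """
    xs = sorted({x for x, _, _ in frames})  # distinct column positions
    ys = sorted({y for _, y, _ in frames})  # distinct row positions
    table = [["" for _ in xs] for _ in ys]
    for x, y, text in frames:
        table[ys.index(y)][xs.index(x)] = text
    return table


frames = [
    (100, 40, "500"),
    (0, 0, "Item"),
    (0, 40, "Paper"),
    (100, 0, "Qty"),
]
for row in build_table(frames):
    print(row)
```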
The above document data can be classified into foreground objects, which are areas such as a character area, a line drawing area, a natural image area, and a table area extracted from a document image, and a background object, which is the remaining image. A background object is prepared such that the electronic document obtained by drawing vector data and clipped image data as foreground objects on the background object has an appearance equivalent to that of the original document image. A background object is created by erasing pixel information corresponding to foreground objects from an input document image.
FIG. 6B shows an example of a background object created from the example of an input document image in FIG. 6A. The pixels of the line drawing portions in FIG. 6A, such as character pixel clusters 601 to 603, a line drawing pixel cluster 608, and a table frame pixel cluster 604, are painted out in the same colors as those of neighboring pixels. In addition, the entire rectangular range of a natural image area 609 is painted out in the same color as that of neighboring pixels.
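Painting out foreground pixels in the colors of neighboring pixels can be sketched as below. This is one possible approach under stated assumptions: a grayscale image, a boolean foreground mask, and a simple pass-by-pass fill that averages already-known 4-neighbors. The function name `make_background` is hypothetical, and the source does not specify how neighboring colors are chosen.

```python
def make_background(image, foreground_mask):
    """Return a copy of image with foreground pixels painted over.

    image: rows of grayscale values.
    foreground_mask: rows of booleans, True where a pixel belongs to a
    foreground object (a character stroke, or a whole natural image rectangle).
    """
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    pending = {(y, x) for y in range(height) for x in range(width)
               if foreground_mask[y][x]}
    while pending:
        filled = {}
        for y, x in pending:
            # Borrow only from neighbors that are background or already filled.
            neighbors = [result[ny][nx]
                         for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                         if 0 <= ny < height and 0 <= nx < width
                         and (ny, nx) not in pending]
            if neighbors:
                filled[(y, x)] = sum(neighbors) // len(neighbors)
        if not filled:
            break  # the whole image is foreground; nothing to borrow from
        for (y, x), value in filled.items():
            result[y][x] = value
        pending.difference_update(filled)
    return result


# A 5 x 5 light page with a three-pixel dark stroke in the middle row.
image = [[200] * 5 for _ in range(5)]
mask = [[False] * 5 for _ in range(5)]
for x in (1, 2, 3):
    image[2][x] = 0
    mask[2][x] = True

background = make_background(image, mask)
print(background[2])
```

Marking an entire rectangle in the mask gives the behavior described for the natural image area: the rectangle is filled inward from its border, one ring of pixels per pass.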
Document data created from a document image in this manner generally contains foreground objects and a background object. There is also a known function that creates electronic document data without adding a background object, in order to improve reusability for the user. When an original image is converted into document data with this function enabled, only a background object is obtained from a page that has no foreground object such as character data. As a result, no document data is created from this page. If a page from which no data is created is not output, the number of original pages differs from the number of pages of output document data. For this reason, it is necessary to perform control so as to add blank pages. As a result, when a page determined not to be a blank page by the above blank page determination method is converted into electronic document data, a blank page may be newly created.
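The page-count control described above can be illustrated with the following sketch. The page representation (a dict with `foreground` and `background` entries), the function `build_output_pages`, and the `include_background` flag are assumptions introduced for illustration only.

```python
def build_output_pages(pages, include_background=False):
    """Convert scanned pages into output pages, one per input page.

    pages: list of dicts with 'foreground' (list of objects) and
    'background' (the background object of the page).
    An empty list in the result stands for an added blank page.
    """
    output = []
    for page in pages:
        objects = list(page["foreground"])
        if include_background:
            objects.append(page["background"])
        # Always emit a page so that the output page count matches the input.
        output.append(objects)
    return output


pages = [
    {"foreground": ["text: Hello"], "background": "background of page 1"},
    # No characters were extracted here, yet the scan was not judged blank.
    {"foreground": [], "background": "background of page 2"},
]

without_background = build_output_pages(pages)
with_background = build_output_pages(pages, include_background=True)
print(len(without_background), without_background[1])
print(len(with_background), with_background[1])
```

With the background omitted, the second page is emitted as an empty page even though its read image data was not determined to be blank, which is the situation the paragraph above describes.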