1. Field of the Invention
The present invention relates to an image processing apparatus and a control method therefor, and more particularly, to an image processing apparatus for extracting a document area, reading the extracted document area, and storing or printing the read image.
2. Description of the Related Art
Many algorithms for detecting and cutting out a rectangular area from an original image have been proposed for use in known image processing apparatuses (see, for example, Japanese Patent Laid-Open Nos. 8-237537 (Page No. 15 and FIG. 10) and 2004-30430 (Page No. 17 and FIG. 9)). In addition, an image processing apparatus for performing binarization processing and labeling processing is also known (see, for example, Japanese Patent Laid-Open No. 3-188587 (Page Nos. 5 and 6, and FIGS. 1 and 2)).
An image processing apparatus described in Japanese Patent Laid-Open No. 8-237537 performs outline extraction processing and searches for a rectangular area by searching for a portion having a straight outline. However, in a noisy environment, it is difficult to detect the straight portions of a rectangular outline and, further, to detect a single rectangle by associating the four straight lines that form the rectangular outline.
In order to increase detection accuracy in a noisy environment, an image processing apparatus described in Japanese Patent Laid-Open No. 2004-30430 creates histograms of the number of black pixels in the horizontal and vertical directions, and uses trapezoidal approximation to detect, from the histograms, whether a rectangle is present and the inclination direction thereof. However, if a boundary between a document area and a background is blurred (for example, if a color difference between a document and a pressure plate for a platen glass is small in a reader), it is difficult to detect the document area.
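The histogram approach and its limitation can be illustrated by the following minimal sketch. It is not the implementation of the cited reference. The function names, the 6x8 test image, the gray levels, and the fixed threshold of 128 are all assumptions chosen for illustration, and the trapezoidal approximation step itself is omitted.

```python
def binarize(gray, threshold=128):
    """Map pixels darker than the (assumed) threshold to 1 (black), others to 0."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def projection_histograms(binary):
    """Count black pixels along each row and each column of a binary image."""
    row_hist = [sum(row) for row in binary]
    col_hist = [sum(col) for col in zip(*binary)]
    return row_hist, col_hist


def make_page(background, document):
    """Build a hypothetical 6x8 grayscale scan containing a 3x4 document region."""
    page = [[background] * 8 for _ in range(6)]
    for y in range(1, 4):
        for x in range(2, 6):
            page[y][x] = document
    return page


# Clear boundary: a dark document (40) on a bright background (255).
clear_rows, clear_cols = projection_histograms(binarize(make_page(255, 40)))

# Blurred boundary: the document (230) is nearly as bright as the background.
blurred_rows, blurred_cols = projection_histograms(binarize(make_page(255, 230)))

print(clear_rows, clear_cols)
print(blurred_rows, blurred_cols)
```

In the first case the histograms rise and fall at the edges of the document, so the presence and extent of a rectangle can be read from them. In the second case binarization yields no black pixels, both histograms are empty, and no document area can be found.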
In a known technique described in Japanese Patent Laid-Open No. 3-188587, a histogram of binary image data is generated, and only clusters of black pixels of a predetermined size or greater in the histogram are retained. However, if a document of a single page contains a plurality of clusters of pixels that are provided with label numbers, and if document areas are cut out on a label number-by-label number basis, the single page is cut out as a plurality of pages.
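The problem with cutting out areas on a label number-by-label number basis can be illustrated by the following minimal sketch. It is not the method of the cited reference. The 4-connected breadth-first labeling, the function name label_components, the min_size parameter, and the test page are assumptions chosen for illustration.

```python
from collections import deque


def label_components(binary, min_size=1):
    """Label 4-connected black-pixel clusters and return a bounding box per label.

    Clusters smaller than min_size are discarded as noise. Each box is
    (top, left, bottom, right), inclusive.
    """
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    boxes = {}
    next_label = 1
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects every pixel of this cluster.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_size:
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                boxes[next_label] = (min(ys), min(xs), max(ys), max(xs))
                next_label += 1
    return boxes


# Hypothetical single page: two separate printed blocks plus one noise pixel.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 1],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

boxes = label_components(page, min_size=4)
print(boxes)
```

The size filter removes the isolated noise pixel, but the two printed blocks still receive separate label numbers. Cutting out one area per label therefore produces two areas from what is a single page.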