1. Field of the Invention
The present invention relates to an image processing apparatus, image processing method, program, and storage medium that computerize paper documents.
2. Description of the Related Art
In recent years, as the computerization of information has advanced, systems have come into wide use that scan a paper document with a scanner to computerize it, store the computerized document as an electronic file, and transmit the electronic file to other devices.
In a system that transmits an electronic file to other devices as described above, an electronic file obtained by computerizing a document is required to have high compressibility in order to reduce transmission cost.
The electronic file is also required to have reusability, so that it can be partially edited, and high image quality, so that image quality does not degrade even if an image in the electronic file is enlarged or reduced.
However, when both a character region and a graphic region (a region containing a photograph, etc.) are present in an electronic file, the following problems arise. When compression suitable for the character region (lossless compression such as MMR compression) is performed on the electronic file, the image quality of the character region is high but the compression ratio of the electronic file is low. On the other hand, when compression suitable for the graphic region (lossy compression such as JPEG compression) is performed, the compression ratio of the electronic file is high but the characters are degraded.
For this reason, the approach described below is disclosed in Japanese Patent Laid Open No. 2007-272601. In that approach, an electronic file is divided into a character region, a line drawing region, a graphic region, and the like, and the character and line drawing regions are converted into vector data. Regions that cannot be easily reproduced by vectorization (conversion into vector data) are compressed with JPEG, and the compression results of the respective regions are synthesized and output.
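The per-region selection described above can be sketched as follows. This is only an illustrative sketch: the names RegionType and choose_encoding and the labels "vector" and "jpeg" are hypothetical and do not appear in the cited publication, and the actual vectorization and JPEG compression steps are omitted.

```python
from enum import Enum, auto


class RegionType(Enum):
    """Hypothetical region labels produced by region division."""
    CHARACTER = auto()
    LINE_DRAWING = auto()
    GRAPHIC = auto()


def choose_encoding(region_type):
    """Pick an encoding per region, following the prior-art description.

    Character and line drawing regions are vectorized; every other
    region is treated as hard to vectorize and is JPEG-compressed.
    """
    if region_type in (RegionType.CHARACTER, RegionType.LINE_DRAWING):
        return "vector"
    return "jpeg"


# Hypothetical result of dividing one scanned page into regions.
page_regions = [
    ("title", RegionType.CHARACTER),
    ("diagram", RegionType.LINE_DRAWING),
    ("photo", RegionType.GRAPHIC),
]

# The per-region results would then be synthesized into one output file.
encoding_plan = [(name, choose_encoding(kind)) for name, kind in page_regions]
```

Note that this selection depends only on the region type, which is why a character region lying inside a graphic region is still vectorized.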
However, if a character region of an electronic file is always vectorized, it is vectorized even when it lies within a graphic region of the electronic file. The vectorized portion of the character region is then subjected to filling processing in order to increase the compression ratio of the graphic region serving as the background of the character region. This may degrade the image quality of the graphic region.
Also, in the approach according to Japanese Patent Laid Open No. 2007-272601, when regions are separated, each region is defined as a rectangle so that it can be clipped easily. However, when a region is clipped out and processed on the basis of its rectangle, a graphic region may contain a character region. In such a case, the vector data obtained by vectorizing the graphic region may be degraded, because the processing method for converting a character region into vector data differs from the processing method for converting a graphic region into vector data.
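The situation in which a rectangular graphic region contains a character region can be illustrated with a simple containment test. The Rect type, the contains function, and the coordinates below are assumptions made for illustration only and are not taken from the cited publication.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned region rectangle; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def contains(outer, inner):
    """Return True if `inner` lies entirely within `outer`."""
    return (
        outer.left <= inner.left
        and outer.top <= inner.top
        and inner.right <= outer.right
        and inner.bottom <= outer.bottom
    )


# Hypothetical rectangles: a photograph with a caption printed on top of it.
graphic_region = Rect(left=0, top=0, right=400, bottom=300)
character_region = Rect(left=50, top=220, right=350, bottom=260)

# Clipping the graphic region by its rectangle also clips the characters.
character_inside_graphic = contains(graphic_region, character_region)
```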
The aforementioned problem will be explained with reference to FIG. 19. FIG. 19(a) illustrates a rectangular graphic region clipped out of an input image. FIG. 19(b) illustrates a state in which a character region contained in the graphic region of FIG. 19(a) has been converted into vector data. In FIG. 19(b), the character region that has been converted into vector data is displayed superimposed on the graphic region that has been subjected to filling processing. FIG. 19(c) is a diagram in which only the graphic region, without the character region, is extracted from the graphic region of FIG. 19(b). Here, it is assumed that the portion corresponding to the character region converted into vector data has been filled with, for example, an average value of pixels near the character region. In this case, the image quality of the graphic region may degrade. For example, when there is a pattern in the background near the character region, the difference between the filled portion and its surroundings is likely to become significant, and the image quality is likely to degrade. FIG. 19(d) is a diagram in which the character region clipped out of FIG. 19(a) is converted into vector data, and corresponds to the character region of FIG. 19(b). Thus, as illustrated in FIG. 19(c), when the character region within the graphic region is subjected to the filling processing, the image quality of the graphic region may degrade.
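The effect of filling with an average of nearby pixels can be sketched as follows. This is a minimal illustration under stated assumptions, not the filling processing of any particular apparatus: the function name fill_with_neighbor_average, the use of 8-connected neighbors, the grayscale list-of-lists image format, and the sample images are all hypothetical.

```python
def fill_with_neighbor_average(image, mask):
    """Fill masked pixels with the mean of unmasked pixels bordering the mask.

    `image` is a list of rows of grayscale values; `mask` has the same
    shape and is True where character pixels were removed.
    Returns (filled_image, fill_value); fill_value is None if no
    bordering pixel exists.
    """
    height, width = len(image), len(image[0])
    neighbors = []
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                continue
            # An unmasked pixel is "near" if any 8-connected neighbor is masked.
            near = any(
                mask[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            )
            if near:
                neighbors.append(image[y][x])
    if not neighbors:
        return [row[:] for row in image], None
    fill = sum(neighbors) // len(neighbors)  # integer mean
    filled = [
        [fill if mask[y][x] else image[y][x] for x in range(width)]
        for y in range(height)
    ]
    return filled, fill


# A 2x2 character area in the middle of a 6x6 image.
mask = [[2 <= y <= 3 and 2 <= x <= 3 for x in range(6)] for y in range(6)]

# Flat background: the fill value equals the surrounding pixels.
flat = [[200] * 6 for _ in range(6)]
_, flat_fill = fill_with_neighbor_average(flat, mask)

# Patterned background (vertical stripes of 0 and 255): the fill value is a
# mid-gray that matches neither stripe, so the filled patch stands out.
striped = [[0 if x % 2 == 0 else 255 for x in range(6)] for _ in range(6)]
striped_filled, striped_fill = fill_with_neighbor_average(striped, mask)
```

With the flat background the filled area is indistinguishable from its surroundings, whereas with the striped background the filled area takes a value that occurs nowhere else in the image, which corresponds to the degradation described for FIG. 19(c).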
The present invention has been made in view of the aforementioned problems, and has an object of achieving both high compressibility and high image quality in an electronic file that contains a character region and a graphic region.