1. Field of the Invention
The present invention relates to a method for compressing image data scanned by a scanner and to an image processing apparatus and the like that performs such compression.
2. Description of the Related Art
Conventionally, image processing apparatuses called MFPs (Multi Functional Peripherals) have often been used to scan a document, attach the resulting image data to an email, and send that email. Color-compatible models of such apparatuses have recently come into frequent use, and demand for the ability to send color image data as an email attachment is increasing.
However, when, for example, an A4-sized document is scanned at 300 dpi in full color, the resulting color image data is approximately 25 MB in size, and it can be difficult to attach such a large file to an email and send it. It is therefore common to compress such image data before sending it. Compressing the entirety of the image data at a high compression rate so that it can be sent via email, however, can blur the characters present in the image and render them illegible. Conversely, reducing the compression rate in order to keep the characters legible may not result in a file that is small enough to be sent.
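The figure of approximately 25 MB can be checked with a short calculation. The following is a minimal sketch that assumes an A4 sheet of 210 mm by 297 mm and 24-bit color (3 bytes per pixel); neither value is stated in the text above, and the function name is hypothetical.

```python
MM_PER_INCH = 25.4


def raw_image_size_bytes(width_mm, height_mm, dpi, bytes_per_pixel=3):
    """Uncompressed size of a scan, in bytes.

    bytes_per_pixel=3 assumes 24-bit RGB, which is an assumption about
    what "full color" means here.
    """
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bytes_per_pixel


# A4 (assumed 210 mm x 297 mm) scanned at 300 dpi.
size_bytes = raw_image_size_bytes(210, 297, 300)
size_mib = size_bytes / 2**20

print(size_bytes)          # 26099520
print(round(size_mib, 1))  # 24.9
```

Under these assumptions the scan is 2480 by 3508 pixels, or about 26.1 million bytes, which is consistent with the approximate size given above.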
Accordingly, a conventional technique reduces the data amount by generating a PDF (Portable Document Format) file at a high compression rate; such a file is called a high-compression PDF or compact PDF. According to this technique, a PDF file is generated in the following manner.
First, areas containing objects such as characters, graphics, and photographs are extracted from the image to be converted (the target image). It is then determined whether each extracted area is an area including characters or an area including an object other than characters. An area including characters is binarized, a single representative color is determined for the characters, and the area is then compressed in a manner that takes the legibility of the characters into consideration. Areas including objects other than characters are compressed at a high compression rate. Through this, a PDF file is generated at a high compression rate while taking into consideration the legibility of the characters.
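The handling of a character area described above (binarization followed by selection of a single representative color) may be sketched as follows. This is an illustration only: the fixed luminance threshold, the Rec. 601 luminance weights, and the use of the mean foreground color as the representative color are all assumptions, since the text does not specify how either step is performed.

```python
def luminance(rgb):
    """Approximate brightness of an (R, G, B) pixel (Rec. 601 weights, assumed)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def binarize_character_area(area, threshold=128):
    """Split a character area into a 1-bit mask and one representative color.

    area: list of rows, each row a list of (R, G, B) tuples.
    threshold: hypothetical cut-off; pixels darker than this are
               treated as character pixels.
    Returns (mask, color), where color is None if no character pixel is found.
    """
    mask = [[1 if luminance(p) < threshold else 0 for p in row] for row in area]
    foreground = [
        p
        for row, mask_row in zip(area, mask)
        for p, m in zip(row, mask_row)
        if m
    ]
    if not foreground:
        return mask, None
    n = len(foreground)
    # Representative color: per-channel mean of the character pixels (assumed).
    color = tuple(round(sum(p[c] for p in foreground) / n) for c in range(3))
    return mask, color


area = [
    [(255, 255, 255), (200, 10, 10)],
    [(210, 20, 20), (250, 250, 250)],
]
mask, color = binarize_character_area(area)
print(mask)   # [[0, 1], [1, 0]]
print(color)  # (205, 15, 15)
```

Storing the characters as a 1-bit mask plus one color keeps their edges sharp at a small data size, while the remaining areas can be compressed more aggressively.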
Incidentally, there are cases where maps are included in the target image. Maps normally include characters expressing place names or the like. However, when generating the abovementioned high-compression PDF file, areas including maps are determined to be areas including objects that are not characters. In such a case, the compression process does not take into consideration the legibility of the characters included in the maps, and thus those characters will become difficult to read.
In order to maintain the legibility of characters included in maps, it is thus necessary to apply a compression process appropriate for map areas to those map areas. To do so, it is necessary to distinguish map areas from other areas.
The following conventional methods have been proposed as techniques relating to determining map areas (Japanese Patent Laid-Open No. 2005-79787 [Patent Document 1] and Japanese Patent Laid-Open No. H10-285394 [Patent Document 2]).
According to the method of Patent Document 1, a histogram is generated that indicates the darkness distribution properties of a document image obtained by a document reading unit reading a map document. The respective ratios of the high-darkness component, the medium-darkness component, and the low-darkness component to the overall darkness component present in the document image are then found using the generated histogram. In the case where the ratio of the medium-darkness component and the high-darkness component is greater than the ratio of the low-darkness component, the document image is determined to be a document image of a map manufactured abroad. In the case where the ratio of the medium-darkness component and the high-darkness component is less than the ratio of the low-darkness component, the document image is determined to be a document image of a map manufactured in Japan.
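The histogram-based determination of Patent Document 1 may be sketched as follows. This is not the implementation of Patent Document 1: the 0 to 255 darkness scale, the band boundaries (85 and 170), the reading of "the ratio of the medium-darkness component and the high-darkness component" as the sum of the two ratios, and the "undetermined" result for the equal case are all assumptions made for illustration.

```python
from collections import Counter


def darkness_ratios(darkness_values, low_max=85, mid_max=170):
    """Return the (low, medium, high) darkness ratios of an image.

    darkness_values: iterable of per-pixel darkness values, 0 (light) to 255 (dark).
    low_max, mid_max: hypothetical band boundaries.
    """
    histogram = Counter(darkness_values)
    total = sum(histogram.values())
    if total == 0:
        raise ValueError("no pixels given")
    low = sum(n for d, n in histogram.items() if d <= low_max)
    mid = sum(n for d, n in histogram.items() if low_max < d <= mid_max)
    high = total - low - mid
    return low / total, mid / total, high / total


def classify_map_origin(darkness_values):
    """Compare the medium-plus-high ratio with the low ratio."""
    low, mid, high = darkness_ratios(darkness_values)
    if mid + high > low:
        return "abroad"
    if mid + high < low:
        return "japan"
    return "undetermined"  # equal case is not covered by the description


print(classify_map_origin([10, 20, 30, 200]))    # japan
print(classify_map_origin([100, 200, 220, 10]))  # abroad
```

Note that the decision depends only on the darkness distribution, not on where the dark pixels are located in the image.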
Meanwhile, according to the method of Patent Document 2, pattern matching is performed on a document image using a ridge pixel pattern of a predetermined size (5 pixels high by 5 pixels wide), thereby detecting character pixels present in halftone dots included in the image. In the case where the quantity of character pixels is high, the document is determined to be a map-like document, whereas in the case where the quantity of character pixels is low, the document is determined to be a general type of document.
With the method of Patent Document 1 as described above, area determination is carried out based on whether or not the ratio of medium-darkness component and high-darkness component pixels is high. When attempting to distinguish map areas from other types of areas using such a determination method, areas with photographs or the like that have darkness distribution properties similar to maps are mistakenly determined to be map areas. It is therefore highly likely that map areas cannot be accurately distinguished from non-map areas using the method of Patent Document 1.
Meanwhile, with the method of Patent Document 2, a pixel of interest is determined, pattern matching is performed on the pixel of interest and its surrounding pixels using a ridge pixel pattern of a predetermined size, and it is then determined whether or not the pixel of interest is a character pixel. This process must be performed on all pixels, one at a time, and thus requires a significant amount of processing.
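The processing load of such per-pixel matching can be illustrated with a minimal sketch. The actual ridge pixel pattern of Patent Document 2 is not given here, so the vertical-line pattern below is purely hypothetical, as are the function name and the exact-match criterion; the sketch only shows how the number of comparisons grows with the image size.

```python
def count_window_matches(image, pattern):
    """Match a square pattern against the window around every interior pixel.

    image: 2D list of 0/1 values; pattern: square 2D list of 0/1 values
    with an odd side length. Returns (matches, comparisons).
    """
    height, width = len(image), len(image[0])
    size = len(pattern)
    radius = size // 2
    matches = 0
    comparisons = 0
    for y in range(radius, height - radius):      # every pixel of interest
        for x in range(radius, width - radius):
            is_match = True
            for dy in range(size):                # every pixel in the window
                for dx in range(size):
                    comparisons += 1
                    if image[y - radius + dy][x - radius + dx] != pattern[dy][dx]:
                        is_match = False
            if is_match:
                matches += 1
    return matches, comparisons


# Hypothetical 5 x 5 pattern: a one-pixel-wide vertical stroke.
pattern = [[0, 0, 1, 0, 0] for _ in range(5)]
# 7 x 7 test image containing the same stroke in column 3.
image = [[1 if x == 3 else 0 for x in range(7)] for _ in range(7)]

matches, comparisons = count_window_matches(image, pattern)
print(matches, comparisons)  # 3 225
```

Each pixel of interest costs 25 comparisons for a 5 by 5 pattern. For an image of roughly 2480 by 3508 pixels (A4 at 300 dpi, assuming 210 mm by 297 mm), this is on the order of 200 million comparisons per pattern.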