(1) Field of the Invention
The present invention relates to an image processing apparatus, an image processing method, and an image processing program, and specifically to an art for compressing image files while maintaining character legibility.
(2) Related Art
In recent years, documents have increasingly been digitized in various fields, and digital files are now commonly transmitted and received as e-mail attachments. Moreover, with the spread of color scanners, color documents are also being converted into digital files.
When color documents are digitized, if an A4-size full-color document is scanned at a resolution of 300 dpi, for example, the resulting digital file is as large as approximately 25 MB. Accordingly, digital files of color documents (hereinafter referred to as “image files”) are generally compressed before being attached to e-mails.
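The figure of approximately 25 MB can be checked with a short calculation. The following is a minimal sketch, assuming an uncompressed 24-bit RGB image (3 bytes per pixel) and the standard A4 dimensions of 210 mm x 297 mm; the function name and parameters are illustrative and do not appear in the source.

```python
MM_PER_INCH = 25.4


def uncompressed_size_bytes(width_mm, height_mm, dpi, bytes_per_pixel=3):
    """Size of an uncompressed scan, assuming a fixed number of bytes per pixel."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bytes_per_pixel


# A4 page (210 mm x 297 mm) scanned at 300 dpi in 24-bit color (assumed).
size = uncompressed_size_bytes(210, 297, 300)
print(f"{size / 1e6:.1f} MB, {size / 2**20:.1f} MiB")  # 26.1 MB, 24.9 MiB
```

Under these assumptions the page is 2480 x 3508 pixels, which is consistent with the stated size of roughly 25 MB.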
In order to attach an image file to an e-mail, the image file needs to be lossy-compressed to sufficiently reduce its size. If an image file is lossy-compressed at a high compression ratio, characters become difficult to read. Conversely, if an image file is lossy-compressed at a compression ratio at which characters remain readable, the size of the image file cannot be sufficiently reduced. Accordingly, compact PDF arts have been developed that select a compression method for each area depending on whether the area includes a character.
As one of such compact PDF arts, the following art is known. An image file is analyzed to determine character areas and non-character areas. Each character area is binarized while maintaining a high resolution, and is integrated with other character areas. The integrated character area is then lossless-compressed, so that character legibility is secured. Meanwhile, each non-character area is reduced in resolution and is also lossy-compressed, so that a higher compression ratio is achieved.
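The two compression paths described above can be sketched as follows. This is only an illustration under stated assumptions, not the method of any particular compact PDF product: areas are grayscale 2D lists of values 0 to 255, the threshold of 128 is arbitrary, zlib stands in for the lossless codec, and coarse quantization followed by zlib stands in for a real lossy codec such as JPEG. The integration of multiple character areas is omitted, and all function names are hypothetical.

```python
import zlib


def binarize(area, threshold=128):
    """Map dark pixels to 1 (ink) and light pixels to 0, at full resolution."""
    return [[1 if px < threshold else 0 for px in row] for row in area]


def pack_bits(bitmap):
    """Pack each row of 0/1 values into bytes, most significant bit first."""
    out = bytearray()
    for row in bitmap:
        for i in range(0, len(row), 8):
            chunk = row[i:i + 8]
            byte = 0
            for bit in chunk:
                byte = (byte << 1) | bit
            out.append(byte << (8 - len(chunk)))  # pad a short final chunk
    return bytes(out)


def compress_character_area(area):
    """Character path: binarize, then compress losslessly (zlib as stand-in)."""
    return zlib.compress(pack_bits(binarize(area)), 9)


def downsample(area, factor=2):
    """Lower the resolution by averaging factor x factor blocks."""
    h, w = len(area), len(area[0])
    out = []
    for y in range(0, h - h % factor, factor):
        row = []
        for x in range(0, w - w % factor, factor):
            block = [area[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def compress_non_character_area(area, factor=2, step=16):
    """Non-character path: lower resolution, then discard detail (lossy stand-in)."""
    low = downsample(area, factor)
    quantized = bytes((px // step) * step for row in low for px in row)
    return zlib.compress(quantized, 9)
```

The character path keeps every pixel position and loses only color depth, which is why legibility survives; the non-character path discards both spatial and tonal detail in exchange for a smaller result.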
However, documents to be converted into digital files using a color scanner include not only general business documents composed of text and tables, but also documents such as magazines and catalogs that include many illustrations and photographs. Moreover, a great variety of fonts and colors are used in such documents.
In conventional arts, the pixels constituting image data are connected to one another, and label processing is performed on the connected pixels to determine an object. An area that is enclosed by a rectangle circumscribing the determined object and that is no larger than a predetermined size is judged to be a character area.
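The conventional judgment described above can be sketched as connected-component labeling followed by a size test on each circumscribing rectangle. This is a minimal sketch under assumptions not stated in the source: the input is an already binarized 2D list (1 for foreground), 8-connectivity is used, and the predetermined size is a single limit `max_size` applied to both width and height. The function names are hypothetical.

```python
from collections import deque


def label_objects(binary):
    """Return the circumscribing rectangle (top, left, bottom, right) of each
    8-connected group of foreground pixels."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first search over one object, tracking its extent.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def character_boxes(binary, max_size):
    """Keep only rectangles no larger than max_size in width and height."""
    return [(t, l, b, r) for t, l, b, r in label_objects(binary)
            if b - t + 1 <= max_size and r - l + 1 <= max_size]


page = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],  # a long horizontal rule
]
print(character_boxes(page, max_size=3))  # [(0, 0, 1, 1)]
```

In this sketch the small 2 x 2 object passes the size test, while the 8-pixel-wide rule is rejected. A character whose pixels touch such a rule would be merged into the same large object and rejected with it, which is one plausible way a size-only test could misjudge characters; the source does not state the mechanism.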
This allows the processing to be performed easily, thereby reducing the processing load. However, as mentioned above, there is a great variety of documents to be digitized. Therefore, an area that should be judged to be a character area might be misjudged as a non-character area. The misjudged area is then lossy-compressed, which deteriorates character legibility in the decompressed image data.
In particular, if a character is included in a table or a figure, misjudgments tend to occur, thereby deteriorating character legibility.