1. Field of the Invention
The present invention relates to an image processing apparatus, an image processing method, and a computer program and, for example, to a technique suitable for saving a paper document as a digital document and using the saved digital document for editing.
2. Description of the Related Art
Along with the recent spread of scanners, digitization of documents is becoming more popular. However, when a digital document with, e.g., an A4 size is saved in a full-color bitmap format, the amount of data is as large as about 24 Mbytes at 300 dpi and requires an enormous storage area. In addition, data of such a large amount is unsuitable for digital transmission.
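The data amount quoted above can be checked with a short calculation. The following is a minimal sketch, assuming an A4 page of 210 mm x 297 mm and 3 bytes per pixel (8-bit RGB); the function name and parameters are illustrative and do not come from the cited documents.

```python
MM_PER_INCH = 25.4


def raw_bitmap_bytes(width_mm, height_mm, dpi, bytes_per_pixel=3):
    """Return the uncompressed size in bytes of a page scanned at `dpi`.

    Assumes 8-bit RGB (3 bytes per pixel) unless stated otherwise.
    """
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    size = raw_bitmap_bytes(210, 297, 300)  # A4 at 300 dpi
    print(f"{size} bytes = {size / 2**20:.1f} MiB")
```

Under these assumptions the page is 2480 x 3508 pixels, which gives roughly 26 million bytes, in line with the figure of about 24 Mbytes when a megabyte is counted as 2^20 bytes.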
Such a full-color image is normally compressed, and JPEG is known as a compression method. JPEG can compress a natural image such as a photo very effectively while ensuring high quality. However, when a high-frequency portion such as a text part is compressed by JPEG, image degradation called mosquito noise occurs, and the compression ratio for such a portion is low. Since many general documents include both text and images on one page, it is difficult to ensure both high image quality and a high compression ratio using JPEG.
To solve the above-described problems, region segmentation is executed. The background portion, excluding the text regions, is subjected to JPEG compression, and each text region, together with its color information, is subjected to MMR compression. At the time of decompression, the JPEG image is passed through wherever the binary text image is white, and the black portion is expressed with a representative text color. According to an image processing apparatus disclosed in, e.g., Japanese Patent Laid-Open No. 2002-77633, an image is obtained by scanning a paper document. Binarization and lossless encoding are executed for a text region without decreasing the resolution, while JPEG compression is executed for the background part at a high compression ratio by decreasing the resolution. With this processing, a digital document of a small size suitable for transmission and storage can be obtained without losing color information and text readability.
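The decompression step described above can be illustrated as follows. This is a minimal sketch, not the method of the cited publication: it assumes the JPEG background has already been decoded and scaled to the resolution of the binary text image, that the text image uses 1 for black (text) pixels and 0 for white pixels, and that a single representative color applies to the whole text region. The function name `composite_text_over_background` is hypothetical.

```python
def composite_text_over_background(background, text_mask, text_color):
    """Combine a decoded background with a binary text image.

    background: rows of (r, g, b) tuples (assumed already decoded and upscaled)
    text_mask:  rows of 0/1 values, 1 = black (text) pixel, 0 = white pixel
    text_color: representative (r, g, b) color for the text region
    """
    return [
        [text_color if is_text else bg_pixel
         for bg_pixel, is_text in zip(bg_row, mask_row)]
        for bg_row, mask_row in zip(background, text_mask)
    ]


if __name__ == "__main__":
    white = (255, 255, 255)
    gray = (200, 200, 200)
    red = (180, 0, 0)  # representative text color (illustrative value)
    background = [[white, gray], [gray, white]]
    text_mask = [[1, 0], [0, 1]]
    print(composite_text_over_background(background, text_mask, red))
```

White mask pixels let the background show through, while black mask pixels are painted with the representative color, so text edges stay sharp even when the background is stored at a reduced resolution.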
With the spread of computers, creating and editing documents using a document editor application such as a word processor has become common. There is a growing demand not only for browsing a document digitized and saved in the above-described manner but also for inserting all or part of a digital document into another document and editing or processing it.
To meet part of this requirement, an image processing apparatus disclosed in, e.g., Japanese Patent Laid-Open No. 2004-265384 specifies an original document file corresponding to the image data of a scanned document by searching a database or the like and actually prints or reuses the original document file to ensure high image quality and editability.
However, a digital document created by scanning is compressed to save the image data more efficiently, as described above. Hence, if this document is used to create another document, image degradation caused by compression poses a problem.
A digital document created by the image processing apparatus disclosed in Japanese Patent Laid-Open No. 2002-77633 holds data that has undergone lossless encoding without decreasing the resolution in a text region. For this reason, the image of this region can be extracted and used without concern for degradation. However, the part other than the text region, i.e., the background region is compressed at a high ratio, and therefore, the problem of image degradation in use cannot be avoided.
In the image processing apparatus disclosed in Japanese Patent Laid-Open No. 2004-265384, a scanned document is replaced with its original document file obtained by a search. For this reason, the data amount is not always small, and the efficiency of transmitting or saving the digital document is not always high.
In addition, in an environment where a document is created by using a document editor application such as a word processor, photos and drawings suitable for reuse may be registered in a database as individual data items. If a document newly created by using these data is not registered in the database as a document file, the original data that should exist in the database cannot be acquired for the target part even by scanning the document with the image processing apparatus.
There is a demand for a technique of solving the above-described problems, i.e., generating data with a high transmission and saving efficiency from a scan image and, when this data is to be used for, e.g., document editing, easily acquiring original data for the target part.
In consideration of the above-described problems, the present invention has as its object to enable easy acquisition of original data.