The conversion of information to electronic form has continued to advance in recent years, and there is growing use of systems in which paper documents, rather than being archived as is, are converted to electronic form and stored, and in which the resulting electronic data may be transmitted to other systems or devices. Further, documents that can undergo such conversion are no longer limited to black-and-white bi-level images, and it is now becoming possible to obtain electronic documents of full-color (multilevel) images.
Furthermore, electronic documents are no longer merely those obtained by simply scanning a document on paper using a scanner or the like to convert the document to image data. Electronic documents now contain document images produced by conversion of the original to more sophisticated information. For example, a document image is separated into areas, character recognition processing is applied to the text areas to convert them to strings of character code, and photographic areas are converted to vector data representing contours. (For example, see the specification of Japanese Patent Application Laid-Open No. 2004-265384.)
Such vectorized images, even though they may be full-color images, also include images obtained by scanning and vectorizing documents created by software for producing illustrations and graphics. These images have object contours that are more clearly defined in comparison with natural images such as photographs and are characterized in that the colors which appear are more limited. These images shall be referred to as “clip-art images”.
According to a method of generating road data disclosed in the specification of Japanese Patent Application Laid-Open No. 2004-246554, a photographic image that has been input in full color is first converted to a bi-level image. Next, contours and center lines are extracted from the bi-level image, and the lines obtained and the color information of the original image are converted to vector data. The specification also describes eliminating isolated noise by executing expansion and contraction processing (that is, dilation and erosion of the bi-level image).
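The binarization and noise-elimination steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the cited specification: the function names, the fixed threshold of 128, the 3x3 neighbourhood, and the order of the two operations (contraction first, then expansion) are all hypothetical choices made for the example.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = dark foreground, 0 = background.

    The threshold value is an assumption made for this sketch.
    """
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def _morph(img, combine):
    """Apply a 3x3 morphological operator; pixels outside the image count as background."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = combine(window)
    return out


def contract(img):
    """Contraction (erosion): a pixel survives only if its whole 3x3 window is foreground."""
    return _morph(img, min)


def expand(img):
    """Expansion (dilation): a pixel is set if any pixel in its 3x3 window is foreground."""
    return _morph(img, max)


def remove_isolated_noise(img):
    """Contract, then expand: small specks vanish, larger areas regain their size."""
    return expand(contract(img))
```

Note that with a 3x3 window this step also erases any foreground structure that is only one or two pixels wide, not just isolated specks.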
Image vectorization processing is executed as follows according to the prior art: First, an image that has been input in full color is converted to a bi-level image, contours and center lines are then extracted from the bi-level image and the lines obtained and the color information of the original image are converted to vector data.
Processing for separating a document image into areas such as text and photographic areas is a focus of interest in the prior art. Many of the proposed methods segment an image into small areas and distinguish between text and photographs based upon the features of each small area. Further, in applications to document images, compression processing or correction processing is often executed on a per-area basis after the image has been separated into text areas, photographic areas and the like. (For example, see the specifications of Japanese Patent Application Laid-Open Nos. 5-114045 and 9-186866.)
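The general idea of classifying small areas by their features can be sketched as follows. The feature used here (the fraction of pixels that are nearly black or nearly white), the block size, and the thresholds are assumptions chosen for illustration; the cited specifications may use entirely different features.

```python
def classify_blocks(gray, block=4, extreme_ratio=0.9):
    """Label each block of a grayscale image (rows of 0-255 ints) as "text" or "photo".

    Hypothetical feature: text blocks are assumed to consist mostly of
    near-black and near-white pixels, photographic blocks mostly of mid-tones.
    Returns a dict keyed by the (top, left) corner of each block.
    """
    h, w = len(gray), len(gray[0])
    labels = {}
    for top in range(0, h, block):
        for left in range(0, w, block):
            pixels = [
                gray[y][x]
                for y in range(top, min(top + block, h))
                for x in range(left, min(left + block, w))
            ]
            # Count pixels in the darkest and brightest quarters of the range.
            extreme = sum(1 for v in pixels if v < 64 or v > 191)
            is_text = extreme / len(pixels) >= extreme_ratio
            labels[(top, left)] = "text" if is_text else "photo"
    return labels
```

Per-area compression or correction would then be selected according to the label of each block.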
Further, a method of determining whether or not an area is one that should be segmented is known for the purpose of efficiently transmitting and storing, without loss, the information that results from separating a document image into areas (e.g., see the specification of Japanese Patent Application Laid-Open No. 2001-236517). The method described in this prior-art specification includes executing tone-reduction processing (histogram segmentation) and determining whether the difference before and after the processing is smaller than a predetermined value.
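The comparison before and after tone reduction can be sketched as follows. This is only an illustration of the principle: uniform quantization stands in for the histogram segmentation of the cited specification, and the mean absolute difference, the number of levels, and the limit value are all assumptions.

```python
def reduce_tones(gray, levels=4):
    """Quantize 0-255 values into `levels` equal bins, each represented by its centre.

    Uniform bins are an assumed stand-in for histogram segmentation.
    """
    step = 256 / levels
    return [[int((int(v // step) + 0.5) * step) for v in row] for row in gray]


def mean_abs_difference(a, b):
    """Average absolute per-pixel difference between two images of equal size."""
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count


def difference_is_small(gray, levels=4, limit=8.0):
    """True if tone reduction changes the image by less than the (hypothetical) limit."""
    return mean_abs_difference(gray, reduce_tones(gray, levels)) < limit
```

An image whose tones already cluster around a few values changes little under this reduction, whereas a smooth gradation changes substantially.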
With the conventional processing described above, however, there are occasions where a valuable area is mistakenly erased as a noise area. In such cases an accurate edge cannot be obtained and the image after vectorization exhibits a decline in image quality. Conversely, if noise removal is not carried out, noise areas are left as is and their contours are vectorized, and the problem that arises is a tremendous increase in the amount of vector data.
On the other hand, with regard to a clip-art image of the kind mentioned above, it is considered effective to execute vectorization processing, which is based upon an area segmentation method, in accordance with the features of the clip-art image.
In examples of the prior art, however, one does not come across vectorization processing that follows the automatic discrimination of image type (e.g., whether the image is one having an edge or exhibiting gradation). When the same processing is applied to images of different types, suitable vectorized results are not obtained and some processing is executed needlessly.
Graphics include simple clip-art images as well as texture patterns such as natural images, and may include complicated images having a large number of colors. Here a clip-art image refers to an image having a limited (small) number of colors, such as an illustration. Vectorization processing based upon area segmentation, which is capable of compressing image information efficiently without loss, is suited to clip art. However, this processing is not suited to images in which portions that are not graphics have been erroneously discriminated as graphics owing to the limited accuracy of area separation, or to images which, despite being discriminated as graphics, are natural images. It is difficult to obtain compressed images having good image quality when such images are subjected to this processing.
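The notion of "a limited number of colors" can be made concrete with a short sketch. This is purely illustrative and is not the determination method of any specification cited here: the coarse quantization to 4 bits per channel (to absorb slight scanner noise) and the limit of 16 colors are hypothetical values.

```python
def count_colors(rgb, bits=4):
    """Count distinct colors after keeping only the top `bits` bits of each channel.

    `rgb` is a list of rows of (r, g, b) tuples with 0-255 components.
    """
    shift = 8 - bits
    return len({(r >> shift, g >> shift, b >> shift) for row in rgb for (r, g, b) in row})


def has_limited_colors(rgb, max_colors=16, bits=4):
    """True if the image uses no more than the (hypothetical) maximum number of colors."""
    return count_colors(rgb, bits) <= max_colors
```

A color count alone does not settle whether an image is clip art, but it shows how the "small number of colors" property could be measured.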
Further, the method described in the specification of Japanese Patent Application Laid-Open No. 2001-236517 does not take into consideration the features of clip art in graphic areas and therefore this method cannot be applied to determinations as to whether a graphics area is a clip-art image or not.