1. Field of the Invention
The present invention relates to an image processing method, an image processing apparatus, and a program that process an image by segmenting it into regions.
2. Description of the Related Art
There is a growing demand to digitize documents and to save or send the resulting digital document data in place of paper. Document digitization in this sense is not limited to simply scanning a paper document with, for example, a scanner to obtain image data. For example, image data is segmented into regions having different properties, such as text, graphic, photo, and table regions, which constitute a document. The document digitization processing then converts each region into the most suitable format: for example, a text region into character codes, a graphic region into vector data, a background region and a photo region into bitmap data, and a table region into structure data. As a method of conversion into vector data, Japanese Patent Laid-Open No. 2007-158725 discloses an image processing apparatus. This image processing apparatus performs region segmentation by clustering processing, extracts the outlines of the respective regions, and converts the extracted outlines into vector data. Japanese Patent Laid-Open No. 2008-206073 discloses an image processing method which separates an image into a background and a foreground, converts the foreground into vector data, and compresses the background data by a method dedicated to backgrounds. Also, Japanese Patent Laid-Open No. 2006-344069 discloses an image processing method which removes noise from a document image that has been scanned by a scanner and has then undergone clustering processing.
As a method of segmenting an image into regions by clustering processing, a Nearest Neighbor clustering method is known. The Nearest Neighbor clustering method compares the feature vector of a processing target pixel with the representative feature vectors of the respective clusters to search for the cluster having the nearest representative feature vector. When the distance to that vector is equal to or smaller than a predetermined threshold, the processing target pixel is allocated to the corresponding cluster. Otherwise, a new cluster is defined, and the processing target pixel is allocated to that new cluster. Note that color information (a pixel value including R, G, and B values) is generally used as the feature vector. As the representative feature vector of each cluster, the centroid of that cluster is generally used, that is, the average value of the feature vectors (color information) of the pixels allocated to that cluster. The Nearest Neighbor clustering method searches all clusters for the cluster whose representative feature vector is nearest to the feature vector of the processing target pixel. That is, for each pixel, this method has to calculate the distances to the representative feature vectors of all clusters. For this reason, when the number of clusters is increased to enhance the accuracy of region segmentation, a longer calculation time is required.
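The Nearest Neighbor clustering procedure described above can be illustrated by the following minimal sketch. It is not taken from any of the cited publications. The function names, the use of Euclidean distance between RGB values, and the representation of the image as a flat list of pixels are assumptions made for illustration only.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]


def squared_distance(a: Color, b: Color) -> float:
    """Squared Euclidean distance between two RGB feature vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor_clustering(
    pixels: List[Color], threshold: float
) -> Tuple[List[int], List[Color]]:
    """Allocate each pixel to the nearest cluster, or define a new cluster.

    Returns one cluster label per pixel and the final centroid of each cluster.
    """
    sums: List[List[float]] = []  # running per-channel sum for each cluster
    counts: List[int] = []        # number of pixels allocated to each cluster
    labels: List[int] = []
    limit = threshold ** 2        # compare squared distances, no square root needed

    for pixel in pixels:
        best_index, best_dist = -1, float("inf")
        # Every existing cluster is examined for every pixel; this inner
        # loop is why the cost grows with the number of clusters.
        for index, (total, count) in enumerate(zip(sums, counts)):
            centroid = tuple(c / count for c in total)
            dist = squared_distance(pixel, centroid)
            if dist < best_dist:
                best_index, best_dist = index, dist

        if best_index >= 0 and best_dist <= limit:
            # Near enough: allocate to the cluster and update its centroid data.
            for channel in range(3):
                sums[best_index][channel] += pixel[channel]
            counts[best_index] += 1
        else:
            # Too far from every cluster (or no cluster yet): define a new one.
            sums.append(list(pixel))
            counts.append(1)
            best_index = len(counts) - 1
        labels.append(best_index)

    centroids = [tuple(c / n for c in total) for total, n in zip(sums, counts)]
    return labels, centroids
```

In this sketch the number of distance calculations is roughly the number of pixels multiplied by the number of clusters, which corresponds to the calculation-time problem noted above.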
As related art that can solve this problem, Japanese Patent Laid-Open No. 11-288465 discloses a color image processing apparatus. This apparatus executes clustering based on the feature vectors (color information) of a processing target pixel and its adjacent pixels. The resulting clusters are then grouped based on the color information and geometry information of the clusters. Note that the geometry information includes, for example, coordinate information indicating a distance between regions.
However, with the related art of Japanese Patent Laid-Open No. 11-288465, when the distances between the feature vectors of the processing target pixel and its adjacent pixels are large, a new cluster is defined, and the processing target pixel is allocated to the newly defined cluster. As a result, a large number of clusters are defined, and the processing time required for grouping increases. Also, with the related art of Japanese Patent Laid-Open No. 2006-344069, noise removal processing is executed only after clustering processing has been completed for the entire target image. Therefore, during the clustering processing, distances are also calculated between the processing target pixel and the representative feature vectors of clusters that include noise components to be removed, which increases the processing time.
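The growth in the number of clusters under adjacent-pixel clustering can be illustrated by the following hypothetical sketch. It is not the method of Japanese Patent Laid-Open No. 11-288465. Which neighbors are examined (the left and upper pixels here), the rule for joining a neighbor's cluster, the distance metric, and all names are assumptions chosen only to show the effect described above.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]


def squared_distance(a: Color, b: Color) -> int:
    """Squared Euclidean distance between two RGB feature vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adjacent_pixel_clustering(
    image: List[List[Color]], threshold: float
) -> Tuple[List[List[int]], int]:
    """Label pixels by comparing each one only with its left and upper neighbors.

    Assumes a rectangular image. Returns the label map and the cluster count.
    """
    limit = threshold ** 2
    labels = [[-1] * len(row) for row in image]
    cluster_count = 0

    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            # Collect (distance, label) pairs for already-labeled neighbors.
            candidates = []
            if x > 0:
                candidates.append(
                    (squared_distance(pixel, row[x - 1]), labels[y][x - 1])
                )
            if y > 0:
                candidates.append(
                    (squared_distance(pixel, image[y - 1][x]), labels[y - 1][x])
                )
            best = min(candidates, default=None)

            if best is not None and best[0] <= limit:
                # Close to a neighbor: join that neighbor's cluster.
                labels[y][x] = best[1]
            else:
                # Far from every neighbor: define a new cluster, even if an
                # identical color already exists elsewhere in the image.
                labels[y][x] = cluster_count
                cluster_count += 1

    return labels, cluster_count
```

For a single row that alternates between white and black pixels, this sketch defines one cluster per pixel, although only two colors are present. All of those clusters would then have to be handled by the later grouping step.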