1. Field of the Invention
The present invention relates to an image processing apparatus, an image processing apparatus control method, and a storage medium storing a program. More particularly, the present invention relates to a scan noise color reduction process using clustering in image processing. In addition, the present invention relates to a method and apparatus for rapidly performing a clustering process by decreasing the number of clusters to be compared when comparing the color distances between a pixel of interest and clusters.
2. Description of the Related Art
Recently, as the digitization of information advances, systems have become popular that read paper documents with a scanner or the like, save them in the form of digital data instead of saving the paper documents themselves, and transmit the digital data to another apparatus. Under these circumstances, digital documents are required to have high compression ratios in order to reduce the costs of saving and transmitting digital data. On the other hand, for the convenience of users, digital documents are also required to have high reusability, by which objects in the digital data can be partially edited, and high image quality that does not deteriorate even when an image is enlarged or reduced.
When document data contains both a character region and a photograph region, the image quality is high but the compression ratio is low if compression suited to the character region is performed, and the compression ratio is high but the image quality of the character region degrades if compression suited to the photograph region is performed. Therefore, the following method has been proposed (see Japanese Patent Laid-Open No. 2004-265384). In this method, digital data of a document image is separated into a character region and a photograph region, the character region, which is required to have high reusability and high image quality, is converted into vector data, and the photograph region and the like, which cannot easily be reproduced by vectorization, are compressed by JPEG. High compressibility, high reusability, and high image quality of the document image are achieved by synthesizing the processing results of the individual regions and outputting the synthesized image.
A method has also been proposed that vectorizes not only a character region but also a graphic region (generally called an illustration, clip art, or line drawing), which is characterized by including several uniform colors and having clear contours (see Japanese Patent Laid-Open No. 2006-344069). In this method, an image is input and the number of colors of the input image is reduced by using color similarity. Then, the contour line of each color region is extracted, functional approximation is performed, and vector data is output by adding color information.
When performing vectorization by the method of Japanese Patent Laid-Open No. 2006-344069, a color reduction process is necessary as pre-processing for reducing scan noise contained in an input image and extracting the contours of the original image. As the scan noise color reduction process, a method of performing matching with predetermined representative colors and the method using clustering adopted in Japanese Patent Laid-Open No. 2006-344069 are useful. The matching with representative colors can classify an input image into a few selected colors. However, several representative colors must be predetermined, and the predetermined representative colors may include a color significantly different from the colors of the original image. Accordingly, if the number of colors used in the original image and the colors themselves are unknown before processing, the method using clustering is superior to the method using matching for reducing colors while accurately maintaining the colors used in the original image.
Several techniques that achieve clustering are available; for example, a known NN (Nearest Neighbor) method can be applied. That is, when clustering inputs P1 to Pn, a cluster C1 having P1 as a representative pattern is formed first. Then, for each Pi (i≧2), the distance between Pi and each existing cluster Cj (j ranges from 1 to the number of clusters formed so far) is compared with a predetermined threshold value. If at least one cluster has a distance to Pi smaller than the predetermined threshold value, Pi is made to belong to the cluster having the smallest distance. If there is no cluster having a distance smaller than the threshold value, a new cluster is formed. The algorithm of the NN method executes this processing on all inputs. Details of the NN method are described in, e.g., Agui and Nagao, “Guide to Image Processing Using C Language”, the first edition (ISBN4-7856-3124-4, SHOKODO, 2000).
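The NN procedure described above can be sketched as follows. This is a minimal illustration and not the implementation of the cited reference: the names `nn_cluster` and `manhattan` are hypothetical, the Manhattan distance is chosen only as an example, and each cluster is assumed to keep its first member as a fixed representative pattern, which the description above does not specify.

```python
from typing import Callable, List, Sequence, Tuple

Pattern = Tuple[float, ...]


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """City block distance between two patterns of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nn_cluster(
    inputs: Sequence[Sequence[float]],
    threshold: float,
    distance: Callable[[Sequence[float], Sequence[float]], float] = manhattan,
) -> Tuple[List[Pattern], List[int]]:
    """Cluster inputs P1..Pn with the NN method.

    Returns the representative pattern of each cluster and, for every
    input, the index of the cluster it was assigned to.
    Assumption: a representative is the first member of its cluster
    and is never updated.
    """
    representatives: List[Pattern] = []
    labels: List[int] = []
    for p in inputs:
        best_index = -1
        best_distance = threshold  # only distances below this qualify
        # Compare the input with every cluster generated so far.
        for j, rep in enumerate(representatives):
            d = distance(p, rep)
            if d < best_distance:
                best_index, best_distance = j, d
        if best_index < 0:
            # No cluster is closer than the threshold: form a new one.
            representatives.append(tuple(p))
            best_index = len(representatives) - 1
        labels.append(best_index)
    return representatives, labels


if __name__ == "__main__":
    pixels = [(0, 0, 0), (2, 1, 0), (200, 200, 200), (1, 1, 1), (198, 201, 199)]
    reps, labels = nn_cluster(pixels, threshold=30)
    print(reps, labels)
```

In this sketch the inner loop visits every existing cluster for every input, so the cost of one assignment grows with the number of clusters.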
When applying the NN method to the color reduction process, it is only necessary to use a pixel value (RGB value or luminance value) as an input, and a color distance such as a Manhattan distance (also called a city block distance) or Euclidean distance as a distance. There is, of course, the problem that when the threshold value for the color distance is decreased, the number of clusters to be generated increases, and this prolongs the processing time required for the operation of comparing the pixel value with each cluster. FIG. 3A is an exemplary view of the clustering process. FIG. 3B is a flowchart showing the procedure of the clustering process. When the pixel value is given as a one-dimensional value (e.g., a value within the range of 0 (black) to 255 (white)) such as a value of a black-and-white image, the number of clusters to be compared can be decreased by sorting the generated clusters based on the pixel values. When performing clustering on, for example, an RGB color space, however, the color distance is expressed three-dimensionally. This makes it impossible to uniquely specify, from a sorted order, a cluster close in color to a pixel of interest during the comparison. Therefore, the sorting-based method cannot be used. Consequently, it is necessary to obtain the color distances to all generated clusters to search for the cluster having the closest color distance. This prolongs the processing time as the number of clusters increases.
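The contrast between the one-dimensional case and the RGB case can be illustrated by the following sketch. It is only one possible reading of the sorting approach mentioned above: the function names are hypothetical, the use of a binary search over a sorted list is an assumption, and the Manhattan distance is again used only as an example.

```python
import bisect
from typing import List, Optional, Sequence, Tuple


def nearest_sorted_1d(sorted_values: List[int], value: int) -> int:
    """Index of the cluster value nearest to `value`, or -1 if there is none.

    Because the cluster values are sorted, only the two neighbors of the
    insertion position have to be compared.
    """
    if not sorted_values:
        return -1
    pos = bisect.bisect_left(sorted_values, value)
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(sorted_values)]
    return min(candidates, key=lambda i: abs(sorted_values[i] - value))


def cluster_gray(pixels: Sequence[int], threshold: int) -> List[int]:
    """NN clustering of one-dimensional pixel values with sorted clusters."""
    centers: List[int] = []  # kept sorted at all times
    for v in pixels:
        i = nearest_sorted_1d(centers, v)
        if i < 0 or abs(centers[i] - v) >= threshold:
            bisect.insort(centers, v)  # new cluster, order is preserved
    return centers


def nearest_exhaustive_rgb(
    clusters: Sequence[Tuple[int, int, int]], pixel: Tuple[int, int, int]
) -> Tuple[int, Optional[int]]:
    """Index and Manhattan distance of the nearest RGB cluster.

    No single sort order identifies the nearest cluster in three
    dimensions, so every cluster is compared.
    """
    best_index, best_distance = -1, None
    for i, c in enumerate(clusters):
        d = sum(abs(a - b) for a, b in zip(c, pixel))
        if best_distance is None or d < best_distance:
            best_index, best_distance = i, d
    return best_index, best_distance


if __name__ == "__main__":
    print(cluster_gray([10, 12, 200, 198, 100], threshold=20))
    print(nearest_exhaustive_rgb([(0, 0, 0), (255, 255, 255), (250, 0, 0)], (240, 10, 5)))
```

In the one-dimensional function at most two clusters are compared per pixel, whereas the RGB function performs one comparison per generated cluster, which is the cost that grows as the number of clusters increases.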