1. Field of the Invention
The present invention relates to an image processing apparatus for determining a similarity to pre-registered images, based on the feature vector obtained from an acquired image, and also relates to an image forming apparatus and an image reading apparatus comprising the image processing apparatus, and an image processing method.
2. Description of Related Art
Several methods have been proposed for image processing in which a document is read with a scanner and the image data obtained by reading the document is matched against pre-registered image data to determine a similarity between images. Examples include a method in which a keyword is extracted from an image by an OCR (Optical Character Reader) and a similarity between images is determined based on the extracted keyword, and a method which limits the images whose similarity is to be determined to form images with ruled lines and extracts features of the ruled lines to determine a similarity between images.
However, in these determination processes, in order to accurately determine a similarity between images, it is necessary to correct skew of the document to be read (skew correction), and, if the skew correction cannot be made, there is the problem that a similarity between images is not accurately determined. Moreover, since the processing to determine a similarity between images is complicated, it is difficult to realize the processing as hardware. When the similarity determining process is realized by a simple algorithm, it can easily be realized as hardware; however, it is then difficult to improve the determination accuracy, and there is also the problem that tolerance to skew or to external disturbance such as noise is insufficient.
Hence, there is proposed, for example, a method (Nakai Tomohiro and three others, “Document Image Retrieval Based on Cross-Ratio and Hashing”, The Institute of Electronics, Information and Communication Engineers Technical Research Report, March, 2005) which calculates the centroid of each connected component in a document image, extracts the calculated centroid as a feature point of the connected component, and calculates an invariant with respect to the rotation or skew of the image based on the extracted feature points to determine a similarity between images. This method is thereby capable of accurately determining a similarity between images even when a target image is skewed or includes writing that is not contained in a pre-registered image.
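As an illustration of what an invariant of feature points means, the following is a minimal Python sketch of one well-known cross-ratio, computed from signed triangle areas of five coplanar points. The five-point area form, the function names, and the sample coordinates are assumptions made for this illustration; the sketch is not taken from the cited report and may differ from the formulation used there.

```python
import math


def signed_area(p, q, r):
    """Signed area of triangle p-q-r (positive when counter-clockwise)."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))


def cross_ratio(a, b, c, d, e):
    """Cross-ratio of five coplanar points, built from four triangle areas.

    Illustrative form (an assumption of this sketch):
    area(a,b,c) * area(a,d,e) / (area(a,b,d) * area(a,c,e)).
    """
    return (signed_area(a, b, c) * signed_area(a, d, e)) / (
        signed_area(a, b, d) * signed_area(a, c, e)
    )


def rotate_and_shear(point, angle_rad, shear, shift):
    """Rotate a point, apply a horizontal shear, then translate it."""
    x, y = point
    xr = x * math.cos(angle_rad) - y * math.sin(angle_rad)
    yr = x * math.sin(angle_rad) + y * math.cos(angle_rad)
    return (xr + shear * yr + shift[0], yr + shift[1])


# Hypothetical feature points (centroids of five connected components).
points = [(0.0, 0.0), (4.0, 1.0), (3.0, 5.0), (-2.0, 4.0), (-3.0, -2.0)]

# The same points as they might appear in a rotated and skewed scan.
skewed = [rotate_and_shear(p, math.radians(17.0), 0.3, (12.0, -5.0)) for p in points]

before = cross_ratio(*points)
after = cross_ratio(*skewed)
print(math.isclose(before, after, rel_tol=1e-9))
```

Every triangle area is scaled by the same factor under rotation, shear, and translation, so the factor cancels in the ratio. This is why a value of this kind can be compared between a skewed target image and a pre-registered image without first correcting the skew.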
As a method for calculating the centroid of an image pattern, there is proposed an image processing method capable of calculating the centroid coordinates at high speed by dividing a circumscribed rectangular region enclosing a target pattern into a plurality of blocks by considering a pixel matrix as one unit, defining the relative origin and relative coordinates for each block, and performing predetermined processing on each block (see Japanese Patent Application Laid-Open No. 61-260370).
Moreover, as an apparatus for calculating the centroid of a specific pattern in an image, there is proposed an apparatus capable of calculating the centroid at high speed by setting a circumscribed square on a specific target object and extracting the region and the centroid value of the set specific target object in parallel (see Japanese Patent Application Laid-Open No. 10-79035).
However, in the method disclosed in the above-mentioned non-patent document “Document Image Retrieval Based on Cross-Ratio and Hashing”, when calculating the centroid of the connected components, image data on one page is read and stored, the stored image data on one page is binarized, a label assigning process is performed to show in which connected component each pixel is contained, the coordinate values of the pixels contained in a connected component are added up for each connected component, and the sum of the coordinate values is divided by the number of the pixels contained in the connected component to calculate the centroid of the connected component. Therefore, in order to calculate the centroid, it is necessary to store one page of image data. For example, when the image processing is realized by an ASIC, the number of gates increases as the memory capacity increases, and consequently the circuit scale becomes larger. As a result, it is difficult to realize the image processing as hardware of realistic scale, and the cost rises.
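The page-based procedure described above (binarize, label, add up coordinates, divide by the pixel count) can be sketched in Python as follows. This is a minimal illustration, not the implementation of the cited document: the function name, the fixed threshold of 128, the choice of dark pixels as foreground, 8-connectivity, and the flood-fill labeling are all assumptions of this sketch.

```python
from collections import deque


def component_centroids(gray, threshold=128):
    """Return the centroid (x, y) of each 8-connected dark component.

    `gray` is a full page given as a list of rows of 0-255 values.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    # Binarize the stored page: True marks a foreground (dark) pixel.
    binary = [[value < threshold for value in row] for row in gray]
    # Label map covering the whole page: 0 means "not labeled yet".
    labels = [[0] * width for _ in range(height)]
    centroids = []
    for y0 in range(height):
        for x0 in range(width):
            if not binary[y0][x0] or labels[y0][x0]:
                continue
            # Start a new component and flood-fill it from this seed pixel.
            label = len(centroids) + 1
            labels[y0][x0] = label
            queue = deque([(x0, y0)])
            sum_x = sum_y = count = 0
            while queue:
                x, y = queue.popleft()
                # Add up the coordinate values of the component's pixels.
                sum_x += x
                sum_y += y
                count += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = label
                            queue.append((nx, ny))
            # Centroid = sum of coordinates / number of pixels.
            centroids.append((sum_x / count, sum_y / count))
    return centroids


# Tiny hypothetical "page": a 2x2 dark block and one isolated dark pixel.
page = [
    [255, 255, 255, 255, 255],
    [255,   0,   0, 255, 255],
    [255,   0,   0, 255, 255],
    [255, 255, 255, 255,   0],
    [255, 255, 255, 255, 255],
]
print(component_centroids(page))
```

In this sketch both `binary` and `labels` have one entry per pixel of the page, which mirrors the memory requirement that the paragraph above identifies as the obstacle to a hardware realization.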
In the method disclosed in Japanese Patent Application Laid-Open No. 61-260370, when obtaining image data from a scanner, the image data is inputted from the scanner on a line-by-line basis, and therefore line buffers corresponding to the number of lines contained in a block are required to perform the processing on a block-by-block basis. Hence, when performing the processing by using a relatively large block, there is the problem that a large memory is necessary. Further, in the apparatus disclosed in Japanese Patent Application Laid-Open No. 10-79035, there is the problem that the shape of a connected component from which the centroid can be calculated is limited.