1. Field of the Invention
The present invention relates to an image processing apparatus for judging similarity between inputted document image data and image data stored in advance.
2. Description of the Related Art
In image processing apparatuses of this kind, a document is conventionally read by a scanner, and the image data obtained by the reading is checked against image data stored in advance to judge the similarity between the images.
For example, the following methods have been proposed: a method of extracting a keyword from image data with an OCR (optical character reader) and judging the similarity of the images on the basis of the extracted keyword; a method of limiting the images targeted for similarity judgment to record form images with ruled lines and extracting the characteristics of the ruled lines to judge the similarity of the images; and a method of replacing character strings and the like in image data with points and determining the positional relationship among the points (feature points) as features to judge the similarity of the images.
International Publication WO2006/92957 (publication date: Sep. 8, 2006) discloses an image processing apparatus as described below. That is, connected parts of an image picked up by a digital camera or read by a scanner are regarded as word regions, and the centroids of the word regions are determined as feature points. The feature points are used to calculate a geometric invariant, and features are furthermore determined from this geometric invariant. The features, indexes indicating the feature points, and an index indicating the image are stored in a hash table.
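The feature point extraction described above can be illustrated by a minimal sketch. It assumes the image has already been binarized into a list of rows of 0/1 values and that 8-connectivity defines a connected part; the function name `component_centroids` and both of these choices are illustrative assumptions, not details taken from the cited publication.

```python
from collections import deque


def component_centroids(binary):
    """Return the centroid (x, y) of each connected part of a binary image.

    `binary` is a list of rows holding 0 (background) or 1 (foreground).
    8-connectivity is an assumption made for this sketch.
    """
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one connected part.
            queue = deque([(r, c)])
            seen[r][c] = True
            sum_x = sum_y = count = 0
            while queue:
                y, x = queue.popleft()
                sum_x += x
                sum_y += y
                count += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # The centroid of the region serves as its feature point.
            centroids.append((sum_x / count, sum_y / count))
    return centroids
```

Each returned pair is the mean column and mean row of one region, listed in the order in which the regions are first met in a row-by-row scan.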
In performing retrieval, feature points, features, and indexes indicating the feature points are determined for a retrieval query (inputted image) by a similar process, and the hash table is accessed to vote for the stored document images.
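The storage and voting steps can be sketched as follows. This is a simplified illustration only: the features are assumed to be already reduced to integer hash values, and the names `register` and `vote` are hypothetical helpers rather than terms from the cited publication.

```python
from collections import Counter, defaultdict


def register(table, doc_id, features):
    """Store each feature of a document together with the index of its
    feature point and the index of the document (assumed layout)."""
    for point_id, feature in enumerate(features):
        table[feature].append((doc_id, point_id))


def vote(table, query_features):
    """Cast one vote for every stored document whose entry shares a
    feature with the query, and return the vote counts."""
    votes = Counter()
    for feature in query_features:
        # .get avoids creating empty entries for unseen features.
        for doc_id, _point_id in table.get(feature, ()):
            votes[doc_id] += 1
    return votes


table = defaultdict(list)
register(table, "A", [3, 7, 9])
register(table, "B", [7, 8])
result = vote(table, [3, 7, 10])
```

In this toy example the stored document with the most votes would be taken as the candidate most similar to the query. Any threshold on the number of votes is left out because the text above does not specify one.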
In determining the features described above, the n feature points nearest to a certain target feature point are selected. Then, m (m<n) feature points are further selected from among the selected n feature points, and d (d≤m) feature points are extracted from among the m feature points. The features related to the d feature points are calculated for all such combinations.
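The selection of n, m, and d feature points can be illustrated with a short sketch. It assumes Euclidean distance for "nearest" and represents feature points as (x, y) tuples; the function names are hypothetical, and the calculation of the features themselves is omitted because the invariant is not specified here.

```python
from itertools import combinations
from math import comb, dist


def nearest_points(target, points, n):
    """Return the n points closest to `target` (Euclidean distance assumed)."""
    others = [p for p in points if p != target]
    return sorted(others, key=lambda p: dist(p, target))[:n]


def enumerate_point_sets(target, points, n, m, d):
    """For every choice of m points out of the n nearest neighbours,
    list every choice of d points out of those m."""
    neighbours = nearest_points(target, points, n)
    result = []
    for m_subset in combinations(neighbours, m):
        d_subsets = list(combinations(m_subset, d))
        result.append((m_subset, d_subsets))
    return result


def count_combinations(n, m, d):
    """Number of d-point sets examined per target point: C(n, m) * C(m, d)."""
    return comb(n, m) * comb(m, d)
```

With n = 7, m = 6, and d = 4, values chosen only for illustration, each target feature point yields C(7, 6) = 7 sets of m points, and each of those yields C(6, 4) = 15 sets of d points, for 105 feature calculations in total.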
In the above retrieval method disclosed in International Publication WO2006/92957, however, when the centroid of a character is assumed to be a feature point, there may be a case where features calculated from patterns with almost no positional variations in centroids, such as long English words, agree with each other even if they are originally different character strings. Therefore, there is a problem that the accuracy of judgment of an image including a lot of such patterns deteriorates.
In view of the above situation, the object of the present invention is to provide an image processing apparatus and an image processing method capable of preventing deterioration of judgment accuracy when feature points are determined from a document image to calculate features (hash value) with the use of the feature points.