Various techniques have been proposed for comparing (i) input image data obtained by a scanner reading a document image with (ii) a preliminarily stored image so as to determine a similarity between the input image data and the preliminarily stored image.
Examples of the method for determining a similarity include: a method in which a text image is extracted, a keyword is extracted from the text image with OCR (Optical Character Reader) so as to carry out matching with the keyword; and a method in which features of a ruled line included in an image are extracted so as to carry out matching with the features.
Further, Patent Document 1 (Japanese Unexamined Patent Publication No. Tokukaihei 8-255236 (published on Oct. 1, 1996)) discloses a technique in which texts, frames for text strings, frames etc. are recognized from an input image and matching is performed with respect to each frame based on frame information, thereby performing a format recognition of a ruled line image etc.
Further, Patent Document 2 (International Publication No. WO 2006/092957A1, pamphlet (published on Sep. 8, 2006)) discloses a technique in which a centroid of a word in an English document, a centroid of a connected component of a black pixel, a closed space of a kanji character, a specific portion repeatedly appearing in an image etc. are extracted as feature points, a set of local feature points is determined out of the extracted feature points, a partial set of feature points is selected out of the determined set of local feature points, invariants relative to geometric transformation, each being a value characterizing the selected partial set, are calculated in accordance with plural combinations of feature points in the partial set, the calculated invariants are regarded as features, and a document matching is performed in accordance with the features.
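The notion of an invariant relative to geometric transformation can be illustrated with a minimal sketch. The sketch assumes, purely for illustration, that the invariant is a ratio of two distances between feature points, which is unchanged by rotation, uniform scaling and translation; it is not asserted that Patent Document 2 uses this particular invariant. The function names `similarity_transform` and `distance_ratio_invariants` are hypothetical.

```python
import math
from itertools import combinations


def similarity_transform(points, angle_deg, scale, shift):
    """Rotate, uniformly scale and translate a list of (x, y) points."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [
        (scale * (c * x - s * y) + shift[0], scale * (s * x + c * y) + shift[1])
        for x, y in points
    ]


def distance_ratio_invariants(points):
    """For every combination of three points (a, b, c), taken in list order,
    return |ab| / |ac|. The points are assumed to be distinct.
    This ratio is an illustrative choice of invariant, not a quoted method."""
    features = []
    for a, b, c in combinations(points, 3):
        features.append(math.dist(a, b) / math.dist(a, c))
    return features


# Hypothetical feature points, e.g. centroids taken from a document image.
feature_points = [(0, 0), (4, 0), (4, 3), (1, 5)]

original = distance_ratio_invariants(feature_points)
# The same points as they would appear in a skewed, enlarged and shifted scan.
moved = distance_ratio_invariants(
    similarity_transform(feature_points, angle_deg=30, scale=1.5, shift=(10, -2))
)

# The two feature lists agree up to floating-point rounding.
print(all(math.isclose(p, q, rel_tol=1e-9) for p, q in zip(original, moved)))
```

In this sketch the invariants themselves are unaffected by skew or scaling; they remain reliable only insofar as the underlying feature points are extracted at corresponding positions in both images.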
However, the techniques of Patent Documents 1 and 2 have a problem in that, in a case where input image data has been read while skewed with respect to a predetermined positioning angle of a reading position of an image reading apparatus, or where the input image data has been subjected to a process such as enlarging or reducing, features cannot be extracted with high accuracy.
For example, in the technique of Patent Document 1, the results of recognition of texts, frames for text strings, frames etc. vary according to the influences of the skew, the enlarging, the reducing etc., and consequently it is impossible to perform a format recognition with high accuracy.
Further, in the technique of Patent Document 2, the results of extracting a centroid of a word in an English document, a centroid of a connected component of a black pixel, a closed space of a kanji character, a specific portion repeatedly appearing in an image etc. vary according to the influences of the skew, the enlarging, the reducing etc., and consequently accuracy in document matching drops.
In a case where a feature point is extracted from an image including a handwritten text (e.g. an image of a document which was printed in a predetermined font and on which a handwritten note is written), the techniques of Patent Documents 1 and 2 are particularly likely to make an erroneous determination, both because the skew, the enlarging, the reducing etc. lower the determination accuracy of the techniques and because the shape of a handwritten text differs greatly from the shape of a font stored in an image processing apparatus.
Further, the technique of Patent Document 2 has a problem that, when a feature point is extracted, binarization of image data and labeling of the image data are performed before a centroid of a word in an English document, a centroid of a connected component of a black pixel, a closed space of a kanji character, a specific portion repeatedly appearing in an image etc. are extracted, which complicates the process and requires a larger circuit configuration.
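The sequence of steps referred to above (binarization, labeling, then centroid extraction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Patent Document 2: a fixed threshold of 128, 4-connectivity for labeling, and the function names `binarize`, `label_components` and `centroids` are all hypothetical choices made for the example.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Step 1: map each grey level to 1 (black) or 0 (white).
    The fixed threshold is an assumption made for this sketch."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def label_components(binary):
    """Step 2: assign a label to each 4-connected group of black pixels."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] == 1 and labels[y][x] == 0:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:  # breadth-first flood fill of one component
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def centroids(labels, count):
    """Step 3: compute the (x, y) centroid of each labelled component."""
    sums = {k: [0, 0, 0] for k in range(1, count + 1)}  # sum_x, sum_y, pixel count
    for y, row in enumerate(labels):
        for x, k in enumerate(row):
            if k:
                sums[k][0] += x
                sums[k][1] += y
                sums[k][2] += 1
    return [(sx / n, sy / n) for sx, sy, n in sums.values()]


# A tiny hypothetical greyscale image containing two dark blobs.
gray = [
    [255,   0,   0, 255, 255],
    [255,   0,   0, 255,   0],
    [255, 255, 255, 255,   0],
]
labels, count = label_components(binarize(gray))
print(centroids(labels, count))  # prints [(1.5, 0.5), (4.0, 1.5)]
```

Even in this reduced form, three separate passes over the image are needed before a single feature point is available, which illustrates why such a process tends toward a complicated configuration.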
Further, in the case of the technique in Patent Document 2 where a centroid of a word and a centroid of a connected component of a black pixel are extracted as feature points, when input image data is data of a document consisting largely of a table with only a small amount of text, fewer feature points are extracted, which lowers accuracy in matching image data.