Image matching techniques have been proposed that compare (i) image data obtained by reading a document with a scanner or the like against (ii) image data of a reference document stored in advance, in order to determine the similarity between the two.
Examples of methods for determining similarity include: a method in which keywords are extracted from an image by OCR (Optical Character Reader) and matching is carried out using those keywords; a method in which only ruled-line images containing ruled lines are targeted and matching is carried out using features of the ruled lines (see Patent Document 1); and a method in which similarity is determined based on the color distributions of an input image and a stored image (see Patent Document 2).
Patent Document 3 discloses a technique in which descriptors are formed from features of an input document, and the input document is matched against documents stored in a document database by use of those descriptors and a descriptor database. The descriptor database stores descriptors and, for each descriptor, lists the documents containing the features from which that descriptor is formed. Descriptors are selected so as to be invariant to distortions introduced by digitizing a document and to differences between the input document and the matching document in the document database.
In this technique, votes for each document in the document database are accumulated as the descriptor database is scanned, and the document that obtains the maximum number of votes, or a document whose number of votes exceeds a threshold value, is selected as the matching document.
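The voting step described above can be sketched as follows. This is a minimal illustration, not the method of Patent Document 3: descriptors are assumed to be hashable keys, the descriptor database is assumed to be a plain dict mapping each descriptor to the IDs of the documents containing it, and the names `match_by_voting`, `descriptor_db`, and `threshold` are hypothetical.

```python
from collections import defaultdict


def match_by_voting(input_descriptors, descriptor_db, threshold=None):
    """Vote for every stored document that shares a descriptor with the input.

    With threshold=None, return the document(s) with the maximum vote count;
    otherwise return every document whose vote count exceeds the threshold.
    """
    votes = defaultdict(int)
    for descriptor in input_descriptors:
        # Descriptors absent from the database simply cast no votes.
        for doc_id in descriptor_db.get(descriptor, ()):
            votes[doc_id] += 1
    if not votes:
        return []
    if threshold is None:
        best = max(votes.values())
        return sorted(doc for doc, v in votes.items() if v == best)
    return sorted(doc for doc, v in votes.items() if v > threshold)


# Hypothetical descriptor database: descriptor -> documents containing it.
db = {"a": ["doc1", "doc2"], "b": ["doc1"], "c": ["doc3"]}
print(match_by_voting(["a", "b", "x"], db))               # ['doc1']
print(match_by_voting(["a", "b", "x"], db, threshold=0))  # ['doc1', 'doc2']
```

Returning a sorted list in both modes keeps ties visible instead of silently picking one document.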
Further, Patent Document 4 discloses a technique in which: a plurality of feature points are extracted from a digital image; a set of local feature points is determined from among the extracted feature points; a subset of feature points is selected from the determined set of local feature points; invariants with respect to geometric transformation, each being a value that characterizes the selected subset, are calculated for a plurality of combinations of feature points in the subset; features are calculated by combining the calculated invariants; and a document or image corresponding to the digital image data is retrieved by voting for the documents and images stored in a database that have the calculated features.
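As a rough sketch of how an invariant can be computed from a subset of feature points and combined into one feature, the code below uses ratios of triangle areas, which no affine transformation changes (an affine map multiplies every area by the same factor). This choice of invariant, the assumption that the points arrive in a consistent order, the quantization into `levels` bins, and the function names are all hypothetical; the specific invariants and hashing of Patent Document 4 are not reproduced here.

```python
from itertools import combinations


def triangle_area(p, q, r):
    """Area of triangle pqr from the 2-D cross product."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def affine_invariant_feature(points, levels=8):
    """Quantized triangle-area ratios for an ordered subset of feature points.

    Each 3-point combination yields one triangle; dividing every area by the
    first one cancels the common affine scale factor |det A|.
    """
    areas = [triangle_area(*tri) for tri in combinations(points, 3)]
    base = areas[0]
    if base == 0:
        raise ValueError("the first three points are collinear")
    # Combine the per-triangle invariants into one discrete, hashable feature.
    return tuple(round(a / base * levels) for a in areas[1:])


def affine(p):
    """An arbitrary affine map (det = 5) used to distort the example points."""
    return (2 * p[0] + p[1] + 7, p[0] + 3 * p[1] - 1)


pts = [(0, 0), (4, 0), (0, 3), (5, 5)]
moved = [affine(p) for p in pts]
print(affine_invariant_feature(pts))    # (13, 10, 15)
print(affine_invariant_feature(moved))  # (13, 10, 15)
```

Because the resulting tuple is hashable, it could serve directly as a key in a vote table like the one a database lookup would use.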
Conventionally, image data output processing apparatuses, such as copying machines, facsimile devices, scanning devices, and multi-function printers, carry out an output process such as copying, transmitting, editing, or filing on input image data (image data of a target document to be matched). When such an image matching technique determines that the input image of a target document is similar to the image of a reference document, the apparatus controls its output process accordingly.
For example, anti-counterfeiting techniques for paper currency and stock certificates are known for color image forming apparatuses. In these techniques, whether input image data is identical to an image of paper currency or a stock certificate is determined based on a pattern detected from the input image data. When the input image data is determined to be identical to a reference image, (i) a specific pattern is added to the output image so that the image forming apparatus that made the copy can be identified from the output image, (ii) the copied image is blacked out, or (iii) copying of the input image data is prohibited.
Patent Document 1: Japanese Unexamined Patent Publication, Tokukaihei, No. 8-255236 (published on Oct. 1, 1996)
Patent Document 2: Japanese Unexamined Patent Publication, Tokukaihei, No. 5-110815 (published on Apr. 30, 1993)
Patent Document 3: Japanese Unexamined Patent Publication, Tokukaihei, No. 7-282088 (published on Oct. 27, 1995)
Patent Document 4: International Publication No. WO 2006/092957, pamphlet (published on Sep. 8, 2006)
However, such conventional image matching apparatuses have a problem: when a target document is an N-up document or a reduced-size document, it is difficult to accurately determine its similarity to a reference document.
Some known image matching apparatuses binarize input image data, identify connected regions (regions in which pixels of the binarized image are connected to each other), extract a feature point of each connected region based on the coordinates, in the binarized image, of the pixels included in that region, calculate features indicative of image similarity from the extracted feature points, and determine similarity based on those features.
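The binarize, label, and extract steps above can be sketched in plain Python. Every specific here is a hypothetical choice for illustration: dark pixels (values below a fixed `threshold`) are treated as foreground, connectivity is 8-neighbour, and the feature point of each connected region is taken to be its centroid, one common way to derive a point from the coordinates of a region's pixels.

```python
from collections import deque


def binarize(gray, threshold=128):
    """1 = dark (foreground) pixel, 0 = background."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def connected_regions(binary):
    """Label 8-connected foreground regions; each region is a list of (x, y)."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[y][x] = True
            queue, pixels = deque([(x, y)]), []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if 0 <= nx < w and 0 <= ny < h and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            regions.append(pixels)
    return regions


def feature_points(regions):
    """Centroid of each region, computed from its pixel coordinates."""
    return [
        (sum(x for x, _ in px) / len(px), sum(y for _, y in px) / len(px))
        for px in regions
    ]


gray = [[0, 0, 255, 255],
        [0, 255, 255, 255],
        [255, 255, 255, 0],
        [255, 255, 0, 0]]
regions = connected_regions(binarize(gray))
print(feature_points(regions))  # two centroids, near (1/3, 1/3) and (8/3, 8/3)
```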
In such similarity determination, a difference in the condition for identifying connected regions reduces determination accuracy. Such a difference refers to a situation in which a partial image identified as a connected region in one of the two images being compared is not identified as a connected region in the other.
Only a region that includes more than a specified number of pixels (the default number of pixels) is identified as a connected region, so that isolated dots, noise, and the like are removed.
However, in conventional image matching apparatuses, the threshold value for removing such isolated dots and noise is fixed at a default value, chosen so that isolated dots and noise are removed from the image data of a reference document. Suppose connected regions are identified, using the same threshold value as for the reference document, in an image reduced from its original size (e.g., an N-up document or a reduced-size document). A part whose pixel count would exceed the threshold value at the original size, and which should therefore be identified as a connected region, falls below the threshold because the image has been reduced, so it is not identified as a connected region. As a result, fewer feature points are extracted and the calculated features differ, which reduces determination accuracy.
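A small numeric illustration of this problem: reducing an image by a linear factor k shrinks a region's pixel count by roughly k², so a fixed minimum pixel count can discard a region that survives at full size. The reduction method (nearest-neighbour subsampling), the default threshold of 8 pixels, and all function names below are hypothetical values chosen only for this sketch.

```python
def downscale_nearest(binary, factor):
    """Reduce a binary image by keeping every `factor`-th row and column."""
    return [row[::factor] for row in binary[::factor]]


def count_foreground(binary):
    """Number of foreground (1) pixels."""
    return sum(sum(row) for row in binary)


def survives_noise_filter(pixel_count, min_pixels):
    """A region is kept only if it has more than `min_pixels` pixels."""
    return pixel_count > min_pixels


DEFAULT_MIN_PIXELS = 8  # hypothetical fixed threshold tuned for full-size documents

# One solid 4x4 mark (16 pixels) on an 8x8 page.
page = [[1 if 2 <= y <= 5 and 2 <= x <= 5 else 0 for x in range(8)] for y in range(8)]
# E.g. one page of a 4-up layout, reduced to 1/2 per side.
half = downscale_nearest(page, 2)

full_count = count_foreground(page)  # 16
half_count = count_foreground(half)  # 4
print(survives_noise_filter(full_count, DEFAULT_MIN_PIXELS))  # True
print(survives_noise_filter(half_count, DEFAULT_MIN_PIXELS))  # False
```

The same mark, and therefore the same feature point, is kept in the full-size image but filtered out as noise in the reduced one.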