Image matching techniques have been proposed that compare image data obtained by reading a document with a scanner or the like against image data of a reference document stored in advance, so as to determine the similarity between the two.
Examples of methods for determining similarity include: a method in which keywords are extracted from an image by OCR (Optical Character Reader) and matching is carried out on those keywords; a method in which only ruled-line images are used as target images and matching is carried out based on features of the ruled lines (see Patent Document 1); and a method in which similarity is determined based on the color distributions of an input image and a reference document (see Patent Document 2).
Patent Document 3 discloses a technique in which a descriptor is formed from features of an input document, and the input document is matched against documents stored in a document database by use of the descriptor and a descriptor database. The descriptor database stores descriptors and, for each descriptor, a list of the documents containing the features from which that descriptor is formed. Each descriptor is selected so as to be invariant to distortions introduced when a document is digitized and to differences between the input document and the matching document in the document database.
In this technique, as the descriptor database is scanned, votes are accumulated for each document in the document database, and the document that obtains the maximum number of votes, or a document whose number of votes exceeds a threshold value, is taken as the matched document.
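The voting step described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that descriptors are hashable values and that the descriptor database is a dictionary mapping each descriptor to the list of IDs of the documents containing it; the names `vote_for_documents`, `select_by_max_votes`, and `select_by_threshold` are hypothetical and are not taken from Patent Document 3.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Optional

# Hypothetical descriptor database: descriptor -> IDs of documents containing it.
DescriptorDB = Dict[Hashable, List[str]]


def vote_for_documents(query_descriptors: Iterable[Hashable],
                       descriptor_db: DescriptorDB) -> Counter:
    """Accumulate one vote per (query descriptor, listed document) pair."""
    votes: Counter = Counter()
    for descriptor in query_descriptors:
        # Descriptors absent from the database contribute no votes.
        for doc_id in descriptor_db.get(descriptor, []):
            votes[doc_id] += 1
    return votes


def select_by_max_votes(votes: Counter) -> Optional[str]:
    """Return the document with the most votes, or None if nothing was voted for."""
    if not votes:
        return None
    doc_id, _ = votes.most_common(1)[0]
    return doc_id


def select_by_threshold(votes: Counter, threshold: int) -> List[str]:
    """Return every document whose vote count exceeds the threshold."""
    return sorted(doc_id for doc_id, count in votes.items() if count > threshold)


if __name__ == "__main__":
    db: DescriptorDB = {"d1": ["doc1", "doc2"], "d2": ["doc1"], "d3": ["doc3"]}
    votes = vote_for_documents(["d1", "d2", "unknown"], db)
    print(votes["doc1"], votes["doc2"])   # 2 1
    print(select_by_max_votes(votes))     # doc1
    print(select_by_threshold(votes, 1))  # ['doc1']
```

The two selection functions mirror the two alternatives named in the text (maximum votes, or votes above a threshold); which rule a given system applies, and how ties are broken, is not specified here.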
Further, Patent Document 4 discloses a technique in which: a plurality of feature points are extracted from a digital image; a set of local feature points is determined from the extracted feature points; a subset of feature points is selected from the determined set of local feature points; invariants with respect to geometric transformation, each being a value characterizing the selected subset, are calculated from a plurality of combinations of the feature points in the subset; features are calculated by combining the calculated invariants; and documents and images corresponding to the digital image data are retrieved by voting for the documents and images, stored in a database, that have the calculated features.
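A simplified Python sketch of this kind of pipeline follows. As a stand-in invariant (an assumption, not necessarily the invariant used in Patent Document 4) it uses the ratio of squared distances within a triple of points, which is unchanged by rotation, uniform scaling, and translation. For each feature point, its k nearest neighbours form the local set; every m-point subset of that set is turned into a hash key by quantizing and combining the invariants of all its 3-point combinations; documents are then retrieved by voting on those keys. The parameters `k`, `m`, `levels`, and `table_size`, and all function names, are hypothetical.

```python
import itertools
from collections import Counter, defaultdict
from typing import DefaultDict, List, Sequence, Set, Tuple

Point = Tuple[int, int]


def sq_dist(p: Point, q: Point) -> int:
    """Squared Euclidean distance (kept integral for integer coordinates)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def quantize(r: float, levels: int) -> int:
    """Map a ratio in [0, inf) to one of `levels` bins via r / (1 + r)."""
    return min(int(r / (1.0 + r) * levels), levels - 1)


def point_features(points: Sequence[Point], k: int = 5, m: int = 4,
                   levels: int = 8, table_size: int = 1 << 16) -> List[int]:
    """Hash keys for every point: one key per m-subset of its k nearest neighbours."""
    features: List[int] = []
    for i, p in enumerate(points):
        # Local set: k nearest neighbours ordered by distance (ties broken by index),
        # an ordering preserved by rotation, uniform scaling and translation.
        ranked = sorted((sq_dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours = [points[j] for _, j in ranked[:k]]
        for subset in itertools.combinations(neighbours, m):
            key = 0
            # One similarity invariant per 3-point combination within the subset.
            for a, b, c in itertools.combinations(subset, 3):
                denom = sq_dist(a, c)
                ratio = sq_dist(a, b) / denom if denom else 0.0
                key = key * levels + quantize(ratio, levels)
            features.append(key % table_size)
    return features


FeatureDB = DefaultDict[int, Set[str]]


def register(db: FeatureDB, doc_id: str, points: Sequence[Point]) -> None:
    """Store each distinct feature of a reference document under its ID."""
    for f in set(point_features(points)):
        db[f].add(doc_id)


def vote(db: FeatureDB, points: Sequence[Point]) -> Counter:
    """One vote per (query feature, stored document having that feature)."""
    votes: Counter = Counter()
    for f in point_features(points):
        for doc_id in db.get(f, ()):
            votes[doc_id] += 1
    return votes


if __name__ == "__main__":
    doc_a = [(0, 0), (10, 1), (3, 7), (8, 9), (15, 4), (6, 14), (12, 12), (1, 11)]
    doc_b = [(2, 3), (9, 0), (14, 8), (5, 5), (11, 15), (0, 9), (7, 11), (16, 2)]
    db: FeatureDB = defaultdict(set)
    register(db, "A", doc_a)
    register(db, "B", doc_b)
    # Query: doc_a rotated by 90 degrees, scaled by 2 and translated.
    query = [(-2 * y + 7, 2 * x - 3) for x, y in doc_a]
    print(vote(db, query).most_common())
```

Because the squared-distance ratio is only a similarity invariant, this sketch would not tolerate the affine distortions that a more robust invariant (for example, a ratio of triangle areas) is intended to absorb.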
Conventionally, in an image data output processing apparatus such as a copying machine, a facsimile device, a scanning device, or a multi-function printer, which carries out an output process such as a copying process, a transmitting process, an editing process, or a filing process on input image data, the output process is controlled when such document matching techniques determine that the input image data is similar to the image data of a reference document.
For example, as anti-counterfeiting techniques for paper currency and valuable stock certificates, color image forming apparatuses are known that determine, in accordance with a pattern detected from input image data, whether or not the input image data is identical to image data of paper currency or a valuable stock certificate (a reference document). When the input image data is determined to be identical to the image data of the reference document, the apparatus (i) adds a specified pattern to the output image so that the image forming apparatus that made the copy can be identified from the output image, (ii) blacks out the copied image, or (iii) prohibits the copying operation for the input image data.
[Patent Document 1] Japanese Unexamined Patent Publication No. 255236/1996 (Tokukaihei 8-255236) (published on Oct. 1, 1996)
[Patent Document 2] Japanese Unexamined Patent Publication No. 110815/1993 (Tokukaihei 5-110815) (published on Apr. 30, 1993)
[Patent Document 3] Japanese Unexamined Patent Publication No. 282088/1995 (Tokukaihei 7-282088) (published on Oct. 27, 1995)
[Patent Document 4] International Publication No. WO2006/092957, pamphlet (published on Sep. 8, 2006)
However, conventional image data output processing apparatuses do not take into account that a document is duplex, even when the input image data is of a duplex document. As a result, such an apparatus may fail to determine that the input image data is a document image whose output process is to be regulated, so that an output process that should be regulated is actually permitted.
This case is explained below with reference to FIG. 25. Conventional image data output processing apparatuses basically match documents page by page. Therefore, as illustrated in FIG. 25, even when the document to be matched is a duplex document A having a document image X on its front side and a document image Y on its back side, the similarities of document image X and document image Y to the reference images are determined individually, as if they were independent document images. The fact that the document is duplex is thus not exploited.
Consequently, as illustrated in FIG. 25, consider a duplex document A having, on its front and back sides, document images X and Y, respectively, both of which are prohibited from being output. Even if document image X on the front side is determined to be similar to reference image X, document image Y on the back side may be determined to have a low similarity to reference image Y and thus not to be similar to it. In that case, the output process for document image Y is likely to be permitted.