1. Field of the Invention
The present invention relates to an image processing apparatus for determining a similarity between input image data and image data which has been stored in advance, an image forming apparatus and an image reading apparatus including the same, and an image processing method.
2. Description of the Related Art
There has been proposed an image processing method in which a document image is read with a scanner, the image data obtained by reading the document image is matched against image data which has been stored in advance, and a similarity between the former image data and the latter image data is determined. For example, there have been proposed a method in which a keyword is extracted from a document image with an OCR (Optical Character Reader) and a similarity of image data is determined based on the extracted keyword, and a method in which the images subject to similarity determination are limited to form images having ruled lines, and features of the ruled lines are extracted to determine a similarity of image data.
Japanese Unexamined Patent Publication JP-A 7-282088 (1995) discloses a matching apparatus wherein a descriptor is generated from features of an input document, and using the descriptor and a descriptor database, the input document is matched against documents in a document database.
Herein, the descriptor database denotes a database that stores the descriptors together with a list of the documents containing the features from which the descriptors are generated. Furthermore, the descriptor is selected so as to be invariant to distortions caused by digitizing the documents or to differences between the input document and a matched document in the document database.
When the descriptor database is scanned, votes are accumulated for each document in the document database, and the document which accumulates the most votes, or a document with more than a threshold number of votes, is determined as the matching document.
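By way of illustration only, the voting described above may be sketched as follows. The sketch is not taken from JP-A 7-282088. It assumes that descriptors are hashable values and that the descriptor database is a mapping from each descriptor to the documents containing it. The names `match_document`, `descriptor_db`, and `min_votes` are hypothetical.

```python
from collections import Counter


def match_document(input_descriptors, descriptor_db, min_votes=None):
    """Return (matching document or None, vote counts per document).

    descriptor_db is assumed to map a descriptor to an iterable of
    document identifiers; this layout is an assumption for illustration.
    """
    votes = Counter()
    # Each descriptor of the input document casts one vote for every
    # stored document that is listed under the same descriptor.
    for descriptor in input_descriptors:
        for doc_id in descriptor_db.get(descriptor, ()):
            votes[doc_id] += 1

    if not votes:
        return None, votes

    # The document that accumulated the most votes is the candidate.
    best_doc, best_count = votes.most_common(1)[0]

    # Optionally require more than a threshold number of votes.
    if min_votes is not None and best_count <= min_votes:
        return None, votes
    return best_doc, votes
```

In this sketch the two criteria are combined, so that the top-voted document is accepted only when it also exceeds the optional threshold. The text presents the two criteria as alternatives, and an implementation may equally apply only one of them.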
The matching apparatus described in JP-A 7-282088 covers document images composed of black text and color text on a white or low-density background. White text on a colored background, so-called reverse text, cannot be extracted, and determination accuracy is therefore deteriorated when a document image contains reverse text.