1. Technical Field
The present application relates to: an image processing method and an image processing apparatus for determining whether an obtained document image is similar to a stored image; a document reading apparatus and an image forming apparatus employing the image processing apparatus; and a recording medium storing a computer program for implementing the image processing.
2. Description of the Related Art
Proposed methods for reading a document with a scanner and then determining whether the read document image is similar to a stored format include: a method in which keywords are extracted from the read image by OCR and pattern matching is then performed on the basis of the extracted keywords; and a method in which the documents subject to determination are restricted to formatted documents having ruled lines, and pattern matching is performed on the basis of information concerning the ruled lines extracted from the read document.
When a format to be used in similarity determination is stored, recognition processing such as line segment extraction, character box extraction, character recognition, or frame extraction is performed on an image inputted for registration. Then, from the recognition result, information (e.g., feature points) such as the center coordinates of frame data, the center coordinates of character string frames, and connecting frame information is extracted. Features (e.g., hash values) are then calculated from the extracted information. Finally, the data necessary for table management (such as the features, a model name, and the parameters used for calculating the features) is generated and stored in a hash table, thereby registering the format.
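The registration step can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited publication: the feature points are taken to be hypothetical frame or character-string-box center coordinates, and each point's feature is a hash of quantized distance ratios to its nearest neighbours (a common scale-invariant choice). The names `feature_hash`, `register_format`, `K_NEIGHBOURS`, `LEVELS`, and `TABLE_SIZE` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical parameters (not taken from the text): neighbours used per
# feature point, quantization levels per ratio, and hash table size.
K_NEIGHBOURS = 3
LEVELS = 8
TABLE_SIZE = 1024


def feature_hash(points, idx, k=K_NEIGHBOURS):
    """Hash one feature point from distance ratios to its k nearest neighbours.

    Assumes the document has at least two feature points.
    """
    p = points[idx]
    dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != idx)[:k]
    longest = dists[-1] or 1.0  # guard against coincident points
    h = 0
    for d in dists[:-1]:
        # Ratios lie in [0, 1] and do not change when the image is scaled.
        level = min(int(d / longest * LEVELS), LEVELS - 1)
        h = h * LEVELS + level
    return h % TABLE_SIZE


def register_format(table, form_name, points):
    """Store one hash-table entry (the form name) per feature point of a blank format."""
    for idx in range(len(points)):
        table[feature_hash(points, idx)].append(form_name)


hash_table = defaultdict(list)
# Hypothetical center coordinates extracted from a blank "invoice" format.
blank_invoice = [(10, 10), (50, 12), (90, 10), (50, 60), (10, 110)]
register_format(hash_table, "invoice", blank_invoice)
```

Feature points with the same neighbourhood geometry (the first and third points here) land in the same bucket, which is why each bucket holds a list of form names rather than a single value.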
In similarity determination for a document, recognition processing is performed on the inputted document image. Then, from the recognition result, information (e.g., feature points) such as the center coordinates of frame data, the center coordinates of character string frames, and connecting frame information is extracted. Features (e.g., hash values) corresponding to each piece of information are then calculated. Using the calculated features, the corresponding area of the stored hash table is searched, and a vote is cast for each stored form name found in the searched area. This processing is repeated for each feature point of the inputted document image, and similarity is evaluated by adopting as the recognition result the model having the largest count in the resulting vote histogram. When the document image is recognized as being similar to a stored format, the document image is saved with an identifier assigned to it. Using such processing, a filing apparatus for images (document images) has been proposed that automatically matches a document image against the stored formats, thereby reducing the user's workload (see Japanese Patent Publication No. 3469345).
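The voting step can be sketched as below, assuming the query document's per-feature-point hash values have already been computed and the hash table maps each hash value to the list of stored form names registered under it. The `min_votes` threshold is a hypothetical addition for deciding that no stored format matches; the functions `vote` and `recognize` are likewise illustrative names only.

```python
from collections import Counter


def vote(hash_table, query_hashes):
    """Cast one vote per stored form name found under each query hash."""
    histogram = Counter()
    for h in query_hashes:
        for form_name in hash_table.get(h, ()):
            histogram[form_name] += 1
    return histogram


def recognize(hash_table, query_hashes, min_votes=1):
    """Return (best form name or None, vote histogram).

    The model with the largest vote count is adopted; min_votes is a
    hypothetical threshold below which no stored format is reported.
    """
    histogram = vote(hash_table, query_hashes)
    if not histogram:
        return None, histogram
    best, count = histogram.most_common(1)[0]
    return (best if count >= min_votes else None), histogram


# Hypothetical stored table and query hashes for illustration.
stored = {38: ["invoice", "invoice", "order"], 39: ["invoice"], 47: ["order"], 54: ["invoice"]}
result, hist = recognize(stored, [38, 39, 54, 12])
```

Here "invoice" collects four votes against one for "order", so it is adopted; the query hash 12 matches nothing and casts no vote.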
However, in the apparatus described in Japanese Patent Publication No. 3469345, the stored formats used in similarity determination for a document image are documents in each of which only the format (such as frames, ruled lines, and character strings indicating the entry items) is defined, that is, documents in which nothing has been entered in the entry fields. Thus, the features (e.g., hash values) extracted from each stored format contain no information (e.g., feature points) concerning the items (e.g., character strings, figures, and marks) to be entered in the entry fields. Accordingly, even an inputted document image with omissions, in which required information has not been written in the entry fields, can be determined as being similar to a stored format. As a result, the inputted document image is filed as is despite the omissions. There has therefore been a demand for detecting omissions in a document image with satisfactory accuracy.