1. Technical Field
The present invention relates to an image processing method and an image processing apparatus that extract a plurality of feature vectors from each of the images obtained by successively reading a plurality of documents and classify the documents based on the extracted feature vectors, a document reading apparatus having the image processing apparatus, an image forming apparatus having the document reading apparatus, a computer program for realizing the image processing apparatus, and a recording medium recording the computer program.
2. Description of Related Art
A technology is known in which a document is read by a scanner, document format information is recognized from the input image obtained by reading the document, the input image is classified by performing matching processing for each element based on the recognized document format information, and the input image is filed according to the result of the classification.
For example, recognition processing such as line segment extraction, character frame extraction, character recognition or frame recognition is performed on the input image. Pieces of information such as the center coordinates of the frame data, the center coordinates of the character string frame, and the concatenation frame information are extracted from the result of the recognition. An invariant is calculated from the extracted information. The format is then registered by creating the pieces of data necessary for table management (the invariant, the model name, the parameters used for calculating the invariant, etc.) and registering them in a hash table.
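The registration step described above can be illustrated with a minimal sketch. The cited art does not specify how the invariant is calculated, so the distance ratio used here, the quantization into bins, and the names `quantize_invariant` and `register_format` are all assumptions chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import dist


def quantize_invariant(p, q, r, bins=10):
    """Hypothetical invariant: ratio of the shorter to the longer of the
    distances p-q and p-r, quantized into `bins` buckets. The ratio does not
    change under translation, rotation, or uniform scaling."""
    d1, d2 = dist(p, q), dist(p, r)
    longer = max(d1, d2)
    if longer == 0:
        return 0  # degenerate case: all three points coincide
    ratio = min(d1, d2) / longer  # value in [0, 1]
    return min(int(ratio * bins), bins - 1)


def register_format(table, model_name, centers, bins=10):
    """Register one document format in the hash table.

    `centers` stands in for the extracted center coordinates (frame data,
    character string frames). Each entry stores the model name and the
    parameters used for calculating the invariant."""
    for p, q, r in combinations(centers, 3):
        key = quantize_invariant(p, q, r, bins)
        table[key].append((model_name, {"bins": bins, "points": (p, q, r)}))


# Usage: register a hypothetical "invoice" format with four frame centers.
table = defaultdict(list)
register_format(table, "invoice", [(0, 0), (10, 0), (0, 5), (10, 5)])
```

Keying the table by a quantized invariant means that a later lookup only needs to recompute the same quantity from the input image; the stored parameters record how each key was derived.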
When the format is to be recognized, the same recognition processing is performed on the input image. Pieces of information such as the center coordinates of the frame data, the center coordinates of the character string frame, and the concatenation frame information are extracted from the result of the recognition. An invariant is calculated for each piece of information, and the corresponding area of the hash table is searched by using the calculated invariant. Voting is performed for each registered document name within the searched area. This processing is repeated for each feature point of the input image, and similarity is calculated by taking the model with the highest vote count in the histogram as the result of the recognition. When it is determined that the input image is registered, an identifier is assigned to the input image and the input image is stored. An image filing apparatus has been proposed that reduces the number of processing steps performed by the user by performing the above-described processing, thereby automatically performing matching for each element based on the document format information (see Japanese Patent No. 3469345).
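The voting step can be sketched as follows. This is an illustration under stated assumptions, not the method of the cited patent: the hash table is assumed to map a quantized invariant to a list of registered model names, the similarity is assumed to be the winning vote count divided by the number of query invariants, and `recognize_format` and `threshold` are hypothetical names.

```python
from collections import Counter


def recognize_format(hash_table, query_keys, threshold=0.5):
    """Vote for registered models and return (model_or_None, similarity).

    `hash_table` maps an invariant key to the model names registered under
    it. `query_keys` are the invariants calculated from the input image.
    The similarity measure and the threshold are assumptions."""
    votes = Counter()
    for key in query_keys:
        # Each model receives at most one vote per query invariant.
        for model_name in set(hash_table.get(key, [])):
            votes[model_name] += 1
    if not votes:
        return None, 0.0  # no registered model matched any invariant
    best_model, best_votes = votes.most_common(1)[0]
    similarity = best_votes / len(query_keys)
    # Treat the input as a registered format only above the threshold.
    return (best_model if similarity >= threshold else None), similarity


# Usage with a small hand-built table of two registered formats.
registered = {1: ["invoice"], 2: ["invoice", "receipt"], 3: ["receipt"], 4: ["invoice"]}
result = recognize_format(registered, [1, 2, 4, 9])
```

In this example three of the four query invariants fall in areas where "invoice" is registered, so "invoice" tops the histogram; an input whose invariants match nothing yields no model and a similarity of zero.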