1. Field of the Invention
The present invention relates to a character image feature dictionary preparation apparatus and a character image feature dictionary preparation program, each of which inputs a document as an image and stores the image, and further relates to a recording medium on which the character image feature dictionary preparation program is recorded. In particular, the present invention relates to a document image processing apparatus, a document image processing program, and a recording medium on which the document image processing program is recorded, each of which includes a function of searching stored document images.
2. Description of the Related Art
Document filing apparatuses are in practical use which employ an image inputting device such as an image scanner to convert a document into an image, store the image electronically, and enable the document to be searched later. Techniques relating to such document filing apparatuses have been disclosed in Chinese Unexamined Patent Publications CN1402854A, CN1535430A, and CN1851713A.
To search document images read in as image data, index information for searching must be provided manually for each document image. This requires enormous labor.
In addition, an apparatus has been proposed which locates a character region (a text region) of a document image, performs optical character reader (OCR) recognition, and enables a full-text search according to the text content. The related art using the OCR recognition includes, for example, the technique disclosed in Japanese Unexamined Patent Publication JP-A 7-152774 (1995).
However, the OCR recognition requires considerable calculation and therefore takes a long time. Moreover, a low recognition rate may lead to false recognition, which causes a failure in searching for the target character. Accordingly, the OCR recognition has a problem in search precision.
Meanwhile, Japanese Unexamined Patent Publication c discloses a technique which enables the automatic full-text search without using the OCR recognition.
In the constitution of the aforementioned Publication, a category dictionary is prepared in advance, in which characters are classified based on image features into similar character categories, each category grouping characters that resemble one another. At the time of registering a document image, no character recognition is performed on the characters in a text region (a character region). Instead, image features are extracted and used to classify each character into a character category, and the series of categories recognized for the respective characters is stored together with the inputted image. At the time of searching, the respective characters in a search keyword are converted into the corresponding categories, and document images partially containing the converted category series are taken out as a search result.
As an effect of this constitution, the Publication describes that it can provide document filing which enables high-speed registration of documents with low computational power and which can realize a low rate of incomplete searches when searching for the target character.
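The category-based scheme described above can be illustrated with a minimal sketch. Everything in it is a hypothetical assumption made for illustration and is not taken from the Publication: the two-dimensional feature vectors, the nearest-centroid classification rule, the contents of `CATEGORY_CENTROIDS` and `CHAR_TO_CATEGORY`, and the function names `classify`, `register`, and `search`.

```python
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[float, float]  # hypothetical 2-D image feature of one character

# Hypothetical category dictionary, prepared in advance:
# category id -> representative feature vector of that similar-character category.
CATEGORY_CENTROIDS: Dict[int, Feature] = {
    0: (0.9, 0.1),
    1: (0.1, 0.9),
    2: (0.5, 0.5),
}

# Hypothetical mapping from characters to their similar-character category.
CHAR_TO_CATEGORY: Dict[str, int] = {
    "O": 0, "0": 0,
    "l": 1, "1": 1, "I": 1,
    "a": 2, "e": 2,
}


def classify(feature: Feature) -> int:
    """Assign a character image feature to the nearest category (no OCR)."""
    def sq_dist(cat: int) -> float:
        centroid = CATEGORY_CENTROIDS[cat]
        return sum((f - c) ** 2 for f, c in zip(feature, centroid))
    return min(CATEGORY_CENTROIDS, key=sq_dist)


def register(index: Dict[str, List[int]], doc_id: str,
             char_features: Sequence[Feature]) -> None:
    """Store the category series of a document image at registration time."""
    index[doc_id] = [classify(f) for f in char_features]


def contains(series: Sequence[int], pattern: Sequence[int]) -> bool:
    """True if pattern occurs as a contiguous run inside series."""
    n = len(pattern)
    return n > 0 and any(
        list(series[i:i + n]) == list(pattern)
        for i in range(len(series) - n + 1)
    )


def search(index: Dict[str, List[int]], keyword: str) -> List[str]:
    """Convert the keyword to categories and return the matching documents."""
    pattern = [CHAR_TO_CATEGORY[ch] for ch in keyword]
    return [doc_id for doc_id, series in index.items()
            if contains(series, pattern)]


index: Dict[str, List[int]] = {}
register(index, "doc1", [(0.85, 0.2), (0.2, 0.8), (0.5, 0.45)])
register(index, "doc2", [(0.1, 0.95), (0.12, 0.88)])
print(search(index, "0l"))  # ['doc1']
print(search(index, "Ol"))  # ['doc1']
```

In this sketch the keywords "0l" and "Ol" retrieve the same document because "0" and "O" fall into one category. A category-level match therefore tends to return extra candidates rather than miss a document, at the cost of precision.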
Various feature-extracting methods have been proposed for preparing a dictionary based on image features and for extracting the image features of each character from document images.
With these feature-extracting methods, the features can be sufficiently extracted for some types of target character but not for others. There is thus a problem that, depending on which extracting method is selected, the dictionary may be prepared inadequately and the features of the document image may be extracted inadequately, with the result that sufficient search precision cannot be obtained in text searching.