1. Field of the Invention
The present invention relates to a document image processing apparatus and method for inputting and storing a document as an image, more specifically to a document image processing apparatus having a function of searching stored document images, to a document image processing method using the function, to a document image processing program and to a recording medium on which the document image processing program is recorded.
2. Description of the Related Art
Document filing apparatuses are in practical use that employ an image inputting device, such as an image scanner, to convert a document into an image, store the image electronically, and enable the document to be searched for later. Techniques relating to such document filing apparatuses are disclosed in Chinese Unexamined Patent Publications CN1402854A, CN1535430A, and CN1851713A.
To search document images that have been read in as image data, index information for searching must be provided manually for each document image, which requires enormous labor.
In addition, an apparatus has been proposed which locates a character region (a text region) of a document image, performs optical character recognition (OCR), and enables a full-text search according to the text content. The related art using OCR recognition includes, for example, the technique disclosed in Japanese Unexamined Patent Publication JP-A 7-152774 (1995).
However, OCR recognition requires considerable computation and therefore a long processing time. Moreover, a low recognition rate may lead to false recognition, which causes the search for the target character to fail. Accordingly, OCR recognition has a problem in search precision.
Meanwhile, Japanese Unexamined Patent Publication JP-A 10-74250 (1998) discloses a technique which enables automatic full-text search without using OCR recognition.
In the constitution of the aforementioned publication, a category dictionary is prepared in advance in which characters are classified, on the basis of image features, into categories of similar characters. When a document image is registered, no character recognition is performed on the characters in a text region (a character region); instead, image features are extracted and used to classify each character into a character category, and the resulting category series is stored together with the inputted image. At the time of searching, the characters in a search keyword are converted into their corresponding categories, and document images that contain the converted category series as a part are returned as the search result.
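The registration and search flow described above can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of JP-A 10-74250: the similar-character groups, the function names (`build_category_dictionary`, `to_category_series`, `register`, `search`), and the use of a plain string as a stand-in for the character images of a text region are all assumptions made for the example. In the actual technique, the category of each character would be determined from extracted image features.

```python
# Hypothetical groups of visually similar characters (assumed for illustration).
SIMILAR_GROUPS = ["l1I", "O0o", "ce"]


def build_category_dictionary(groups):
    """Map every character to the ID of the similar-character group it belongs to."""
    return {ch: cat for cat, group in enumerate(groups) for ch in group}


CATEGORY_OF = build_category_dictionary(SIMILAR_GROUPS)


def to_category_series(text):
    """Convert characters to categories without deciding which character each one is.

    A character outside the dictionary forms a category of its own,
    represented here by the character itself.
    """
    return tuple(CATEGORY_OF.get(ch, ch) for ch in text)


def contains(series, sub):
    """Return True if `sub` occurs as a contiguous part of `series`."""
    n, m = len(series), len(sub)
    return any(series[i:i + m] == sub for i in range(n - m + 1))


def register(store, doc_id, text_region):
    """Store the category series of a text region under a document ID.

    `text_region` stands in for the character images (an assumption of this sketch).
    """
    store[doc_id] = to_category_series(text_region)


def search(store, keyword):
    """Return IDs of documents whose category series contains the keyword's series."""
    key = to_category_series(keyword)
    return sorted(doc_id for doc_id, series in store.items() if contains(series, key))


store = {}
register(store, "doc1", "He11o world")  # "1" falls in the same category as "l"
register(store, "doc2", "Help desk")
print(search(store, "Hello"))  # doc1 matches even though its characters were never recognized
```

Because matching is done on categories rather than on recognized characters, a keyword such as "He110" would also match the first document in this sketch, which illustrates the trade-off between a low rate of search omissions and precision.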
As an effect of this constitution, the publication states that it can provide a document filing apparatus which registers documents at high speed with low computational power and which achieves a low rate of search omissions when searching for the target character.
For example, when index information for searching is prepared on the basis of the technique of JP-A 10-74250, then for each of the characters constituting a headline, characters similar in image feature are recorded in the index information as candidate characters in order of degree of similarity.
The degree of similarity provided in the index information is mainly used at the time of comparing a search keyword and the index information.
However, the degree of similarity is merely a parameter set independently for each character. Features of the phrases that make up the headline are not reflected in the index information. As a result, the search precision is still insufficient.
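The limitation can be illustrated with a small sketch. The index layout, the similarity values (written as integers on a 0 to 100 scale), and the scoring rule that sums per-character similarities are all assumptions made for this example; the publications cited above do not specify them in this text.

```python
# Hypothetical index information for a three-character headline:
# one entry per character position, each a list of
# (candidate character, degree of similarity), ordered best first.
HEADLINE_INDEX = [
    [("c", 90), ("e", 80), ("o", 50)],
    [("a", 90), ("o", 60), ("e", 40)],
    [("t", 95), ("f", 60), ("l", 30)],
]


def char_score(candidates, ch):
    """Return the similarity of `ch` among the candidates, or 0 if it is absent."""
    for candidate, similarity in candidates:
        if candidate == ch:
            return similarity
    return 0


def keyword_score(index, keyword):
    """Best sum of per-character similarities over all alignments of the keyword.

    Each character is scored independently (assumed scoring rule);
    an alignment in which any character is not a candidate scores 0.
    """
    best = 0
    for start in range(len(index) - len(keyword) + 1):
        total = 0
        for offset, ch in enumerate(keyword):
            score = char_score(index[start + offset], ch)
            if score == 0:
                total = 0
                break
            total += score
        best = max(best, total)
    return best


print(keyword_score(HEADLINE_INDEX, "cat"))  # 275
print(keyword_score(HEADLINE_INDEX, "eot"))  # 235
```

In this sketch the meaningless string "eot" still receives a substantial score, because nothing in the index records which combinations of candidate characters form an actual phrase.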