1. Field of the Invention
The present invention relates to an apparatus for rough classification of words, a method of such rough classification of words, and a record medium recording a control program thereof, particularly to a unit for detecting document areas by using a non-contact type image input device such as a camera, in an apparatus for acquiring document images.
2. Description of the Prior Art
Conventionally, there is an apparatus of this type published on pp. 31 to 32 in "Didier Guillevic and C. Y. Suen, 'Recognition of Legal Amounts on Bank Cheques,' Pattern Analysis and Application, Vol. 1, No. 1, pp. 28-41, 1998."
FIG. 19 shows an example of the configuration of the above apparatus for rough classification of words. The exemplary apparatus has a number of devices, which are referred to as divisions throughout the specification and the drawings. This apparatus comprises terminal 101 for inputting a word image, word feature extraction division 7 for extracting features from the word image, vocabulary selection division 8 for comparing the word features generated in word feature extraction division 7 with those of all the vocabulary stored in vocabulary storage division 6 to select only the vocabulary having similar word features, and terminal 102 for outputting such vocabulary.
FIG. 2 shows an example of a word image to be inputted to the apparatus for rough classification of words. Word feature extraction division 7 detects loops in the word image and, in the case of lowercase characters, the portions jutting downward as in "y" and "g" (hereafter referred to as descenders) and the portions jutting upward as in "h" and "b" (hereafter referred to as ascenders), and extracts the alignment of ascenders, descenders and loops as a feature.
Vocabulary storage division 6 stores, for instance, 100,000 kinds of words in a table format as shown in FIG. 3. In the example shown in FIG. 3, words related to place names of a certain country are stored. Each entry holds the text of a word together with its word feature extracted from a word image.
Vocabulary selection division 8 compares the word feature extracted in word feature extraction division 7 with those of all the vocabulary stored in vocabulary storage division 6, and outputs from terminal 102 each word determined to be similar.
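The comparison performed by the conventional apparatus can be illustrated with a minimal sketch. It is not the implementation of the cited apparatus: the letter sets, the symbols "A", "D" and "O", and the function names word_feature and select_similar are assumptions introduced only for illustration, and the feature is derived here from the letters of a word rather than from an image.

```python
# Assumed letter sets; a real system would detect these shapes in the image.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")
LOOPS = set("abdegopq")


def word_feature(word):
    """Return the left-to-right alignment of ascenders (A), descenders (D)
    and loops (O) for the letters of a word."""
    symbols = []
    for ch in word.lower():
        if ch in ASCENDERS:
            symbols.append("A")
        if ch in DESCENDERS:
            symbols.append("D")
        if ch in LOOPS:
            symbols.append("O")
    return "".join(symbols)


def select_similar(query_feature, vocabulary):
    """Keep only the words whose feature equals the query feature."""
    return [w for w in vocabulary if word_feature(w) == query_feature]


if __name__ == "__main__":
    vocabulary = ["bay", "boy", "day", "cat"]
    print(select_similar(word_feature("bay"), vocabulary))
```

In this sketch an exact comparison is used, so a query feature in which a single loop was missed in the image selects nothing at all, while several different words can share one feature.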
Object of the Invention
However, in the above-mentioned conventional apparatus for rough classification of words, the word features utilized in the word feature extraction division are ascenders, descenders, loops and so on extracted from a word image. Although these features can in principle be determined from the letters making up a word, they are not always extracted in a stable manner, depending on the quality of the image.
For instance, a loop cannot be detected when a word is written without correctly closing the top of 'O'. In addition, there are cases where a loop that does not actually exist is detected because neighboring characters touch each other. Thus, word features may be only partially detected, or a feature that does not exist may be extracted, so that the correct word cannot be detected as a similar word in the vocabulary selection division. If slight deviation of a word feature is allowed in order to prevent such omission of detection, many dissimilar words will also be selected, resulting in a very large number of words outputted from the apparatus for rough classification of words.
Moreover, to solve the above problem, there is a method of extracting a word feature from a word image written in advance and storing it in the vocabulary storage division. To roughly classify 100,000 words by this method, however, it is necessary to extract features from word images acquired by having the 100,000 words written by a very large number of people, which is impracticable.
Therefore, the object of the present invention is to provide an apparatus for rough classification of words solving the above problem and capable of generating a feature of a word stored in the vocabulary storage division from a character code of each word to efficiently select a word, a method of such rough classification of words and a record medium recording a control program thereof.
An apparatus for rough classification of words according to the present invention is one for inputting a word image and selecting vocabulary similar to it among the vocabulary stored in a vocabulary storage device in advance, having:
a candidate character selecting device for, of the word image, selecting candidate characters that are image areas conforming to predetermined conditions;
a character recognizing device for converting into character codes the image areas selected by the candidate character selecting device;
a word describing device for generating word description representing the word image by using the character codes converted by the character recognizing device; and
a vocabulary selecting device for checking the word description generated by the word describing device against the vocabulary recorded in the vocabulary storage device so as to select and output vocabulary that can be consistently checked.
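The checking step of the devices listed above can be sketched as follows. This is only an illustration under stated assumptions: the word description is represented as a list in which each recognized area is a string of candidate character codes and each unrecognized area is None, an unrecognized area is assumed to hold at least one character, and the names description_to_regex and select_vocabulary are hypothetical.

```python
import re


def description_to_regex(description):
    """Turn a word description into a compiled regular expression.

    Each item is either a non-empty string of candidate characters for a
    recognized area, or None for an area that was not recognized.
    """
    parts = []
    for item in description:
        if item is None:
            parts.append(".+")  # assumed: at least one unknown character
        else:
            parts.append("[" + "".join(re.escape(c) for c in item) + "]")
    return re.compile("".join(parts))


def select_vocabulary(description, vocabulary):
    """Return the vocabulary entries consistent with the description."""
    pattern = description_to_regex(description)
    return [w for w in vocabulary if pattern.fullmatch(w)]


if __name__ == "__main__":
    vocabulary = ["TOKYO", "KYOTO", "OSAKA", "TOKIO"]
    # "T", "K" and "O" were recognized; the areas between them were not.
    print(select_vocabulary(["T", None, "K", None, "O"], vocabulary))
```

Because the vocabulary is checked directly as character codes, no feature has to be extracted from written samples of each word beforehand.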
Another apparatus for rough classification of words according to the present invention is one for inputting a word image and selecting vocabulary similar to it among the vocabulary stored in a vocabulary storage device in advance, having:
a candidate character selecting device for, of the word image, selecting candidate characters that are image areas conforming to predetermined conditions;
a character recognizing device for converting into character codes the image areas selected by the candidate character selecting device;
a number-of-characters estimating device for estimating the number of characters of the word image in its entirety and estimating the number of characters in the areas generated from the word image;
a word describing device for generating word description representing the word image by using the character codes converted by the character recognizing device and the number of characters in the areas estimated by the number-of-characters estimating device; and
a vocabulary selecting device for selecting vocabulary recorded in the vocabulary storage device by using the estimated number of characters of the word in its entirety and checking the word description against the vocabulary recorded in the vocabulary storage device so as to select and output vocabulary that can be consistently checked.
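How the estimated numbers of characters could narrow the selection is sketched below. The estimation rule (area width divided by an average character width, with a tolerance of one character) is an assumption, not something stated in this section, and the names estimate_count and select_vocabulary as well as the tuple-based description format are hypothetical.

```python
import re


def estimate_count(width_px, avg_char_width_px, tolerance=1):
    """Estimate a (min, max) range for the number of characters in an area."""
    n = max(1, round(width_px / avg_char_width_px))
    return (max(1, n - tolerance), n + tolerance)


def select_vocabulary(description, total_range, vocabulary):
    """Select words consistent with the description and the total length.

    description items are ("char", candidates) for a recognized area or
    ("gap", (min, max)) for an unrecognized area with an estimated count.
    """
    parts = []
    for kind, value in description:
        if kind == "char":
            parts.append("[" + "".join(re.escape(c) for c in value) + "]")
        else:
            gap_lo, gap_hi = value
            parts.append(".{%d,%d}" % (gap_lo, gap_hi))
    pattern = re.compile("".join(parts))
    lo, hi = total_range
    # First narrow by whole-word length, then check the description.
    return [w for w in vocabulary
            if lo <= len(w) <= hi and pattern.fullmatch(w)]


if __name__ == "__main__":
    vocabulary = ["LONDON", "LYON", "LISBON", "LINCOLN", "BERLIN"]
    description = [("char", "L"), ("gap", estimate_count(80, 20)), ("char", "N")]
    print(select_vocabulary(description, (5, 7), vocabulary))
```

In this sketch the whole-word estimate removes "LYON" before any pattern is checked, and the per-area estimate bounds how many unknown characters may lie between the recognized ones.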
Another apparatus for rough classification of words according to the present invention is one for inputting a word image and selecting vocabulary similar to it among the vocabulary stored in a vocabulary storage device in advance, having:
a candidate character selecting device for, of the word image, selecting candidate characters that are image areas conforming to predetermined conditions;
a character recognizing device for converting into character codes the image areas selected by the candidate character selecting device;
a number-of-characters estimating device for estimating the number of characters of the word image in its entirety and estimating the number of characters in the areas generated from the entire word image;
a feature describing device for extracting image features of the word image in its entirety and extracting the image features in the areas generated from the entire word image;
a word describing device for generating word description representing the word image by using the character codes, the number of characters in the areas and the graphic features in the areas; and
a vocabulary selecting device for using the estimated number of characters and graphic features of the word in its entirety to select the vocabulary recorded in the vocabulary storage device and checking the word description against the vocabulary recorded in the vocabulary storage device so as to select and output vocabulary that can be consistently checked.
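The use of whole-word features to narrow the vocabulary can be sketched as follows. This section does not say which image features are used, so the sketch assumes, purely for illustration, that the features are the numbers of ascenders and descenders, and that the corresponding values for each stored word are generated from its character codes. The letter sets and the names features_from_text and prefilter are hypothetical.

```python
# Assumed letter sets for lowercase words.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def features_from_text(word):
    """Generate (ascender count, descender count) from character codes."""
    ascenders = sum(ch in ASCENDERS for ch in word)
    descenders = sum(ch in DESCENDERS for ch in word)
    return (ascenders, descenders)


def prefilter(observed, vocabulary, tolerance=1):
    """Keep words whose generated features lie within the tolerance of the
    features observed in the word image."""
    obs_a, obs_d = observed
    kept = []
    for word in vocabulary:
        a, d = features_from_text(word)
        if abs(a - obs_a) <= tolerance and abs(d - obs_d) <= tolerance:
            kept.append(word)
    return kept


if __name__ == "__main__":
    vocabulary = ["tokyo", "kyoto", "osaka", "nara"]
    # Suppose two ascenders and one descender were observed in the image.
    print(prefilter((2, 1), vocabulary, tolerance=0))
```

The words that survive such a prefilter would then be checked against the word description; the tolerance parameter trades missed words against the number of words passed on.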
A further apparatus for rough classification of words according to the present invention is one for inputting a word image and selecting vocabulary similar to it among the vocabulary stored in a vocabulary storage device in advance, having:
a candidate character selecting device for, of the word image, selecting candidate characters that are image areas conforming to predetermined conditions;
an uppercase/lowercase determining device for determining whether the word image comprises only uppercase characters or a mixture of uppercase and lowercase characters;
a character recognizing device for, when determined as comprising only the uppercase characters by the uppercase/lowercase determining device, converting into character codes the image areas selected by limiting the character type only to the uppercase, and when determined as the mixture of uppercase and lowercase characters by the device, converting into character codes the image areas selected by targeting all the character types;
a number-of-characters estimating device for estimating the number of characters of the word image in its entirety and estimating the number of characters in the areas generated from the entire word image;
a feature describing device for extracting image features of the word image in its entirety and extracting the image features in the areas generated from the entire word image;
a word describing device for generating word description representing the word image by using the character codes, the number of characters in the areas and the graphic features in the areas; and
a vocabulary selecting device for using the estimated number of characters and graphic features of the word in its entirety to select the vocabulary recorded in the vocabulary storage device and checking the word description against the vocabulary recorded in the vocabulary storage device so as to select and output vocabulary that can be consistently checked.
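The effect of the uppercase/lowercase determination on character recognition can be sketched as follows. The determination rule used here (all candidate characters having nearly the same height) is a crude assumption made for illustration and is not taken from this section. The per-character score dictionary and the names is_uppercase_only, allowed_character_types and recognize are likewise hypothetical.

```python
import string


def is_uppercase_only(heights, ratio=0.8):
    """Assumed heuristic: treat the word as uppercase only when every
    candidate character is nearly as tall as the tallest one."""
    tallest = max(heights)
    return all(h >= ratio * tallest for h in heights)


def allowed_character_types(uppercase_only):
    """Limit the character types to uppercase, or target all letters."""
    if uppercase_only:
        return string.ascii_uppercase
    return string.ascii_letters


def recognize(scores, allowed):
    """Pick the best-scoring character code among the allowed types.

    scores maps a character to a similarity score produced by some
    recognizer; at least one allowed character is assumed to be present.
    """
    candidates = {c: s for c, s in scores.items() if c in allowed}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    scores = {"O": 0.6, "o": 0.7, "C": 0.3}
    allowed = allowed_character_types(is_uppercase_only([40, 39, 41, 40]))
    print(recognize(scores, allowed))
```

Limiting the character types in this way keeps similar-looking pairs such as "O" and "o" from competing when the word is judged to contain only uppercase characters.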
A method for rough classification of words according to the present invention is one for inputting a word image and selecting vocabulary similar to it among the vocabulary stored in a vocabulary storage device in advance, comprising the steps of, of the word image, selecting candidate characters that are image areas conforming to predetermined conditions, converting the selected image areas into character codes, generating word description representing the word image by using the converted character codes, and checking the generated word description against the vocabulary recorded in the vocabulary storage device so as to select and output vocabulary that can be consistently checked.
Another method for rough classification of words according to the present invention is one for inputting a word image and selecting vocabulary similar to it among the vocabulary stored in a vocabulary storage device in advance, comprising the steps of, of the word image, selecting candidate characters that are image areas conforming to predetermined conditions, converting the selected image areas into character codes, estimating the number of characters of the word image in its entirety and estimating the number of characters in the areas generated from the word image, generating word description representing the word image by using the converted character codes and the estimated number of characters in the areas, and using the estimated number of characters of the word in its entirety to select the vocabulary recorded in the vocabulary storage device and checking the word description against the vocabulary recorded in the vocabulary storage device so as to select and output vocabulary that can be consistently checked.
Another method for rough classification of words according to the present invention is one for inputting a word image and selecting vocabulary similar to it among the vocabulary stored in a vocabulary storage device in advance, comprising the steps of, of the word image, selecting candidate characters that are image areas conforming to predetermined conditions, converting the selected image areas into character codes, estimating the number of characters of the word image in its entirety and estimating the number of characters in the areas generated from the entire word image, extracting image features of the word image in its entirety and extracting the image features in the areas generated from the entire word image, generating word description representing the word image by using the character codes, the number of characters in the areas and the graphic features in the areas, and using the estimated number of characters and graphic features of the word in its entirety to select the vocabulary recorded in the vocabulary storage device and checking the word description against the vocabulary recorded in the vocabulary storage device so as to select and output vocabulary that can be consistently checked.
A further method for rough classification of words according to the present invention is one for inputting a word image and selecting vocabulary similar to it among the vocabulary stored in a vocabulary storage device in advance, comprising the steps of, of the word image, selecting candidate characters that are image areas conforming to predetermined conditions, determining whether the word image comprises only uppercase characters or a mixture of uppercase and lowercase characters, when determined as comprising only the uppercase characters, converting into character codes the image areas selected by limiting the character type only to the uppercase, and when determined as the mixture of uppercase and lowercase characters, converting into character codes the image areas selected by targeting all the character types, estimating the number of characters of the word image in its entirety and estimating the number of characters in the areas generated from the entire word image, extracting image features of the word image in its entirety and extracting the image features in the areas generated from the entire word image, generating word description representing the word image by using the character codes, the number of characters in the areas and the graphic features in the areas, and using the estimated number of characters and graphic features of the word in its entirety to select the vocabulary recorded in the vocabulary storage device and checking the word description against the vocabulary recorded in the vocabulary storage device so as to select and output vocabulary that can be consistently checked.
A record medium recording a control program for rough classification of words according to the present invention is one recording the control program for controlling an apparatus for rough classification of words for inputting a word image and selecting vocabulary similar to it among the vocabulary stored in a vocabulary storage device in advance, in which the control program causes the apparatus to select, of the word image, candidate characters that are image areas conforming to predetermined conditions, to convert the selected image areas into character codes, to generate word description representing the word image by using the converted character codes, and to check the generated word description against the vocabulary recorded in the vocabulary storage device so as to select and output vocabulary that can be consistently checked.
Another record medium recording a control program for rough classification of words according to the present invention is one recording the control program for controlling an apparatus for rough classification of words for inputting a word image and selecting vocabulary similar to it among the vocabulary stored in a vocabulary storage device in advance, in which the control program causes the apparatus to select, of the word image, candidate characters that are image areas conforming to predetermined conditions, to convert the selected image areas into character codes, to estimate the number of characters of the word image in its entirety and estimate the number of characters in the areas generated from the word image, to generate word description representing the word image by using the converted character codes and the estimated number of characters in the areas, and to use the estimated number of characters of the word in its entirety to select the vocabulary recorded in the vocabulary storage device and to check the word description against the vocabulary recorded in the vocabulary storage device so as to select and output vocabulary that can be consistently checked.
Another record medium recording a control program for rough classification of words according to the present invention is one recording the control program for controlling an apparatus for rough classification of words for inputting a word image and selecting vocabulary similar to it among the vocabulary stored in a vocabulary storage device in advance, in which the control program causes the apparatus to select, of the word image, candidate characters that are image areas conforming to predetermined conditions, to convert the selected image areas into character codes, to estimate the number of characters of the word image in its entirety and estimate the number of characters in the areas generated from the entire word image, to extract image features of the word image in its entirety and extract the image features in the areas generated from the entire word image, to generate word description representing the word image by using the character codes, the number of characters in the areas and the graphic features in the areas, and to use the estimated number of characters and graphic features of the word in its entirety to select the vocabulary recorded in the vocabulary storage device and to check the word description against the vocabulary recorded in the vocabulary storage device so as to select and output vocabulary that can be consistently checked.
A further record medium recording a control program for rough classification of words according to the present invention is one recording the control program for controlling an apparatus for rough classification of words for inputting a word image and selecting vocabulary similar to it among the vocabulary stored in a vocabulary storage device in advance, in which the control program causes the apparatus to select, of the word image, candidate characters that are image areas conforming to predetermined conditions, to determine whether the word image comprises only uppercase characters or a mixture of uppercase and lowercase characters, when determined as comprising only the uppercase characters, to convert into character codes the image areas selected by limiting the character type only to the uppercase, and when determined as the mixture of uppercase and lowercase characters, to convert into character codes the image areas selected by targeting all the character types, to estimate the number of characters of the word image in its entirety and estimate the number of characters in the areas generated from the entire word image, to extract image features of the word image in its entirety and extract image features in the areas generated from the entire word image, to generate word description representing the word image by using the character codes, the number of characters in the areas and the graphic features in the areas, and to use the estimated number of characters and graphic features of the word in its entirety to select the vocabulary recorded in the vocabulary storage device and to check the word description against the vocabulary recorded in the vocabulary storage device so as to select and output vocabulary that can be consistently checked.
More specifically, an apparatus for rough classification of words of the present invention has a candidate character selection division for detecting portions likely to be single characters from a word image, a character recognition division for recognizing the selected characters, and a number-of-characters estimation division for estimating, from a word image or part of such an image, the number of characters contained therein, as well as a word description division for describing the word from the recognized candidate characters and the estimated numbers of characters, and a vocabulary selection division for comparing the described results with the character codes of the vocabulary recorded in a vocabulary storage division to select similar vocabulary.
Thus, it is not necessary to extract a word feature from a word image in advance. Moreover, rather than selecting similar words by using only a few characters contained in a word, similar words are selected by utilizing the selected characters and their positions in the word, so that unnecessary words are seldom selected and words can be selected efficiently.