1. Field of the Invention
The present invention relates to a pattern recognition method in which a pattern string is recognized collectively, and more particularly to a word recognition apparatus for collectively recognizing a word and a method thereof.
2. Description of the Related Art
Conventional methods of pattern recognition are classified into the following three groups from the viewpoint of character division and extraction.
In the first method, a word is divided into characters using its image features, and the extracted characters are recognized individually. The main image features include the blank space and pitch between characters, a histogram obtained by projecting the image in the direction perpendicular to the character string, the circumscribed rectangle of each connected component of pixels, the unevenness of the upper and lower contours of the image, etc.
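As an illustration of the projection histogram mentioned above, the following is a minimal sketch and not a method taken from any particular prior-art system. It assumes a horizontally written, binarized image given as rows of 0/1 pixels (1 meaning ink); the function name split_by_projection is hypothetical. Columns containing no ink are treated as blanks between characters.

```python
def split_by_projection(image):
    """Split a binary word image into character column ranges.

    image: list of rows, each a list of 0/1 pixels (1 = ink). Assumed format.
    Returns a list of (start, end) column ranges, with end exclusive.
    """
    if not image:
        return []
    width = len(image[0])
    # Project the image onto the horizontal axis: ink pixels per column.
    histogram = [sum(row[x] for row in image) for x in range(width)]

    segments = []
    start = None
    for x, count in enumerate(histogram):
        if count > 0 and start is None:
            start = x                      # a run of ink columns begins
        elif count == 0 and start is not None:
            segments.append((start, x))    # a blank column ends the run
            start = None
    if start is not None:
        segments.append((start, width))    # run reaches the right edge
    return segments
```

A sketch of this kind separates characters only where at least one empty column lies between them, so touching or overlapping characters are returned as a single segment.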
In the second method, a plurality of division and extraction hypotheses are generated, and each hypothesis is verified using the result of character recognition. In one case, the division and extraction hypotheses are obtained by moving an observation window over the image, and in the other case, they are obtained by using the image features described above. For verification, dynamic programming (DP) is often used so that the result is consistent over the entire string.
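The verification by dynamic programming described above can be sketched as follows. This is an illustrative example under stated assumptions, not the procedure of any specific prior-art system: candidate cut positions are numbered 0 to num_cuts - 1, a caller-supplied function segment_score(i, j) (hypothetical) returns the character recognizer's score for the image piece between cuts i and j, and max_span (also hypothetical) limits how many elementary pieces may be merged into one character.

```python
def best_segmentation(num_cuts, segment_score, max_span=3):
    """Choose the division hypothesis with the highest total score.

    num_cuts: number of candidate cut positions, indexed 0..num_cuts-1.
    segment_score(i, j): assumed recognizer score for the piece between
        cut i and cut j (higher is better, e.g. a log-probability).
    max_span: assumed limit on pieces merged into one character.
    Returns (total_score, [(i, j), ...]) for the best hypothesis.
    """
    neg_inf = float("-inf")
    best = [neg_inf] * num_cuts   # best[j]: best total score ending at cut j
    back = [None] * num_cuts      # back[j]: previous cut on that best path
    best[0] = 0.0

    for j in range(1, num_cuts):
        for i in range(max(0, j - max_span), j):
            if best[i] == neg_inf:
                continue
            total = best[i] + segment_score(i, j)
            if total > best[j]:
                best[j] = total
                back[j] = i

    if best[-1] == neg_inf:
        return neg_inf, []        # no hypothesis covers the whole string

    # Trace back from the last cut to recover the chosen segments.
    path = []
    j = num_cuts - 1
    while j > 0:
        i = back[j]
        path.append((i, j))
        j = i
    path.reverse()
    return best[-1], path
```

Because every pair of cuts within max_span is scored by the recognizer, the number of recognition calls grows with both the number of cuts and the allowed span.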
However, in a handwritten character string written with no restriction, the pitch between characters is not uniform and the image features of the parts to be extracted are diverse, so these methods have a problem in that characters cannot be divided and extracted satisfactorily. When characters are searched for using the observation window, a window of fixed size also cannot handle the characters, since the pitch is not uniform. If the size of the window is made variable, on the other hand, the processing time increases greatly.
Furthermore, the image features of a part to be divided and extracted are peculiar to each character type, such as kanji, hiragana, alphabetic characters and numeric characters. Therefore, if these different types of characters are mixed, the same problem also occurs when touching characters are separated in a word composed of printed characters.
In the third method, a word itself is recognized without being divided into characters. Although this method avoids the difficult problem of character division and extraction, it has a problem in that the number of candidates to be registered in a recognition dictionary in advance increases rapidly compared with the case where each character is recognized individually. In practice, since memory capacity restricts the size of the dictionary to a practical level, only a limited number of words can be registered, which restricts the use of this method.