1. Field of the Invention
The present invention relates to word recognition in a character recognition device.
2. Description of the Related Art
In recent years, the demand for character recognition devices (OCR devices) and OCR software has been increasing.
Word recognition is a method in which, when a handwritten word such as  is recognized, the word is not separated into individual characters that are recognized one by one; instead, the word itself is recognized collectively. With this method, recognition with high accuracy can be implemented even if characters are in contact with each other. This is one of the effective methods for recognizing a handwritten character string in a free pitch region. A word recognition device according to the present invention is applicable not only to a handwritten character recognition device, but also to character recognition devices in a broad sense, such as a printed character recognition device, a character recognition device of a portable information terminal, etc.
Methods have been proposed that recognize a handwritten word by synthesizing the features of the characters structuring a word to generate a word feature dictionary for comparison, and by comparing that dictionary with the feature of an input word. Examples include the methods recited in Japanese Patent Application Nos. 11-113733, 11-330288, etc.
The invention disclosed in the above-described application No. 11-113733 is intended to collectively recognize an input word image, without recognizing the individual characters structuring the input word image, after a word feature dictionary is generated based on the features of individual characters. With this method, word recognition can be performed with high accuracy by using an individual character image dictionary of a small capacity.
Additionally, the invention disclosed in the above-described application No. 11-330288 is intended to cope with a change in the character shape of an input word image by generating a word dictionary through the synthesis of a plurality of word features for one word.
According to the conventional method disclosed in Japanese Patent Application No. 11-330288, when a character feature dictionary for synthesizing word features is arranged, features are extracted from character images whose position or width is changed for each character, and all of the extracted features are held.
For example, as shown in FIG. 1, features of horizontal widths 1/6, 2/6, . . . , 6/6 (hereinafter referred to as p/q features) are extracted, and all of the extracted features are held. In this case, the number of features per character is 21 (a calculation expression: q(q+1)/2).
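The count q(q+1)/2 can be checked with a short enumeration. The following is a minimal sketch that assumes each p/q feature corresponds to a window of width p/q placed at a left edge that is a multiple of 1/q within the character image; this window interpretation and the function name enumerate_pq_features are assumptions made for illustration and are not taken from the cited application.

```python
from fractions import Fraction


def enumerate_pq_features(q):
    """Return (left_edge, width) pairs as fractions of the character width.

    Assumption: a p/q feature is a window of width p/q whose left edge
    lies at start/q, and the window must fit inside the character image.
    """
    windows = []
    for p in range(1, q + 1):              # window width p/q
        for start in range(0, q - p + 1):  # left edge at start/q
            windows.append((Fraction(start, q), Fraction(p, q)))
    return windows


if __name__ == "__main__":
    q = 6
    windows = enumerate_pq_features(q)
    # Compare the enumerated count with the closed form q(q+1)/2.
    print(len(windows), q * (q + 1) // 2)
```

Under this assumption there are q - p + 1 positions for each width p/q, and summing over p = 1, ..., q gives q(q+1)/2, which is 21 for q = 6.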
As a feature of an individual character, for example, a weighted direction code histogram feature (see “Improvement of handwritten Japanese Character Recognition Using Weighted Direction Code Histogram, Pattern Recognition”, Tsuruoka et al., the IEICE Transactions D Vol. J70-D No. 7, pp. 1390-1397, July 1987) is used. The weighted direction code histogram feature is a feature in which the direction code histogram of each of the small regions into which a character image is partitioned is regarded as a feature vector. By way of example, as shown in FIG. 2, feature amounts are extracted in 8 directions, obtained by dividing 360° by 8, within 7 (length)×7 (width) meshes. Each of the meshes thus possesses feature amounts of 8 directional dimensions. For example, a 3/7 feature of a character  is shown in FIG. 2.
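The layout of such a feature, 7×7 meshes with 8 directional values each, can be illustrated as follows. This is only a simplified sketch and not the weighted direction code histogram of the cited reference: it assumes a binary image given as a list of rows, uses the quantized direction of a central-difference gradient as a stand-in for the direction code, and applies no weighting. The names direction_histogram, MESH and DIRECTIONS are hypothetical.

```python
import math

MESH = 7        # 7 (length) x 7 (width) meshes
DIRECTIONS = 8  # 360 degrees divided by 8


def direction_histogram(image):
    """Accumulate an unweighted 8-direction histogram per mesh.

    Simplification: the direction at a pixel is the quantized angle of a
    central-difference gradient, which stands in for a direction code.
    """
    height = len(image)
    width = len(image[0])
    hist = [[[0.0] * DIRECTIONS for _ in range(MESH)] for _ in range(MESH)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            if gx == 0 and gy == 0:
                continue  # flat neighbourhood, no direction to record
            angle = math.atan2(gy, gx) % (2 * math.pi)
            step = 2 * math.pi / DIRECTIONS
            code = int(round(angle / step)) % DIRECTIONS
            row = y * MESH // height  # mesh row containing this pixel
            col = x * MESH // width   # mesh column containing this pixel
            hist[row][col][code] += 1.0
    return hist


if __name__ == "__main__":
    # A 14x14 image holding one vertical stroke two pixels wide.
    image = [[1 if x in (6, 7) else 0 for x in range(14)] for _ in range(14)]
    hist = direction_histogram(image)
    flat = [v for row in hist for cell in row for v in cell]
    print(len(flat))  # number of feature dimensions: 7 * 7 * 8
```

With this layout, a full-width feature has 7×7×8 = 392 dimensions.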
When a word feature is synthesized, the individual character features are combined so that the sum of their p/q fractions is 1. By way of example, for a word composed of two characters, a word feature is synthesized by adding “a 3/7 feature+a 4/7 feature”, “a 2/7 feature+a 5/7 feature”, etc. For instance, when the feature of a word  is synthesized, the 3/7 feature of  and the 4/7 feature of  are added, so that  is synthesized, as shown in FIG. 3.
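The constraint that the p/q fractions sum to 1 can be sketched as follows. The sketch assumes that "adding" character features means placing their mesh columns side by side, and that a dictionary char_features maps a (character, p) pair to a list of p mesh columns; both are assumptions made for illustration. The characters "A" and "B" are placeholders, and the names splits and synthesize are hypothetical.

```python
def splits(q, n_chars):
    """Yield every tuple of positive widths (p_1, ..., p_n) that sums to q."""
    if n_chars == 1:
        yield (q,)
        return
    # Leave at least one mesh column for each remaining character.
    for p in range(1, q - n_chars + 2):
        for rest in splits(q - p, n_chars - 1):
            yield (p,) + rest


def synthesize(word, widths, char_features):
    """Concatenate the p/q features of each character into a word feature.

    Assumption: char_features[(ch, p)] is a list of p mesh columns.
    """
    if len(word) != len(widths):
        raise ValueError("one width is needed per character")
    columns = []
    for ch, p in zip(word, widths):
        columns.extend(char_features[(ch, p)])
    return columns


if __name__ == "__main__":
    q = 7
    # Dummy features: each mesh column is just a label such as "A0".
    char_features = {
        (ch, p): [f"{ch}{i}" for i in range(p)]
        for ch in "AB"
        for p in range(1, q + 1)
    }
    for widths in splits(q, 2):
        print(widths, synthesize("AB", widths, char_features))
```

For a two-character word and q = 7, the candidate width pairs run from (1, 6) to (6, 1), and include the “3/7 + 4/7” and “2/7 + 5/7” combinations mentioned above.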
However, since character features whose positions and widths are changed must be held for all of approximately 4,000 character categories, a capacity of several hundred megabytes is required, which is a serious problem from a practical viewpoint.