So-called character recognition technology has been in use. An apparatus that performs character recognition receives a character handwritten by a user with an input device (e.g., a touch panel or a mouse) and estimates the character code of the character the user intended to input. When a plurality of candidate character codes exist, the apparatus may prompt the user to select one of the candidates. One of the issues in character recognition technology is that recognition accuracy needs to be improved.
For example, one suggested method forms a plurality of input frames in a character input area, and the user inputs, into each input frame, one of a plurality of components obtained by dividing a single Chinese character. In this technology, correspondences between sets of components and characters are stored in advance in a storage module. The set of components input in the input frames is compared with the sets of components stored in the storage module, and a character containing the input set of components is displayed on a display apparatus.
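The component-matching step can be illustrated with a minimal sketch. It assumes that the storage module is a simple in-memory mapping from each character to the list of components it is divided into, and that "containing" means every component entered in the frames appears in the stored set (counting repeats). The dictionary `COMPONENT_DICT` and the function `candidates_from_frames` are hypothetical names for illustration, not taken from the cited publication.

```python
from collections import Counter

# Hypothetical component dictionary (assumption): each character maps to the
# components it is divided into. Entries are illustrative only.
COMPONENT_DICT: dict[str, list[str]] = {
    "明": ["日", "月"],
    "林": ["木", "木"],
    "森": ["木", "木", "木"],
    "好": ["女", "子"],
}


def candidates_from_frames(frames: list[str]) -> list[str]:
    """Return characters whose stored components contain every component
    entered in the input frames (one component per frame)."""
    entered = Counter(frames)
    result = []
    for char, parts in COMPONENT_DICT.items():
        stored = Counter(parts)
        # Multiset containment: each entered component must occur in the
        # stored set at least as many times as it was entered.
        if all(stored[c] >= n for c, n in entered.items()):
            result.append(char)
    return result


print(candidates_from_frames(["木", "木"]))  # ['林', '森']
print(candidates_from_frames(["日", "月"]))  # ['明']
```

Using a multiset rather than a plain set keeps characters such as 林 (two 木) distinct from characters that contain 木 only once. The sketch also makes the limitation discussed later concrete: a character absent from `COMPONENT_DICT` can never be returned.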
Another suggested method performs recognition stroke by stroke, where a stroke is a line drawn by a single continuous movement when handwriting a character. Each time a stroke is input, its stroke data is combined with the stroke data input previously, and the combined handwritten stroke data is compared with dictionary character data made up of stroke data prepared in advance, while a similarity between the handwritten stroke data and the dictionary character data is determined. When the similarity between the handwritten stroke data and the dictionary character data becomes zero, the handwritten stroke data accumulated up to the preceding stroke is segmented as a single character, and the dictionary character data containing that stroke data is displayed as candidate characters.
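The stroke-by-stroke segmentation logic can be sketched as follows. The cited method's actual similarity measure is not described here, so this sketch swaps in a deliberately crude stand-in (an assumption): similarity is 1.0 when the accumulated strokes form a prefix of a dictionary entry's stroke sequence and 0.0 otherwise. Strokes are reduced to hypothetical direction labels ("H" for horizontal, "V" for vertical), and `STROKE_DICT`, `similarity`, and `segment_and_recognize` are illustrative names only.

```python
# Hypothetical stroke dictionary (assumption): each character maps to an
# ordered sequence of simplified stroke labels.
STROKE_DICT: dict[str, list[str]] = {
    "十": ["H", "V"],
    "土": ["H", "V", "H"],
    "二": ["H", "H"],
    "三": ["H", "H", "H"],
}


def similarity(strokes: list[str], entry: list[str]) -> float:
    """Crude stand-in similarity (assumption): 1.0 if `strokes` is a prefix
    of the dictionary entry's strokes, else 0.0."""
    return 1.0 if entry[: len(strokes)] == strokes else 0.0


def candidates(strokes: list[str]) -> list[str]:
    """Dictionary characters with nonzero similarity to `strokes`."""
    return [ch for ch, entry in STROKE_DICT.items() if similarity(strokes, entry) > 0]


def segment_and_recognize(strokes: list[str]) -> list[list[str]]:
    """Accumulate strokes one at a time; when the accumulated data including
    the new stroke matches no dictionary entry, segment the strokes up to the
    previous stroke as one character and start a new character."""
    results: list[list[str]] = []
    acc: list[str] = []
    for stroke in strokes:
        trial = acc + [stroke]
        if acc and not candidates(trial):
            # Similarity to every entry dropped to zero: close the character.
            results.append(candidates(acc))
            acc = [stroke]
        else:
            acc = trial
    if acc:
        # Flush the strokes of the final character.
        results.append(candidates(acc))
    return results


print(segment_and_recognize(["H", "V", "H", "H", "H"]))  # [['土'], ['二', '三']]
```

In the usage example, the fourth stroke makes the accumulated sequence H, V, H, H, which matches no entry, so H, V, H is segmented as 土 and the remaining strokes are recognized as either 二 or 三. As with the component approach, only characters present in `STROKE_DICT` can ever be proposed.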
See, for example, Japanese Laid-Open Patent Publication No. 07-121660 and Japanese Laid-Open Patent Publication No. 11-134437.
In the methods described above, the dictionary data used for matching against the set of components or the stroke data must be prepared in advance for each character to be recognized. However, the number of characters is enormous, so it is not easy to include every character in the dictionary data. Preparing the dictionary data is time-consuming because it involves, for example, collecting multiple stroke samples for each character from many users and registering them in the dictionary data. Consequently, some characters may not be registered in the dictionary data; in particular, infrequently used characters such as variant characters or external characters may be left out. The question, therefore, is how to recognize characters that are not registered in the dictionary data with high accuracy.