Character recognition is typically implemented in three stages: preprocessing, feature extraction, and discrimination. In the preprocessing stage, size normalization of the input character pattern and noise removal are normally performed.
During the feature extraction stage, multiple feature values that represent the features of each input character are extracted from the input character pattern and a feature vector representing the feature values is generated. Each feature of the input character represents a portion of the structure of the input character. Typical features include the length of stroke, the angle of stroke, and the number of loops. For example, when the feature is the number of loops, the feature value may have one of the following values:
0: when the input character is the numeral "1", "2" or "3,"
1: when the input character is the numeral "0", "6" or "9," and
2: when the input character is the numeral "8."
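The number-of-loops feature can be illustrated with a minimal sketch. The text does not say how loops are counted, so the approach here is an assumption: a loop is taken to be a region of background pixels that is fully enclosed by ink, found by flood filling a small binary bitmap. The function name `count_loops`, the "#"/"." bitmap encoding, and the toy glyphs are hypothetical.

```python
from collections import deque

def count_loops(rows):
    """Count enclosed background regions in a bitmap given as strings.

    "#" marks an ink pixel; any other character is background.
    A background region counts as a loop only if none of its pixels
    lies on the edge of the bitmap (assumed definition of a loop).
    """
    height, width = len(rows), len(rows[0])
    seen = [[False] * width for _ in range(height)]
    loops = 0
    for r in range(height):
        for c in range(width):
            if rows[r][c] == "#" or seen[r][c]:
                continue
            # Flood fill one background region using 4-connectivity.
            touches_edge = False
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                if y in (0, height - 1) or x in (0, width - 1):
                    touches_edge = True
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and rows[ny][nx] != "#" and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_edge:
                loops += 1
    return loops

# Toy glyphs (hypothetical, far smaller than a real character pattern).
ONE = [".#.", ".#.", ".#."]
ZERO = ["#####", "#...#", "#...#", "#####"]
EIGHT = ["###", "#.#", "###", "#.#", "###"]

print(count_loops(ONE), count_loops(ZERO), count_loops(EIGHT))
```

With these toy glyphs the feature value is 0 for "1", 1 for "0", and 2 for "8", matching the values listed above.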
Typically many hundreds of feature values are extracted for each input character in the input character pattern. The feature values are represented by a feature vector whose elements each represent the feature value of one of the features of the input character. A feature vector has a large number of dimensions, with 500 dimensions being typical.
In the discrimination stage, the feature vector of each input character in the input character pattern is compared with a reference vector for each category. The input character is determined to belong to the category whose reference vector is closest to the feature vector of the input character. In character recognition, each "category" represents one character. For example, in numeral recognition, a category exists for each of the characters "0," "1," . . . , "9."
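The discrimination stage can be sketched as a nearest-reference-vector search. The text says only that the "closest" reference vector wins; the use of Euclidean distance here is an assumption, as are the function name `classify` and the toy three-dimensional vectors (real feature vectors would have on the order of 500 dimensions).

```python
import math

def classify(feature_vector, reference_vectors):
    """Return the category whose reference vector is closest to feature_vector.

    Closeness is measured by Euclidean distance (an assumed metric).
    reference_vectors maps each category to its reference vector.
    """
    best_category = None
    best_distance = math.inf
    for category, reference in reference_vectors.items():
        distance = math.dist(feature_vector, reference)
        if distance < best_distance:
            best_category, best_distance = category, distance
    return best_category

# Hypothetical reference vectors for three numeral categories.
reference_vectors = {
    "0": [1.0, 0.2, 0.9],
    "1": [0.0, 0.9, 0.1],
    "8": [2.0, 0.3, 0.8],
}

print(classify([0.1, 0.8, 0.2], reference_vectors))
```

In this toy case the input vector lies nearest the reference vector for "1", so the input character is assigned to that category.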
The effectiveness of a character recognition system is characterized by its "recognition ratio." When character recognition is performed, one of the following results is obtained for each input character in the input character pattern: (1) the category to which the input character belongs is correctly recognized; (2) the input character is successfully recognized as belonging to a category, but the category is incorrect; or (3) the input character is not recognized as belonging to any category. For example, when the input character is the numeral "1," result (1) occurs when the input character is recognized as belonging to the category "1;" result (2) occurs when the input character is erroneously recognized as belonging to the category "7," for example, and result (3) occurs when the category to which the input character belongs cannot be recognized. The recognition ratio is the number of character recognition events that generate result (1) divided by the total number of input characters in the input character pattern. A successful character recognition system is one that has a recognition ratio close to unity (or 100%).
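The three possible results and the recognition ratio defined above can be tallied as in the following sketch. The function name `tally_results` and the convention of using `None` for result (3) are assumptions made for illustration.

```python
def tally_results(true_categories, recognized_categories):
    """Count results (1), (2) and (3) and compute the recognition ratio.

    A recognized category of None stands for result (3): the input
    character was not recognized as belonging to any category.
    Returns (correct, wrong, rejected, ratio).
    """
    if len(true_categories) != len(recognized_categories):
        raise ValueError("both sequences must have the same length")
    if not true_categories:
        raise ValueError("at least one input character is required")
    correct = wrong = rejected = 0
    for truth, recognized in zip(true_categories, recognized_categories):
        if recognized is None:
            rejected += 1      # result (3)
        elif recognized == truth:
            correct += 1       # result (1)
        else:
            wrong += 1         # result (2)
    ratio = correct / len(true_categories)
    return correct, wrong, rejected, ratio

# Four input characters: two correct, one misread as "7", one rejected.
truth = ["1", "1", "1", "7"]
output = ["1", "7", None, "7"]
print(tally_results(truth, output))  # (2, 1, 1, 0.5)
```

Note that reducing result (2) by rejecting doubtful characters moves them to result (3) and leaves the recognition ratio unchanged, since only result (1) appears in the numerator.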
The reference vectors are stored in a recognition dictionary. The recognition dictionary is statistically created from character patterns obtained from the handwriting of many people. Before the character recognition system can be used for handwriting recognition, the recognition dictionary is created by a number of unspecified writers each handwriting a predetermined set of characters. The category to which each of the characters in the set belongs is known. The feature vectors extracted from the characters in each category are averaged and each average vector is stored in the recognition dictionary as the reference vector for the category.
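Creating the recognition dictionary by averaging can be sketched as follows. The function name `build_recognition_dictionary`, the data layout (pairs of known category and feature vector), and the two-dimensional sample values are assumptions for illustration only.

```python
from collections import defaultdict

def build_recognition_dictionary(labeled_samples):
    """Average the feature vectors of each category into a reference vector.

    labeled_samples is an iterable of (category, feature_vector) pairs
    whose categories are known in advance.
    """
    grouped = defaultdict(list)
    for category, vector in labeled_samples:
        grouped[category].append(vector)

    dictionary = {}
    for category, vectors in grouped.items():
        count = len(vectors)
        # zip(*vectors) walks the vectors one dimension at a time.
        dictionary[category] = [sum(column) / count for column in zip(*vectors)]
    return dictionary

# Hypothetical samples: two writers' "0" and one writer's "1".
samples = [
    ("0", [1.0, 0.0]),
    ("0", [3.0, 2.0]),
    ("1", [0.0, 4.0]),
]
print(build_recognition_dictionary(samples))
```

Here the reference vector for "0" is the element-wise mean of the two sample vectors, [2.0, 1.0], and the reference vector for "1" is its single sample.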
Because the recognition dictionary just described is created from the handwriting of unspecified writers, this type of recognition dictionary can be regarded as a universal recognition dictionary that can be used to perform character recognition on the writing of any writer. However, because of the stylistic differences between writers, the recognition ratio of a character recognition system employing a universal recognition dictionary will depend greatly on how closely each writer's style matches the average represented by the reference vectors stored in the universal recognition dictionary.
It is known in the prior art to improve the recognition ratio of a character recognition system by requiring each of the writers whose handwriting is to be recognized by the system to hand write a set of predetermined characters to create a personal recognition dictionary. However, the requirement that each writer hand write a set of predetermined characters before character recognition is performed is impractical in a character-recognition system designed to recognize the handwriting of many different writers.
Although a character recognition system for handwriting must tolerate the variations in characters that result from the system being used by different writers, these variations are also a primary factor that hinders improving the recognition ratio of such systems. For example, if the characters in one category written by one writer resemble the characters in another category written by another writer, accurate character recognition of the handwriting of both writers will be extremely difficult if the same recognition dictionary is used. To solve this problem, as noted above, conventional prior-art systems store a personal recognition dictionary for each writer whose handwriting will be recognized by the system. The personal recognition dictionary is created by requiring the writer to hand write a predetermined set of characters before the system performs character recognition on the writer's handwriting.
The document Improving Handwritten Character Recognition Using Personal Writing Characteristics, TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, Vol. J78-D-II, No. 7, July 1995, discloses methods for improving character recognition of handwritten characters when it is not feasible for the person using the system to hand write a predetermined set of characters before the system performs character recognition on the writer's handwriting. See also T. Kawatani, Character Recognition Performance Improvement Using Personal Handwriting Characteristics, IEEE 0-8186-7128-9/95 (1995); and T. Kawatani, N. Miyamoto, Verification of Personal Handwriting Characteristics for Numerals and its Application to Recognition, 14 PATTERN RECOGNITION LETTERS, pp. 335-343 (1993). These papers describe systems in which the number of input characters that are erroneously recognized (result (2) above) is reduced, but the techniques described do not necessarily provide an improvement of the recognition ratio (result (1) above).
Thus, the development of a character recognition apparatus and method having an improved recognition ratio would constitute a major technological advance. The ability to improve the recognition ratio without requiring that special operations be performed before character recognition is performed on the handwriting of a new writer would constitute a further technological advance.