The present invention relates to a word recognition method for performing word recognition in an optical character reader that optically reads a word consisting of a plurality of characters described on a material targeted for reading. In addition, the present invention relates to a storage medium that stores a word recognition program for causing the word recognition processing to be performed.
In general, in an optical character reader, when characters described on a material targeted for reading are read, such characters can be read precisely by using knowledge of words, even if the recognition precision for individual characters is low. Conventionally, a variety of such methods have been proposed.
For example, in the invention disclosed in Jpn. Pat. Appln. KOKAI Publication No. 10-177624, a distance is used as a result of character recognition (the smaller the distance, the more reliable the recognition result), and an evaluation value of words is obtained by summing these distances.
In addition, in the invention disclosed in Jpn. Pat. Appln. KOKAI Publication No. 8-167008, candidates of words are narrowed down at the stage of character recognition, each of the narrowed candidates is matched against each word, and an evaluation value of words is obtained from the number of coincident characters.
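The coincidence-count evaluation can be illustrated with a minimal sketch. It is not the method of the cited publication. It assumes, purely for illustration, that character recognition yields a small set of candidate characters for each position and that a word is scored by the number of positions at which its character appears in that set. The function name `count_coincident`, the candidate sets, and the word list are all hypothetical.

```python
def count_coincident(word, candidate_sets):
    """Count positions where the word's character is among the candidates.

    Assumption for this sketch: one candidate set per character position,
    so the word length must equal the number of positions.
    """
    if len(word) != len(candidate_sets):
        raise ValueError("word length must match the number of positions")
    return sum(1 for ch, cands in zip(word, candidate_sets) if ch in cands)


# Hypothetical narrowed character candidates for a five-character word.
candidate_sets = [{"T", "I"}, {"O", "Q"}, {"K", "X"}, {"Y", "V"}, {"O", "D"}]

# Hypothetical word dictionary; all entries have the same length.
dictionary = ["TOKYO", "KYOTO", "OSAKA"]

scores = {w: count_coincident(w, candidate_sets) for w in dictionary}
best_word = max(scores, key=scores.get)
```

In this sketch the comparison is only defined for words whose length equals the number of recognized positions.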
Further, in the disclosure of Japanese Electronics & Communications Society Paper, Vol. 52-C, No. 6, June 1969, pages 305 to 312, an a posteriori probability is used as an evaluation value of words.
The a posteriori probability will be described here.
A probability at which an event (b) occurs is expressed as P(b).
A probability at which the event (b) occurs after an event (a) has occurred is expressed as P(b|a).
In a case in which the event (b) occurs irrespective of whether or not the event (a) occurs, P(b|a) is the same as P(b). In contrast, the probability at which the event (b) occurs under the influence of the event (a), after the event (a) has occurred, is referred to as the a posteriori probability, and is expressed as P(b|a).
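In standard notation, which the source text does not spell out, these two statements can be written as follows. The joint-probability symbol P(a ∩ b) and the condition P(a) > 0 are introduced here only for the purpose of the definition.

```latex
% Conditional (a posteriori) probability of event b given event a
P(b \mid a) = \frac{P(a \cap b)}{P(a)}, \qquad P(a) > 0

% Special case: b occurs irrespective of a (independence)
P(b \mid a) = P(b)
```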
However, each of these conventional methods is meaningful only when the number of characters in a word is constant. If the number of characters is not constant, these methods cannot be used, or a failure will occur if they are used. That is, in the invention disclosed in Jpn. Pat. Appln. KOKAI Publication No. 10-177624, the smaller the number of characters is, the smaller the evaluation value is. Thus, a word with fewer characters is prone to be selected.
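The bias of a summed-distance evaluation toward short words can be seen in a small sketch. The distances, the two candidate words, and the function name `summed_distance` are hypothetical values chosen for illustration, not data from the cited publication.

```python
def summed_distance(distances):
    """Word evaluation value as the sum of per-character distances.

    A smaller value is treated as a more reliable word candidate.
    """
    return sum(distances)


# Hypothetical per-character distances (smaller = better character match).
# Every character of the longer word matches better than any character
# of the shorter word.
candidates = {
    "CAT": [4, 4, 4],
    "CATTLE": [3, 3, 3, 3, 3, 3],
}

scores = {word: summed_distance(d) for word, d in candidates.items()}
selected = min(scores, key=scores.get)

# Per-character averages, shown only to make the bias visible.
averages = {word: sum(d) / len(d) for word, d in candidates.items()}
```

Here the shorter word has the smaller sum (12 against 18) and is selected, even though each of its characters has a larger distance than those of the longer word.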
In addition, in the invention disclosed in Jpn. Pat. Appln. KOKAI Publication No. 8-167008 and in the disclosure of Japanese Electronics & Communications Society paper, it is presumed that the number of characters is constant. When the number of characters is not constant, they cannot be used.
Further, a conventional evaluation function for word recognition fails to consider the ambiguity of word delimiting, the absence of character spacing, noise entry, and the ambiguity of character delimiting.