1. Field of the Invention
The invention relates to a post-process for improving the precision of character recognition.
The invention is intended to select a proper character from character recognition candidates by using a chain probability of a plurality of characters which are inputted in succession.
2. Related Background Art
Among conventional character recognizing apparatuses, there is an apparatus comprising: a pattern matching section for comparing an inputted unknown character pattern with standard patterns which have been prepared as a recognition dictionary in the apparatus, thereby selecting a character code of the standard pattern having high similarity; and a post-processing section for performing a word collating process, a context process, and the like by using recognition candidates obtained from the pattern matching section, thereby outputting a most probable recognition result as a character train.
As a post-process using the context process, there is an N-gram statistic process, to which a chain probability of each character in a character train is applied. The N-gram statistic process uses the chain probability of the following character when a certain character train is given. In particular, the N-gram statistic process is called a Bi-gram statistic process when the given character train consists of two characters, and a Tri-gram statistic process when it consists of three characters.
For example, the Bi-gram statistic process is generally applied to an on-line character recognition post-process in the following manner.
When the user inputs “xi” ("xgr"), first, the handwritings of “x” and “i” are subjected to a matching process by the pattern matching section, which has a dictionary in which a standard pattern of each character has been stored and which discriminates similar characters for each input character in accordance with the shape of the input pattern. It is now assumed that “x” and “y” were selected for the input pattern “x”, and “;” and “i” were selected for the input pattern “i”, as recognition candidates in descending order of similarity for each input pattern, and that they were outputted as candidate characters, respectively.
Subsequently, all of the possible combinations of the respective recognition candidates are formed. In this example, four combinations exist: “x;”, “xi”, “y;”, and “yi”. Among those four character trains, the combination with the highest chain probability according to the Bi-gram statistic process, which uses Bi-gram statistic data formed in advance, is “xi”. Therefore, “xi” is outputted as the final recognition result.
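The selection step described above can be sketched as follows. This is a minimal illustration only: the table `BIGRAM`, the function name `best_combination`, the `floor` value used for unseen pairs, and all probability values are assumptions introduced for the example and do not come from the conventional apparatus itself.

```python
from itertools import product

# Hypothetical Bi-gram chain probabilities P(next | current).
# The values are invented for illustration only.
BIGRAM = {
    ("x", "i"): 0.020,
    ("x", ";"): 0.001,
    ("y", "i"): 0.004,
    ("y", ";"): 0.002,
}

def best_combination(candidates, bigram, floor=1e-6):
    """Return the candidate string whose adjacent pairs have the highest
    product of chain probabilities. Unseen pairs get a small floor value
    (an assumption of this sketch)."""
    def chain_probability(chars):
        p = 1.0
        for cur, nxt in zip(chars, chars[1:]):
            p *= bigram.get((cur, nxt), floor)
        return p

    # Form every combination of the per-pattern candidates and keep the best.
    return "".join(max(product(*candidates), key=chain_probability))

# Candidates for the two input patterns, in order of similarity.
candidates = [["x", "y"], [";", "i"]]
print(best_combination(candidates, BIGRAM))  # prints "xi"
```

With these assumed values, the four combinations score 0.001, 0.020, 0.002, and 0.004, so “xi” is selected even though “;” ranked above “i” in the pattern matching.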
When the N-gram statistic process is executed as a post-process as mentioned above, it is necessary to calculate N-gram statistics data in advance by using sample texts such as newspapers, to store the chain probabilities of the characters derived from the calculated N-gram statistics in the recognizing apparatus as an N-gram dictionary in the form of a file or the like, and to read out and use the chain probabilities when the recognition is executed.
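As a rough sketch of how such a dictionary might be prepared for the Bi-gram case, the following counts adjacent character pairs in a sample text and converts the counts into chain probabilities. The function name `build_bigram_dictionary` and the tiny sample string are hypothetical. Storage to a file is omitted, and the simple relative-frequency estimate is an assumption rather than the method of any particular apparatus.

```python
from collections import Counter

def build_bigram_dictionary(text):
    """Estimate chain probabilities P(next | current) from sample text
    by relative frequency of adjacent character pairs."""
    pair_counts = Counter(zip(text, text[1:]))
    # Count how often each character appears as the first element of a pair.
    first_counts = Counter(text[:-1])
    return {
        (cur, nxt): n / first_counts[cur]
        for (cur, nxt), n in pair_counts.items()
    }

# A toy sample; a real dictionary would be built from a large corpus.
table = build_bigram_dictionary("abab ac")
print(table[("a", "b")], table[("a", "c")])
```

In this toy sample, “a” is followed by “b” twice and by “c” once, so the estimated chain probabilities are 2/3 and 1/3, respectively.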
When the Bi-gram statistic process is used as the N-gram statistic process of the above conventional character recognizing apparatus, a backward-chain probability is applied, that is, the probability with which a given character occurs subsequent to a certain target character. When the Bi-gram statistic process is applied to character recognition, however, there are cases where a sufficient post-processing effect cannot be obtained so long as only the backward-chain probability is used. For example, it is now assumed that the recognition results of three characters of “” are “”, “”, and “∘O” in order of similarity, respectively. When the Bi-gram statistics are applied to those candidates, the chain probability of “” is the largest for the combination of the first and second characters, and the chain probability of “IO” is the largest for the combination of the second and third characters. Since the operation value of “I” upon pattern matching is better than that of “”, the result of “IO” is finally outputted. This result contains more recognition errors than the recognition result obtained by the pattern matching alone. There is thus a problem that the recognition rate is deteriorated by the post-processing step.
Similarly, suppose that three character patterns of “C∘.” are inputted and each of them is character recognized. It is now assumed that the top recognition candidate characters of the first pattern are “C” and “c”, those of the second pattern are “l”, “∘”, and “O”, and those of the third pattern are “.” and “∘”, respectively. When the Bi-gram statistics are applied to those candidates, the chain probability of “C∘” is the highest for the combination of the first and second patterns, and the chain probability of “l∘” is the highest for the combination of the second and third patterns. Since the similarity operation value of “l” upon pattern matching is better than that of “∘”, the character train “Cl∘” is finally outputted as the recognition result. This result contains more recognition errors than would be obtained by outputting the first candidate character of the pattern matching without performing any post-processing.
The invention is made to solve the above problems, and it is an object of the invention to provide a character recognizing apparatus and method which improve the recognition rate by applying a forward-chain probability in addition to a backward-chain probability in a Bi-gram statistic process.
To accomplish the above object, according to claim 1 of the invention, there is provided a character recognizing apparatus for recognizing a plurality of characters by applying a chain probability of a character, comprising: backward-chain probability applying means for applying the chain probability from the i-th character among the plurality of characters to the (i+1)th character; forward-chain probability applying means for applying the chain probability from the (i+1)th character among the plurality of characters to the i-th character; unifying means for unifying results which are respectively obtained from the backward-chain probability applying means and the forward-chain probability applying means and setting a unified result as a post-processing result; and output means for outputting the post-processing result unified by the unifying means as a final recognition result.
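The arrangement of the backward-chain probability applying means, the forward-chain probability applying means, and the unifying means can be illustrated by the following sketch. It rests on several assumptions that are not stated in the text above: both probabilities are estimated by relative frequency from a sample text, the unifying step multiplies the two probabilities for each adjacent pair, and unseen pairs receive a small `floor` value. The names `chain_tables` and `unified_best` are hypothetical.

```python
from collections import Counter
from itertools import product

def chain_tables(text):
    """Estimate two tables from sample text.

    backward[(a, b)]: probability that b follows, given the i-th character a.
    forward[(a, b)]:  probability that a precedes, given the (i+1)th character b.
    """
    pairs = Counter(zip(text, text[1:]))
    as_first = Counter(text[:-1])   # occurrences as the i-th character
    as_second = Counter(text[1:])   # occurrences as the (i+1)th character
    backward = {(a, b): n / as_first[a] for (a, b), n in pairs.items()}
    forward = {(a, b): n / as_second[b] for (a, b), n in pairs.items()}
    return backward, forward

def unified_best(candidates, backward, forward, floor=1e-6):
    """Pick the candidate string with the highest unified score.
    Unifying by multiplication is an assumption of this sketch."""
    def score(chars):
        s = 1.0
        for pair in zip(chars, chars[1:]):
            s *= backward.get(pair, floor) * forward.get(pair, floor)
        return s

    return "".join(max(product(*candidates), key=score))

# Toy sample text; a real apparatus would use a large corpus.
backward, forward = chain_tables("xi xi y; yi")
print(unified_best([["x", "y"], [";", "i"]], backward, forward))
```

In this toy sample, the pair “xi” obtains the highest unified score, so “xi” is selected. Other unifying rules, such as a weighted sum of the two probabilities, would fit the same structure.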
According to the invention, by applying the forward-chain probability in addition to the backward-chain probability, the erroneous recognition of a character train which cannot be corrected so long as only the backward-chain probability is used can be reduced, and the recognition rate can be improved. The character train displayed as the final recognition result reads more naturally as a sentence than the results obtained so far. Consequently, there is an effect that even if an erroneously recognized character exists, the user's unease about the erroneous recognition is reduced. Since a post-processing system using a strong restriction between characters is adopted, the invention functions effectively in a special field, in a case where the range of characters to be recognized is limited, or the like.