1. Field of the Invention
The present invention relates to a technique for speech recognition.
2. Description of the Related Art
In speech recognition, there is a known method of calculating the likelihood of recognition words in which partial hypotheses that can be shared among recognition words are shared to generate a tree structured hypothesis, which is then searched. According to this method, because hypotheses are shared among a plurality of recognition words, the number of likelihood calculations and searches is reduced, and likelihood calculation can be carried out rapidly.
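The sharing described above can be illustrated with a minimal sketch. The three-word lexicon, its phoneme spellings, and the function names `build_prefix_tree` and `count_nodes` are hypothetical and chosen only for illustration; the sketch merely counts how many phoneme hypotheses are needed with and without sharing.

```python
def build_prefix_tree(lexicon):
    """Build a tree in which words with a common phoneme prefix share nodes.

    lexicon maps each word to its phoneme sequence (hypothetical data).
    """
    root = {"children": {}, "words": []}
    for word, phonemes in lexicon.items():
        node = root
        for p in phonemes:
            # Reuse the child for this phoneme if it exists, else create it.
            node = node["children"].setdefault(p, {"children": {}, "words": []})
        node["words"].append(word)  # the word ends at this node
    return root


def count_nodes(node):
    """Count the phoneme nodes below the given node (the root is not counted)."""
    return sum(1 + count_nodes(child) for child in node["children"].values())


# Hypothetical lexicon, for illustration only.
LEXICON = {
    "hat": ["h", "a", "t"],
    "ham": ["h", "a", "m"],
    "hen": ["h", "e", "n"],
}

# One hypothesis per phoneme per word when nothing is shared.
per_word_hypotheses = sum(len(p) for p in LEXICON.values())

# Shared prefixes ("h" and "h a") are represented only once in the tree.
tree = build_prefix_tree(LEXICON)
shared_hypotheses = count_nodes(tree)

print(per_word_hypotheses, shared_hypotheses)
```

In this toy lexicon the shared tree needs fewer phoneme hypotheses than the per-word layout, which is the source of the reduction in likelihood calculations.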
In some cases, a language score, which represents how readily words connect to each other and how readily words appear, is added to the likelihood calculation. By adopting language scores, word sequence candidates that cannot be connected, for example, can be effectively deleted, and the accuracy and speed of recognition can be increased.
However, the tree structured hypothesis has the problem that language scores cannot be applied at the start of a word. This problem will be described with reference to FIGS. 7A and 7B. FIG. 7A is an example of hypotheses that are generated per word, while FIG. 7B is an example of a tree structured hypothesis that is shared between a plurality of words. With reference to FIG. 7A, the “a” denoted by reference number 601, for example, is a hypothesis that corresponds to a phoneme that makes up a word. In FIG. 7B, “a” is a node of the tree structured hypothesis. A hypothesis 601 is often made up of a plurality of state hypotheses generated by an HMM (Hidden Markov Model). Here, in FIG. 7A, because the word is uniquely determined from the leading hypothesis, the corresponding language score can be added to the likelihood calculation starting from the leading hypothesis. By contrast, in FIG. 7B, since a hypothesis is shared by a plurality of words, it cannot be determined which word's language score should be applied. Consequently, a language score cannot be applied until a hypothesis is reached at which the word is uniquely determined. In such a case, for a hypothesis whose language score is extremely low and which would normally be deleted, the likelihood calculation performed up to the point at which the language score is applied is wasted, because the language score cannot be applied until the hypothesis is no longer shared.
In order to solve this problem, U.S. Pat. No. 5,946,655 (Japanese Patent Laid-Open No. 7-295587) applies a correction value to each hypothesis so that the largest language score among the words that share the hypothesis is applied to the likelihood. This is shown in FIG. 8. In this diagram, for each hypothesis, the largest language score among the words that share the hypothesis is taken as the language lookahead score, and a correction score is assigned so that this language lookahead score is added to the likelihood. For example, in this diagram, the leading node “h” is assigned a correction score of A, the child node “a” is assigned a correction score of 0, and the child node “e” is assigned a correction score of (D-A). In this way, at the time of likelihood calculation, by adding the correction score at each hypothesis, the language lookahead score shown in the diagram is added to the likelihood. Then, at the end of the word, the sum of the correction scores (i.e., the language lookahead score at the hypothesis at the end of the word) is subtracted, and the correct language score is added. In this way, it is possible to add a language lookahead score to a hypothesis that is shared by multiple words, or to a hypothesis at the beginning of a word, and so efficiency is increased because hypotheses can be deleted at the beginning of a word if the language lookahead score is low.
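The correction-score scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the cited patent: the lexicon, the log-domain language scores, and the names `Node`, `build_tree`, `assign_lookahead`, `assign_corrections`, and `walk` are all hypothetical. The scores are chosen so that the pattern matches the example in the text, with the leading node receiving A, one child receiving 0, and the other child receiving (D-A).

```python
class Node:
    """One node of a tree structured hypothesis (hypothetical layout)."""

    def __init__(self):
        self.children = {}
        self.word = None                 # set if a word ends at this node
        self.lookahead = float("-inf")   # largest language score below this node
        self.correction = 0.0            # lookahead minus the parent's lookahead


def build_tree(lexicon):
    root = Node()
    for word, phonemes in lexicon.items():
        node = root
        for p in phonemes:
            node = node.children.setdefault(p, Node())
        node.word = word
    return root


def assign_lookahead(node, lm_scores):
    """Set each node's lookahead to the largest score of the words sharing it."""
    best = lm_scores[node.word] if node.word is not None else float("-inf")
    for child in node.children.values():
        best = max(best, assign_lookahead(child, lm_scores))
    node.lookahead = best
    return best


def assign_corrections(node, parent_lookahead):
    """Store the increment that turns the parent's lookahead into this node's."""
    node.correction = node.lookahead - parent_lookahead
    for child in node.children.values():
        assign_corrections(child, node.lookahead)


def walk(root, phonemes, lm_scores):
    """Follow one word; return the running lookahead sums and the final score."""
    node, total, running = root, 0.0, []
    for p in phonemes:
        node = node.children[p]
        total += node.correction         # add the correction score at each node
        running.append(total)
    # At the word end: subtract the accumulated corrections and add the
    # correct language score of the word.
    final = total - total + lm_scores[node.word]
    return running, final


# Hypothetical lexicon and log-domain language scores, for illustration only.
LEXICON = {"hat": ["h", "a", "t"], "ham": ["h", "a", "m"], "hen": ["h", "e", "n"]}
LM_SCORES = {"hat": -1.0, "ham": -3.0, "hen": -2.0}

root = build_tree(LEXICON)
assign_lookahead(root, LM_SCORES)
for child in root.children.values():
    assign_corrections(child, 0.0)       # nothing has been applied before a word

h = root.children["h"]
print(h.correction, h.children["a"].correction, h.children["e"].correction)
print(walk(root, LEXICON["ham"], LM_SCORES))
```

With these assumed scores, A corresponds to -1.0 and D to -2.0, so the node "e" receives the correction D-A = -1.0. The sketch also makes visible that a correction is added at every node and that a subtraction and an addition are performed at every word end.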
However, according to the method of U.S. Pat. No. 5,946,655, a correction score is applied at each hypothesis of the tree structure dictionary, and at the end of the word the correction scores are subtracted and the language score is added again. Thus, there is a problem in that the amount of computation required for updating the language score increases.
Therefore, in order to enable fast speech recognition, it is desirable to reduce, to some extent, the updating of the language lookahead score while still being able to add the language lookahead score to the likelihood calculation from the start of a word.