The present invention relates to the improvement of a continuous speech recognition unit for recognizing continuous speech composed of continuously uttered words.
Conventionally, a known speech recognition method is one based on the "Hidden Markov Model" (hereinafter referred to as HMM), as described in Stephen E. Levinson, "Structural Methods in Automatic Speech Recognition", Proceedings of the IEEE, Vol. 73, No. 11, November 1985 (hereinafter referred to as "literature 1"), page 1633. In this method, the generation phase of speech patterns is first modelled as a state transition model by a Markov process. This state transition model is the HMM. Speech recognition of an observed speech pattern is performed by determining the observation probability of that pattern from the HMM.
Let us consider the case where words are recognized by using this method. In this case, an HMM is first formed for each word to be recognized. The method for forming an HMM is fully described in the above-mentioned "literature 1", page 1633. When a speech pattern is inputted to the speech recognition unit, the observation probability for each HMM is computed, and the recognition result is obtained as the word for the HMM which gives the highest observation probability. These observation probabilities can also be considered as the similarity between the speech pattern and each HMM, in which the HMM is equivalent to the standard pattern. The observation probabilities for an HMM can be obtained by the forward algorithm or the Baum algorithm as described in the above-mentioned "literature 1", page 1634.
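The word recognition procedure described above can be illustrated by the following minimal sketch. It assumes discrete observation symbols and two hypothetical two-state word models ("word_a" and "word_b") whose probabilities are invented purely for illustration; none of these names or values come from "literature 1". The observation probability is computed by the forward algorithm, and the word whose HMM gives the highest probability is selected.

```python
def forward_probability(init, trans, emit, observations):
    """Observation probability of a symbol sequence for one HMM,
    summed over all state sequences (forward algorithm)."""
    n_states = len(init)
    # alpha[s]: probability of the symbols seen so far, ending in state s.
    alpha = [init[s] * emit[s][observations[0]] for s in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


def recognize_word(word_models, observations):
    """Return the word whose HMM gives the highest observation probability."""
    scores = {
        word: forward_probability(init, trans, emit, observations)
        for word, (init, trans, emit) in word_models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical left-to-right models: (initial, transition, emission) probabilities.
TRANS = [[0.5, 0.5], [0.0, 1.0]]
WORD_MODELS = {
    "word_a": ([1.0, 0.0], TRANS, [[0.9, 0.1], [0.1, 0.9]]),
    "word_b": ([1.0, 0.0], TRANS, [[0.1, 0.9], [0.9, 0.1]]),
}

best_word, scores = recognize_word(WORD_MODELS, [0, 0, 1])
print(best_word, scores)
```

For the symbol sequence 0, 0, 1 the model "word_a" gives the higher observation probability and is therefore returned as the recognition result.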
Further, the HMM allows continuous speech patterns composed of continuously uttered words to be recognized. As an example of continuous speech recognition, the case where the units of recognition are words is explained. However, any other unit of recognition, such as vocal sounds, can be treated similarly. Continuous speech recognition in the case where the units of recognition are words can be achieved by means of the Viterbi algorithm as described in the above-mentioned "literature 1", page 1635.
The Viterbi algorithm used for continuous speech recognition is an approximation method in which the observation probability of a word is obtained from the product of probabilities on a single matching pass, where a matching pass is defined as a trace associated with the correspondence between points in time of two patterns. Therefore, the Viterbi algorithm has the disadvantage that its recognition rate is generally low compared with that of the forward algorithm, in which the observation probability of a word is obtained from the probabilities on all possible matching passes.
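The difference between the two algorithms can be illustrated by the following minimal sketch, which assumes a hypothetical two-state HMM with discrete observation symbols (the probabilities are invented for illustration). The two recursions differ only in one operation: the forward algorithm sums over all predecessor states, so that all matching passes contribute, whereas the Viterbi algorithm keeps only the maximum, so that a single matching pass contributes.

```python
def forward_and_viterbi(init, trans, emit, observations):
    """Return (forward probability, Viterbi probability) for one HMM."""
    n_states = len(init)
    start = [init[s] * emit[s][observations[0]] for s in range(n_states)]
    total = list(start)  # forward: sum over all matching passes
    best = list(start)   # Viterbi: best single matching pass
    for symbol in observations[1:]:
        total = [
            sum(total[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
        best = [
            max(best[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(total), max(best)


# Hypothetical model parameters, for illustration only.
INIT = [1.0, 0.0]
TRANS = [[0.5, 0.5], [0.0, 1.0]]
EMIT = [[0.9, 0.1], [0.1, 0.9]]

forward_p, viterbi_p = forward_and_viterbi(INIT, TRANS, EMIT, [0, 0, 1])
print(forward_p, viterbi_p)
```

Because the Viterbi value is one term of the sum computed by the forward algorithm, it can never exceed the forward value; the gap between the two is the probability mass on the matching passes that the Viterbi approximation discards.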
On the other hand, in the forward algorithm, the matching pass giving the maximum probability cannot be uniquely determined. Therefore, the forward algorithm has the disadvantage that, when performing continuous speech recognition, a recognition result cannot be obtained unless computations are performed for all combinations of word sequences in a round robin manner.
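The cost of such round robin computation can be illustrated by the following minimal sketch, which merely enumerates the candidate word sequences that would each have to be scored. The three-word vocabulary and the limit of three words per utterance are hypothetical values chosen for illustration.

```python
from itertools import product


def candidate_word_sequences(vocabulary, max_words):
    """Return every word sequence of length 1..max_words over the vocabulary."""
    candidates = []
    for length in range(1, max_words + 1):
        candidates.extend(product(vocabulary, repeat=length))
    return candidates


VOCABULARY = ["one", "two", "three"]  # hypothetical vocabulary
candidates = candidate_word_sequences(VOCABULARY, max_words=3)
print(len(candidates))  # 3 + 9 + 27 = 39 sequences to score
```

The number of candidates grows as the vocabulary size raised to the power of the number of words, so scoring every sequence quickly becomes impractical for realistic vocabularies and utterance lengths.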