In speech recognition using Markov models, a Markov model is established for each word. Generally, a plurality of states and transitions between the states are defined for each Markov model. Occurrence probabilities and output probabilities of labels or symbols are assigned to the transitions. Unknown speech is then converted into a label string, and the probability of each word Markov model outputting the label string is determined based on the transition occurrence probabilities and the label output probabilities assigned to that word Markov model. The word Markov model having the highest probability of producing the label string is determined, and recognition is performed according to this result. In speech recognition using Markov models, the transition occurrence probabilities and label output probabilities (i.e., the "parameters") can be estimated statistically.
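The scoring step described above can be illustrated with a minimal sketch. It assumes that each word model is given as a list of transitions, each carrying an occurrence probability and a table of label output probabilities, that state 0 is the initial state, and that the last state is the final state. The names Transition, forward_probability and recognize, as well as the toy words and labels, are hypothetical and are not taken from the cited articles. The probability of a label string is accumulated with the forward algorithm, and the word whose model yields the highest probability is selected.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    src: int       # state the transition leaves
    dst: int       # state the transition enters
    prob: float    # occurrence probability of the transition
    output: dict   # label -> output probability on this transition


def forward_probability(transitions, n_states, labels):
    """Probability that the model outputs `labels`, starting in state 0
    and ending in state n_states - 1 (assumed conventions)."""
    alpha = [0.0] * n_states
    alpha[0] = 1.0
    for label in labels:
        nxt = [0.0] * n_states
        for t in transitions:
            # Each transition consumes exactly one label in this sketch.
            nxt[t.dst] += alpha[t.src] * t.prob * t.output.get(label, 0.0)
        alpha = nxt
    return alpha[n_states - 1]


def recognize(word_models, labels):
    """Return the word whose model most probably produced `labels`,
    together with the score of every word."""
    scores = {
        word: forward_probability(transitions, n_states, labels)
        for word, (transitions, n_states) in word_models.items()
    }
    return max(scores, key=scores.get), scores


# Toy two-state models for two hypothetical words over labels "a" and "b".
word_models = {
    "yes": ([Transition(0, 0, 0.5, {"a": 0.8, "b": 0.2}),
             Transition(0, 1, 0.5, {"a": 0.1, "b": 0.9})], 2),
    "no": ([Transition(0, 0, 0.5, {"a": 0.2, "b": 0.8}),
            Transition(0, 1, 0.5, {"a": 0.9, "b": 0.1})], 2),
}

best_word, scores = recognize(word_models, ["a", "a", "b"])
print(best_word, scores)
```

In this sketch the label outputs are attached to transitions rather than to states, following the description above. A practical system would typically work with log probabilities to avoid numerical underflow on long label strings.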
The details of the above recognition technique are described in the following articles.
(1) "A Maximum Likelihood Approach to Continuous Speech Recognition" (IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-5, No. 2, pp. 179-190, 1983, Lalit R. Bahl, Frederick Jelinek and Robert L. Mercer)
(2) "Continuous Speech Recognition by Statistical Methods" (Proceedings of the IEEE vol. 64, 1976, pp. 532-556, Frederick Jelinek)
(3) "An Introduction to the Application of the Theory of Probabilistic Functions of a Markov Process to Automatic Speech Recognition" (The Bell System Technical Journal vol. 62, No. 4, April 1983, pp. 1035-1074, S. E. Levinson, L. R. Rabiner and M. M. Sondhi)
Speech recognition using Markov models generally needs a tremendous amount of speech data, and the training thereof requires much time. Furthermore, a system trained with a certain speaker often does not achieve sufficient recognition scores for other speakers. Even for the same speaker, when a long time elapses between the training and the recognition, the difference between the two events may result in poor recognition.