1. Field of Invention
The present invention relates to a method of calculating an HMM (Hidden Markov Model) output probability, that is, an output probability in a discrete HMM. The invention also relates to a speech recognition apparatus that employs this method of calculating an HMM output probability.
2. Description of Related Art
HMMs are widely used as phoneme models for speech recognition. Although HMM-based speech recognition achieves a high recognition rate, it has the disadvantage of requiring a large amount of computation. In particular, calculating HMM output probabilities is computationally expensive, which gives rise to problems such as the need for a large memory space.
A related art method that addresses these problems is disclosed in “Speech Recognition Based on Hidden Markov Model”, Journal of the Acoustical Society of Japan, Vol. 42, No. 12 (1986). In this method, a feature vector sequence obtained by analyzing an input speech is vector quantized using a codebook created in advance. The resulting codes (labels) are input to the HMMs (e.g., phoneme HMMs) constituting each word, and the state output probabilities are obtained by table reference. The likelihoods obtained from the respective HMMs are then compared with each other to recognize the speech.
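The final step, comparing the likelihoods of the word HMMs, can be illustrated with the standard forward algorithm for a discrete HMM whose output probabilities are attached to state transitions. The following is a minimal sketch under stated assumptions, not the procedure of the cited paper: the array layout, the 0-based labels, and the convention that state 0 is the initial state and the last state is the final state are all illustrative choices, and `forward_likelihood` and `recognize` are hypothetical names.

```python
import numpy as np


def forward_likelihood(labels, trans, out_table):
    """P(labels | model) for a discrete HMM with outputs on transitions.

    labels:    sequence of 0-based codebook labels, one per frame
    trans:     (S, S) array, trans[i, j] = probability of transition i -> j
    out_table: (S, S, K) array, out_table[i, j, k] = b_ij(k)
    Assumption (hypothetical convention): state 0 is initial, state S-1 is final.
    """
    num_states = trans.shape[0]
    alpha = np.zeros(num_states)
    alpha[0] = 1.0  # all probability mass starts in the initial state
    for k in labels:
        # alpha_new[j] = sum_i alpha[i] * a_ij * b_ij(k);
        # b_ij(k) is obtained purely by indexing the output table with label k
        alpha = alpha @ (trans * out_table[:, :, k])
    return alpha[-1]  # probability of ending in the final state


def recognize(labels, word_models):
    """Return the word whose HMM assigns the highest likelihood to labels.

    word_models: dict mapping word -> (trans, out_table)
    """
    scores = {word: forward_likelihood(labels, trans, out_table)
              for word, (trans, out_table) in word_models.items()}
    return max(scores, key=scores.get)
```

Plain probabilities keep the sketch short; for utterances of realistic length the repeated products underflow, so practical implementations usually use scaled or log-domain probabilities instead.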
In speech recognition that employs vector quantization, the output probabilities of the respective states are obtained by table reference as follows.
An input speech is analyzed by a speech analysis unit at every predetermined period to obtain a feature vector sequence Vt, where t is the frame number of the input speech segmented at the predetermined period (t=1, 2, . . . , T, with T denoting the number of frames). Each feature vector consists of, for example, LPC cepstrum coefficients of about ten to twenty dimensions. The feature vector sequence Vt is then quantized by a vector quantization unit using a codebook, which outputs a code sequence Ct, one code per frame number (t=1, 2, . . . , T).
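The quantization step can be sketched as a nearest-codeword search. This minimal Python sketch assumes squared Euclidean distance as the distortion measure (the text does not specify one) and takes the feature vectors as already computed; `vector_quantize` is a hypothetical name.

```python
import numpy as np


def vector_quantize(features, codebook):
    """Map each feature vector V_t to the label number of its nearest codeword.

    features: (T, D) array, one D-dimensional feature vector per frame
              (e.g. D = 10 to 20 LPC cepstrum coefficients)
    codebook: (K, D) array of codewords created in advance
    Returns a length-T integer array of 1-based label numbers k = 1, ..., K.
    Assumption: squared Euclidean distance is the distortion measure.
    """
    # (T, K) matrix of squared distances between every frame and every codeword
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # argmin gives 0-based codeword indices; shift to label numbers 1..K
    return dists.argmin(axis=1) + 1
```

For example, with codewords at (0, 0) and (10, 10), the frames (1, 1), (9, 8), and (0, -1) receive the labels 1, 2, and 1.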
The codebook used here is created from speech data containing every phoneme. Denoting the codebook size by K, the value of the code Ct for each frame can be represented by a codebook label number k (k=1, 2, . . . , K).
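The text does not say how the codebook is built. A common choice is clustering the pooled training vectors; the sketch below substitutes plain k-means (Lloyd's algorithm) for illustration only, and `train_codebook` together with its `iterations` and `seed` parameters are hypothetical.

```python
import numpy as np


def train_codebook(training_vectors, K, iterations=20, seed=0):
    """Build a K-entry codebook with plain k-means (an assumed method).

    training_vectors: (N, D) feature vectors pooled from speech covering
                      every phoneme, with N >= K
    Returns a (K, D) array whose row k-1 is the codeword for label k.
    """
    rng = np.random.default_rng(seed)
    # initialize the codewords with K distinct training vectors chosen at random
    start = rng.choice(len(training_vectors), K, replace=False)
    codebook = training_vectors[start].astype(float)
    for _ in range(iterations):
        # assign every training vector to its nearest codeword
        dists = ((training_vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)
        # move each codeword to the centroid of the vectors assigned to it
        for k in range(K):
            members = training_vectors[nearest == k]
            if len(members) > 0:  # keep the old codeword if its cell is empty
                codebook[k] = members.mean(axis=0)
    return codebook
```

Production systems often use the LBG (Linde-Buzo-Gray) splitting procedure instead, which grows the codebook by repeatedly splitting codewords and refining them with the same assign-and-average loop.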
Now, let bij(Ct) denote the probability that the code Ct is output when Ct is input, in the transition from a state i to a state j in a phoneme HMM. Since the codebook size is K and the value of each code Ct can be represented by a codebook label number k (k=1, 2, . . . , K), it suffices to obtain the output probabilities bij(k) of outputting each of the label numbers 1 to K.
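Written as equations, the definition above makes the output probability of a transition depend on the frame only through its label number. The normalization condition is the standard property of a discrete output distribution and is stated here for clarity rather than taken from the source:

```latex
b_{ij}(C_t) = b_{ij}(k) \quad \text{where } C_t = k,\; k \in \{1, 2, \ldots, K\},
\qquad
\sum_{k=1}^{K} b_{ij}(k) = 1
```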
Thus, if the probabilities of outputting label number 1, label number 2, . . . , and label number K are stored in tables for each state transition of each phoneme HMM, the output probability of any state transition can be obtained simply by table reference using the label number.
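A minimal Python sketch of such a table for one phoneme HMM follows. The layout (one row of K probabilities per state transition, rows found through a dictionary keyed by the state pair) and the class and method names are hypothetical, chosen only to show that retrieving bij(k) reduces to an array index.

```python
import numpy as np


class OutputProbabilityTable:
    """Stored output probabilities b_ij(k) for one phoneme HMM (hypothetical layout)."""

    def __init__(self, transitions, K):
        # transitions: list of (i, j) state pairs that exist in this HMM
        self.row_of = {t: r for r, t in enumerate(transitions)}
        # one row of K label probabilities per state transition
        self.table = np.zeros((len(transitions), K))

    def set_distribution(self, i, j, probs):
        """Store b_ij(1), ..., b_ij(K) for the transition i -> j."""
        probs = np.asarray(probs, dtype=float)
        if not np.isclose(probs.sum(), 1.0):
            raise ValueError("output probabilities of a transition must sum to 1")
        self.table[self.row_of[(i, j)]] = probs

    def lookup(self, i, j, label):
        """Return b_ij(label); label numbers run 1..K, columns run 0..K-1."""
        return self.table[self.row_of[(i, j)], label - 1]
```

With a table `t` built this way, the output probabilities of the transition i -> j for a whole code sequence are `[t.lookup(i, j, c) for c in codes]`, with no arithmetic beyond indexing. Note that the table holds one entry per transition per label, which is where the memory cost mentioned in the background comes from.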