1. Field of the Invention
The present invention relates to a probability density function compensation method and a speech recognition method and apparatus, and particularly, to a probability density function compensation method used for a continuous hidden Markov model and a speech recognition method and apparatus using the same.
2. Description of the Related Art
Generally, speech recognition starts by processing the voice waveform of an input voice and extracting the feature vectors required for recognition. Next, recognition or decoding is performed by using the so-called hidden Markov model (HMM), which is a phoneme-level statistical model. A word-level acoustic model is formed by concatenating phone-level models (a phone being, for example, a vowel or a consonant) in accordance with a pronunciation lexicon.
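The concatenation of phone-level models into a word-level model can be illustrated with a minimal sketch. The lexicon entries, the phone symbols, and the choice of three states per phone are hypothetical and serve only as an example; they are not taken from the present description.

```python
# Hypothetical toy pronunciation lexicon: word -> sequence of phones.
# The entries and phone symbols are illustrative assumptions.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "at": ["ae", "t"],
}

# Assumption: each phone-level model has three left-to-right states.
STATES_PER_PHONE = 3


def phone_model(phone):
    """Return the ordered state names of one phone-level model."""
    return [f"{phone}_{i}" for i in range(STATES_PER_PHONE)]


def word_model(word):
    """Build a word-level state sequence by concatenating phone-level models."""
    states = []
    for phone in LEXICON[word]:
        states.extend(phone_model(phone))
    return states


if __name__ == "__main__":
    print(word_model("at"))
```

In this sketch the word "at" yields the six states of the "ae" model followed by those of the "t" model, so that a new word can be modeled without training a separate word-level model.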
The HMM has been widely adopted for speech recognition because of its great modeling flexibility and high performance. In speech recognition, the HMM treats the temporal states of the vocal organ and the phonemes generated by the vocal organ as hidden, takes the observed speech as the output, and estimates the vocal organ states and phonemes from that output.
The HMM is a doubly stochastic process represented by a state transition probability and an output probability. The state transition probability can be obtained by a Markov process. The output probability can be represented in three types. In the first type, the output probability is represented with codewords in a codebook obtained by vector quantization (VQ), which means that all of the available acoustic features are represented with a discrete probability density function over the VQ-based codebook. In the second type, all of the available acoustic features are represented with a continuous probability density function. The continuous probability density function depends greatly on the acoustic units, because the spectral mean and standard deviation of the voice feature vectors are obtained from the voice. In the third type, the first and second types are combined.
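The difference between the first and second types of output probability can be illustrated with a minimal sketch. The function names, the Euclidean nearest-codeword rule, and the use of a single Gaussian with independent dimensions (parameterized by a mean and a standard deviation) are assumptions made for illustration; practical continuous models often use mixtures of such densities.

```python
import math


def nearest_codeword(x, codebook):
    """Return the index of the codeword closest to feature vector x
    (squared Euclidean distance; an illustrative assumption)."""
    def sq_dist(c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def discrete_output_prob(x, codebook, state_probs):
    """First type: quantize x, then look up the discrete probability
    that the state assigns to the selected codeword."""
    return state_probs[nearest_codeword(x, codebook)]


def gaussian_log_density(x, mean, std):
    """Second type: log of a continuous density evaluated directly at x.
    Assumes one Gaussian per state with independent dimensions."""
    log_p = 0.0
    for xi, mi, si in zip(x, mean, std):
        log_p += -0.5 * math.log(2.0 * math.pi * si * si)
        log_p += -0.5 * ((xi - mi) / si) ** 2
    return log_p


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]   # hypothetical two-entry codebook
    state_probs = [0.7, 0.3]              # hypothetical per-state codeword probabilities
    x = [0.1, 0.0]
    print(discrete_output_prob(x, codebook, state_probs))
    print(gaussian_log_density(x, [0.0, 0.0], [1.0, 1.0]))
```

In the discrete case every feature vector that maps to the same codeword receives the same probability, whereas the continuous density varies with the exact position of the feature vector relative to the mean.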
In a discrete HMM (DHMM), an observation symbol, that is, a voice feature vector, is represented by its closest codeword through vector quantization, so that quantization errors may occur. The continuous HMM (CHMM) was proposed in order to remove the quantization errors. However, the CHMM has not been widely used for speech recognition, for the following reasons. Firstly, there are a large number of model parameters to be estimated, and estimating them requires a large database and a large amount of calculation. Secondly, the CHMM is sensitive to initial values. Therefore, a CHMM-based automatic speech recognition system is not suitable for a mobile phone having limited resources. Accordingly, it is necessary to reduce the memory size and the amount of calculation required for automatic speech recognition.
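The scale of the parameter storage for a CHMM can be estimated with a rough sketch. The model layout (Gaussian mixtures with independent dimensions, one mean and one variance per dimension plus one mixture weight per component), the 4-byte parameter size, and the example figures are all assumptions chosen for illustration; transition probabilities are not counted.

```python
def chmm_output_param_count(num_states, num_mixtures, dim):
    """Count output-density parameters under the assumed layout:
    per mixture component, dim means + dim variances + 1 weight."""
    return num_states * num_mixtures * (2 * dim + 1)


def memory_bytes(param_count, bytes_per_param=4):
    """Memory needed if each parameter is stored in bytes_per_param bytes
    (4 bytes corresponds to a 32-bit float; an assumption)."""
    return param_count * bytes_per_param


if __name__ == "__main__":
    # Hypothetical configuration, not taken from the present description.
    params = chmm_output_param_count(num_states=3000, num_mixtures=8, dim=39)
    print(params)                # 1896000
    print(memory_bytes(params))  # 7584000
```

Under these assumed figures the output densities alone occupy several megabytes, and each of those parameters must also be estimated from training data, which illustrates why memory size and the amount of calculation are of concern on a resource-limited device.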