1. Field of the Invention
It is an object of the invention to perform speech recognition by using a Hidden Markov Model (HMM).
Another object of the invention is to remove additive noise from input speech.
2. Related Background Art
When speech recognition is performed in a real environment, noise is one of the major problems. For additive noise, that is, noise which is added to the spectrum characteristics of the speech, the Parallel Model Combination (PMC) method is known as an effective method.
The PMC method has been described in detail in M. J. Gales and S. Young, "An Improved Approach to the Hidden Markov Model Decomposition of Speech and Noise", Proc. of ICASSP'92, I-233-236, 1992.
The PMC method combines, by addition, an HMM trained on speech collected and recorded in a noiseless environment (speech HMM) and an HMM trained on noise (noise HMM), so that the models approximate a noise-superimposed environment; the conversion that adds the noise is applied to all of the models. The noise processing in the PMC method presumes that noise and speech are additive in the linear spectrum region. In the HMM, on the other hand, parameters of the logarithmic spectrum system, such as the cepstrum, are often used as the feature quantities of the speech. According to the PMC method, the parameters derived from the speech HMM and the noise HMM are therefore converted into the linear spectrum region and are added and combined there. After the speech and the noise have been combined, an inverse conversion is performed to return the combined value from the linear spectrum region to the cepstrum region, thereby obtaining a noise-superimposed speech HMM.
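The conversion chain described above (cepstrum region, to logarithmic spectrum, to linear spectrum, addition, and back) can be illustrated with a minimal sketch. It is a simplification and not the procedure of the cited reference: it combines only the mean vectors of one speech distribution and one noise distribution, ignores the variances, and assumes that the cepstrum is obtained from the logarithmic spectrum by an orthonormal DCT-II. The function names `dct_matrix` and `pmc_combine_means` are hypothetical and chosen only for illustration.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n). Assumed here as the
    log-spectrum -> cepstrum transform; its inverse is its transpose."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
                     for j in range(n)])
    return rows

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def pmc_combine_means(speech_cep, noise_cep):
    """Combine a speech mean and a noise mean, both given as cepstra.

    Illustrative simplification: means only, variances are ignored.
    """
    n = len(speech_cep)
    c = dct_matrix(n)
    c_inv = transpose(c)  # inverse of an orthonormal matrix

    # 1. Cepstrum region -> logarithmic spectrum region.
    speech_log = matvec(c_inv, speech_cep)
    noise_log = matvec(c_inv, noise_cep)

    # 2. Logarithmic -> linear spectrum region, where speech and
    #    noise are presumed to be additive, then add them.
    combined_lin = [math.exp(s) + math.exp(x)
                    for s, x in zip(speech_log, noise_log)]

    # 3. Inverse conversion: linear -> logarithmic -> cepstrum region.
    combined_log = [math.log(v) for v in combined_lin]
    return matvec(c, combined_log)
```

In this sketch the exponential and logarithm in steps 2 and 3 are the nonlinear part of the conversion, and they would have to be evaluated for every distribution of every model that is to be adapted.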
By using the foregoing PMC method, it is possible to cope with additive noise such as internal noise and background noise. However, because a nonlinear conversion is executed on all of the models, the PMC method requires a large amount of calculation and a very long processing time, and is therefore not suitable for instantaneous environment adaptation, in which adaptation to noise is performed simultaneously with recognition.