1. Field of the Invention
The present invention relates to a model adaptive apparatus, a model adaptive method, a recording medium, and a pattern recognition apparatus. More particularly, the present invention relates to a model adaptive apparatus, a model adaptive method, a recording medium, and a pattern recognition apparatus that are suitable for use when, for example, speech recognition is performed.
2. Description of the Related Art
Methods of recognizing words spoken in a noisy environment have hitherto been known. Typical examples include the PMC (Parallel Model Combination) method, the SS/NSS (Spectral Subtraction/Nonlinear Spectral Subtraction) method, and the SFE (Stochastic Feature Extraction) method.
The PMC method achieves satisfactory recognition performance because information on environmental noise is incorporated directly into a sound model, but its calculation cost is high (because heavy computation is required, the apparatus becomes large, processing takes a long time, etc.). In the SS/NSS method, environmental noise is removed at the stage in which features of speech data are extracted. The SS/NSS method therefore has a lower calculation cost than the PMC method and is widely used at the present time. In the SFE method, as in the SS/NSS method, environmental noise is removed at the stage in which features of a speech signal containing environmental noise are extracted, and the extracted features are represented by a probability distribution. The SFE method thus differs from the SS/NSS method and the PMC method, in which the features of speech are extracted as a point in the feature space, in that it extracts the features of speech as a distribution in the feature space.
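As an illustration of noise removal at the feature extraction stage, the following is a minimal sketch of textbook power-spectrum subtraction for a single frame. It is not taken from the cited methods; the function name `spectral_subtraction` and the parameters `alpha` (subtraction factor) and `floor` (spectral floor) are hypothetical names chosen for this example, and the noise estimate is assumed to have been obtained beforehand, for example from a no-speech interval.

```python
def spectral_subtraction(noisy_power, noise_power, alpha=1.0, floor=0.01):
    """Subtract an estimated noise power spectrum from one frame.

    noisy_power: power per frequency bin of the noisy speech frame
    noise_power: estimated noise power per frequency bin (assumed given)
    alpha:       subtraction factor (hypothetical parameter; 1.0 = plain subtraction)
    floor:       fraction of the noisy power kept as a lower bound, so that
                 no bin becomes negative after subtraction
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")

    cleaned = []
    for y, n in zip(noisy_power, noise_power):
        diff = y - alpha * n
        # Clamp to a small positive floor when the noise estimate
        # exceeds the observed power in this bin.
        cleaned.append(max(diff, floor * y))
    return cleaned


if __name__ == "__main__":
    frame = [4.0, 1.0, 9.0]   # illustrative values only
    noise = [1.0, 2.0, 3.0]
    print(spectral_subtraction(frame, noise))
```

The result is still a single point in the feature space for each frame, which is the property that the SFE method is described above as departing from.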
In each of the above-described methods, after the features of speech are extracted, it is determined which of the sound models corresponding to a plurality of words registered in advance best matches the features, and the word corresponding to the best-matching sound model is output as the recognition result.
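The matching step described above can be sketched as selecting the highest-scoring model. The sketch below assumes, purely for illustration, that each word is modeled by a single diagonal-covariance Gaussian and that one feature vector is scored; practical recognizers typically score a sequence of frames against more elaborate models. The names `log_likelihood`, `recognize`, and `word_models` are hypothetical.

```python
import math


def log_likelihood(feature, mean, var):
    """Log-likelihood of a feature vector under a diagonal Gaussian."""
    total = 0.0
    for x, m, v in zip(feature, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total


def recognize(feature, word_models):
    """Return the word whose model gives the feature the highest score.

    word_models maps each registered word to a (mean, variance) pair.
    """
    return max(
        word_models,
        key=lambda word: log_likelihood(feature, *word_models[word]),
    )


if __name__ == "__main__":
    # Illustrative models only; the values are not from any real system.
    models = {
        "yes": ([0.0, 0.0], [1.0, 1.0]),
        "no": ([3.0, 3.0], [1.0, 1.0]),
    }
    print(recognize([0.2, -0.1], models))
```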
The details of the SFE method are described in Japanese Unexamined Patent Application Publication No. 11-133992 (Japanese Patent Application No. 9-300979), etc., which was previously submitted by the applicant of this application. Furthermore, the details of the performance comparisons, etc., among the PMC method, the SS/NSS method, and the SFE method are described in, for example, “H. Pao, H. Honda, K. Minamino, M. Omote, H. Ogawa and N. Iwahashi, Stochastic Feature Extraction for Improving Noise Robustness in Speech Recognition, Proceedings of the 8th Sony Research Forum, SRF98-234, pp. 9-14, October 1998”; “N. Iwahashi, H. Pao, H. Honda, K. Minamino, and M. Omote, Stochastic Features for Noise Robust in Speech Recognition, ICASSP'98 Proceedings, pp. 633-636, May 1998”; “N. Iwahashi, H. Pao (presenter), H. Honda, K. Minamino and M. Omote, Noise Robust Speech Recognition Using Stochastic Representation of Features, ASJ'98-Spring Proceedings, pp. 91-92, March 1998”; “N. Iwahashi, H. Pao, H. Honda, K. Minamino and M. Omote, Stochastic Representation of Features for Noise Robust Speech Recognition, Technical Report of IEICE, pp. 19-24, SP97-97 (1998-01)”; etc.
In the above-described SFE method, etc., environmental noise is not taken into account directly at the stage of speech recognition; that is, information on environmental noise is not incorporated directly into a no-speech sound model. This results in inferior recognition performance.
Furthermore, because information on environmental noise is not incorporated directly into a no-speech sound model, there is another problem in that recognition performance decreases as the time from the start of speech recognition until the start of the utterance increases.