1. Field of the Invention
The present invention relates to a model adaptation apparatus, a model adaptation method, a storage medium, and a pattern recognition apparatus, and more particularly, to a model adaptation apparatus, a model adaptation method, a storage medium, and a pattern recognition apparatus, which are suitable for use in speech recognition or the like.
2. Description of the Related Art
Methods of recognizing a word or the like uttered in a noisy environment are known. Representative examples include the PMC (Parallel Model Combination) method, the SS/NSS (Spectral Subtraction/Nonlinear Spectral Subtraction) method, and the SFE (Stochastic Feature Extraction) method.
The advantage of the PMC method is that information about ambient noise is directly incorporated in an acoustic model, and thus high recognition performance can be achieved. Its disadvantage is a high calculation cost: the complicated calculation requires a large-scale apparatus and a long processing time. In the SS/NSS method, on the other hand, ambient noise is removed when a feature value of voice data is extracted. The SS/NSS method therefore has a lower calculation cost than the PMC method and is now widely used in the art. In the SFE method, ambient noise is also removed when a feature value of voice data is extracted, as in the SS/NSS method, but the extracted feature value is represented by a probability distribution. Thus, the SFE method differs from the SS/NSS method and the PMC method in that the SFE method extracts the feature value of voice in the form of a distribution in the feature space, while the SS/NSS method and the PMC method extract the feature value of voice in the form of a point in the feature space.
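By way of illustration only, the noise removal performed at feature extraction in a spectral subtraction method may be sketched as follows. The function name `spectral_subtraction` and the parameters `alpha` (over-subtraction factor) and `floor` (spectral floor) are assumptions introduced for this sketch and do not appear in the cited references; the sketch shows generic power-spectrum subtraction, not the specific SS/NSS implementation discussed above.

```python
def spectral_subtraction(noisy_power, noise_power, alpha=1.0, floor=0.01):
    """Subtract an estimated noise power spectrum from a noisy power spectrum.

    noisy_power: power per frequency bin of the observed (voice + noise) frame.
    noise_power: estimated noise power per frequency bin.
    alpha:       hypothetical over-subtraction factor (1.0 = plain subtraction).
    floor:       hypothetical spectral floor, as a fraction of the noisy power,
                 used so that no bin becomes negative or zero.
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    cleaned = []
    for y, n in zip(noisy_power, noise_power):
        diff = y - alpha * n
        # Clamp to a small fraction of the observed power when the
        # noise estimate exceeds the observation in this bin.
        cleaned.append(max(diff, floor * y))
    return cleaned


if __name__ == "__main__":
    print(spectral_subtraction([4.0, 1.0], [1.0, 2.0]))
```

The result is a single vector, that is, a point in the feature space, in contrast to the SFE method, in which the extracted feature value is represented by a distribution.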
In any of the methods described above, after the feature value of the voice is extracted, it is determined which one of the acoustic models corresponding to registered words or the like best matches the feature value, and the word corresponding to the best matching acoustic model is output as the recognition result.
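The matching step described above may be sketched, under simplifying assumptions, as follows. Here each registered word is modeled by a single diagonal-covariance Gaussian and the feature value is a single point; practical acoustic models (for example, hidden Markov models over sequences of feature values) are more elaborate. The names `log_likelihood`, `recognize`, and the example words are hypothetical and serve only to illustrate selection of the best matching model.

```python
import math


def log_likelihood(feature, mean, var):
    """Log-likelihood of a feature vector under a diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(feature, mean, var)
    )


def recognize(feature, models):
    """Return the registered word whose model best matches the feature.

    models maps each word to a (mean, variance) pair of equal-length lists.
    """
    return max(models, key=lambda word: log_likelihood(feature, *models[word]))


if __name__ == "__main__":
    # Hypothetical two-word vocabulary with two-dimensional features.
    models = {
        "yes": ([0.0, 0.0], [1.0, 1.0]),
        "no": ([3.0, 3.0], [1.0, 1.0]),
    }
    print(recognize([0.2, -0.1], models))
```

In this sketch the ambient noise influences only the feature value passed to `recognize`; the models themselves are unchanged, which corresponds to the SS/NSS and SFE methods rather than to the PMC method.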
A detailed description of the SFE method may be found, for example, in Japanese Unexamined Patent Application Publication No. 11-133992 (Japanese Patent Application No. 9-300979), which was filed by the applicant of the present invention. Discussions of the performance of the PMC method, the SS/NSS method, and the SFE method may be found, for example, in the following papers: H. Pao, H. Honda, K. Minamino, M. Omote, H. Ogawa and N. Iwahashi, “Stochastic Feature Extraction for Improving Noise Robustness in Speech Recognition”, Proceedings of the 8th Sony Research Forum, SRF98–234, pp. 9–14, October 1998; N. Iwahashi, H. Pao, H. Honda, K. Minamino and M. Omote, “Stochastic Features for Noise Robust in Speech Recognition”, ICASSP'98 Proceedings, pp. 633–636, May 1998; N. Iwahashi, H. Pao (presented), H. Honda, K. Minamino and M. Omote, “Noise Robust Speech Recognition Using Stochastic Representation of Features”, ASJ'98—Spring Proceedings, pp. 91–92, March 1998; N. Iwahashi, H. Pao, H. Honda, K. Minamino and M. Omote, “Stochastic Representation of Feature for Noise Robust Speech Recognition”, Technical Report of IEICE, pp. 19–24, SP97–97 (1998–01).
A problem with the above-described SFE method or similar methods is that degradation in recognition performance can occur because ambient noise is not directly reflected in speech recognition, that is, because information of ambient noise is not directly incorporated in an acoustic model.
Furthermore, because information about ambient noise is not directly incorporated in the acoustic model, the degradation in recognition performance becomes more serious as the time period from the start of the speech recognition operation to the start of utterance becomes longer.