1. Field of the Invention
The present invention relates to a speech recognition apparatus, a speech recognition method, and a program therefor, each of which performs collation processing by using a plurality of acoustic models.
2. Description of the Related Art
It is important for speech recognition to exhibit good performance across different speakers and utterance environments (ambient noise, classifications, signal-to-noise ratios (SNRs), and the like). With regard to this point, there is known a technique of classifying speakers and utterance environments into a plurality of clusters, preparing an acoustic model for each cluster, executing recognition processing by using each of these acoustic models, integrating the plurality of recognition processing results, and outputting the integrated recognition result (see, for example, Shinozaki et al., “Spontaneous speech recognition using Massively Parallel Decoder”, Proceedings of 2004 Spring Meeting of Acoustical Society of Japan, Feb. 11, 2006, pp. 111-112, March 2004). Because each acoustic model corresponds to a cluster classified according to speakers or utterance environments, integrating the results obtained with the respective models can be expected to improve performance with respect to variations in speakers and utterance environments.
According to the conventional technique, however, an acoustic model is prepared for each cluster, and recognition processing is executed by using each of the acoustic models. This increases the calculation cost required for recognition processing as compared with a case wherein one recognition process is executed by using one acoustic model. If, for example, N acoustic models are prepared for N clusters, respectively, and N recognition processes are executed, one for each acoustic model, the calculation cost for recognition becomes N times that of a single recognition process. This poses a serious problem in an apparatus using speech recognition.
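The N-fold cost described above can be illustrated with a minimal Python sketch. It is not the decoder of the cited reference: the class `ToyAcousticModel`, the functions `recognize` and `recognize_with_all_models`, the one-dimensional features, and the rule of integrating results by taking the best-scoring hypothesis are all assumptions introduced here for illustration only. The sketch counts acoustic score evaluations so that the cost of N recognition processes can be compared with the cost of one.

```python
from dataclasses import dataclass


@dataclass
class ToyAcousticModel:
    """Hypothetical stand-in for one cluster-specific acoustic model."""
    means: dict           # word -> toy one-dimensional mean feature value
    evaluations: int = 0  # number of acoustic score computations performed

    def score(self, word, frame):
        # Toy log-likelihood: negative squared distance to the word's mean.
        self.evaluations += 1
        return -(frame - self.means[word]) ** 2


def recognize(model, frames, vocabulary):
    """One recognition process using one acoustic model.

    Returns (best_word, total_score) for the given frames.
    """
    totals = {w: sum(model.score(w, f) for f in frames) for w in vocabulary}
    best = max(totals, key=totals.get)
    return best, totals[best]


def recognize_with_all_models(models, frames, vocabulary):
    """Conventional scheme: run one recognition process per model, then
    integrate. Integration here simply keeps the highest-scoring result,
    which is an assumption of this sketch."""
    results = [recognize(m, frames, vocabulary) for m in models]
    return max(results, key=lambda r: r[1])


if __name__ == "__main__":
    vocabulary = ["yes", "no"]
    frames = [1.0, 1.1, 0.9]
    models = [
        ToyAcousticModel({"yes": 1.0, "no": 3.0}),
        ToyAcousticModel({"yes": 5.0, "no": 7.0}),
        ToyAcousticModel({"yes": -2.0, "no": 4.0}),
    ]
    word, score = recognize_with_all_models(models, frames, vocabulary)
    total = sum(m.evaluations for m in models)
    print(word, total)
```

In this sketch each recognition process performs one score evaluation per frame and per vocabulary entry, so running it with N models performs N times as many evaluations as running it with one model, which mirrors the cost increase noted above.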