Field of the Invention
The present invention relates to a voice processing apparatus and a voice processing method.
Description of Related Art
In general, a voice recognition process includes acquiring a predetermined utterance feature amount from an acoustic signal input from a microphone and specifying utterance content using the utterance feature amount and a predetermined statistical model.
For example, Mel-frequency cepstral coefficients (MFCC) or a Mel-frequency log spectrum (MFLS) may be used as the utterance feature amount. A sound received through a microphone may be one in which a variety of noises, such as reverberation and background noise, are superimposed on the voice (clean voice) uttered by a speaker. If an utterance feature amount acquired from an acoustic signal on which such noises are superimposed is used, the voice recognition rate is reduced.
Thus, to reduce the influence of noise, it has been proposed to perform the voice recognition process using an average spectrum obtained by averaging the spectra of the respective frames before the utterance feature amount is calculated. For example, the voice recognition apparatus disclosed in Japanese Unexamined Patent Application, First Publication No. 2000-172291 (hereinafter referred to as Patent Literature 1) calculates a power spectrum of audio data, determines an acoustic model by calculating an average spectrum at the time of non-recognition of a voice, and recognizes each word of the voice according to the determined acoustic model of the power spectrum at the time of recognition of the voice.
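By way of illustration only, the frame-wise averaging of spectra described above may be sketched as follows. This is a minimal sketch and not the implementation of Patent Literature 1: the function names power_spectrum and average_spectrum, the parameters frame_len and hop, the use of a naive discrete Fourier transform, and the omission of a window function are all assumptions made for the example.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum of one frame via a naive DFT (non-negative frequency bins only)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def average_spectrum(signal, frame_len, hop):
    """Average the per-frame power spectra over all complete frames of the signal.

    frame_len and hop are hypothetical parameters (in samples) chosen for this sketch.
    """
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = [signal[i:i + frame_len] for i in starts]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    spectra = [power_spectrum(f) for f in frames]
    n_bins = len(spectra[0])
    return [sum(s[k] for s in spectra) / len(spectra) for k in range(n_bins)]


# Example input: a cosine completing two cycles per 8-sample frame, 32 samples in total.
signal = [math.cos(2 * math.pi * 2 * t / 8) for t in range(32)]
avg = average_spectrum(signal, frame_len=8, hop=8)
```

In this example the energy of the averaged spectrum is concentrated in the frequency bin of index 2, which corresponds to the frequency of the input cosine. In a practical system, a window function and a fast Fourier transform would ordinarily be used in place of the naive transform shown here.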