1. Field of the Invention
The present invention relates to a technique for controlling the input gain of an audio signal in a speech recognition system.
2. Background Art
A technique for controlling the input gain of an audio signal in a speech recognition system is known, for example, from Japanese Patent No. 5614767. This technique learns a statistical distribution of the peak values of input audio signals and sets the input gain such that, when audio signals whose peak values follow the learned distribution are input, the magnitude of the amplified audio signals falls within the input range of the speech recognition system to the greatest extent possible.
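The peak-distribution approach described above can be illustrated with a minimal sketch. It is not the method of Japanese Patent No. 5614767; it assumes that "within the input range to the greatest extent possible" is approximated by choosing a gain that keeps a chosen fraction (`coverage`) of the learned peak values inside the input range, using a nearest-rank quantile. The function name `gain_from_peak_history` and the parameters `coverage` and `min_samples` are hypothetical. The `min_samples` guard makes the cold-start limitation of such learning-based approaches explicit.

```python
import math
from typing import Optional, Sequence


def gain_from_peak_history(
    peaks: Sequence[float],
    input_range: float,
    coverage: float = 0.95,   # hypothetical: fraction of peaks that must fit
    min_samples: int = 30,    # hypothetical: history needed before trusting it
) -> Optional[float]:
    """Return a gain such that roughly `coverage` of the learned peak values
    fit within `input_range` after amplification, or None if the learned
    history is too short (or too quiet) to derive a gain."""
    if len(peaks) < min_samples:
        return None  # cold start: not enough learned peak values yet
    ordered = sorted(peaks)
    # Nearest-rank quantile of the learned peak distribution.
    rank = math.ceil(coverage * len(ordered))
    reference_peak = ordered[min(max(rank, 1), len(ordered)) - 1]
    if reference_peak <= 0:
        return None  # an all-silent history carries no level information
    return input_range / reference_peak


history = [0.25] * 38 + [1.0] * 2  # 40 learned peak values
print(gain_from_peak_history(history, input_range=1.0, coverage=0.9))  # 4.0
print(gain_from_peak_history(history[:10], input_range=1.0))           # None
```

Until `min_samples` peak values have been observed, the sketch cannot return a gain at all, which mirrors the drawback discussed below.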
Another technique for controlling the input gain of an audio signal in a speech recognition system is known, for example, from Japanese Patent No. 5457293. This technique calculates the mean amplitude distribution of the spoken voice components contained in audio signals while repeatedly calculating the amplitude distribution of noise, and sets the input gain in accordance with the mean amplitude distribution of the spoken voice components and the previously calculated amplitude distribution of noise, such that the magnitude of the audio signal input to the speech recognition system falls within a proper range.
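The speech-and-noise approach can likewise be illustrated with a simplified sketch, which is not the method of Japanese Patent No. 5457293. It assumes per-frame scalar amplitudes rather than full amplitude distributions, classifies a frame as speech when its amplitude is at least `speech_ratio` times the current noise estimate, tracks noise by exponential smoothing, and seeds the noise estimate from the first frame. The class `SpeechNoiseGainEstimator` and the parameters `target_level`, `max_noise_level`, `speech_ratio`, `noise_alpha` and `min_speech_frames` are all hypothetical. The gain maps the mean speech amplitude to `target_level` but is capped so that amplified noise does not exceed `max_noise_level`.

```python
from typing import Optional


class SpeechNoiseGainEstimator:
    """Hypothetical estimator: tracks a running noise level and the mean
    amplitude of speech frames, then derives an input gain from both."""

    def __init__(self, target_level: float, max_noise_level: float,
                 speech_ratio: float = 2.0, noise_alpha: float = 0.1,
                 min_speech_frames: int = 50) -> None:
        self.target_level = target_level
        self.max_noise_level = max_noise_level
        self.speech_ratio = speech_ratio
        self.noise_alpha = noise_alpha
        self.min_speech_frames = min_speech_frames
        self.noise_level: Optional[float] = None
        self.speech_sum = 0.0
        self.speech_frames = 0

    def update(self, frame_amplitude: float) -> None:
        """Classify one frame's amplitude as noise or speech and update state."""
        if self.noise_level is None:
            self.noise_level = frame_amplitude  # first frame seeds the noise estimate
        elif frame_amplitude < self.speech_ratio * self.noise_level:
            # Noise frame: exponentially smooth the running noise estimate.
            self.noise_level += self.noise_alpha * (frame_amplitude - self.noise_level)
        else:
            # Speech frame: accumulate toward the mean speech amplitude.
            self.speech_sum += frame_amplitude
            self.speech_frames += 1

    def gain(self) -> Optional[float]:
        """Gain mapping mean speech to target_level, capped so amplified noise
        stays at or below max_noise_level; None until enough speech is seen."""
        if self.speech_frames < self.min_speech_frames:
            return None  # cold start: mean speech amplitude not yet reliable
        mean_speech = self.speech_sum / self.speech_frames
        if mean_speech <= 0:
            return None
        g = self.target_level / mean_speech
        if self.noise_level is not None and self.noise_level > 0:
            g = min(g, self.max_noise_level / self.noise_level)
        return g


est = SpeechNoiseGainEstimator(target_level=0.5, max_noise_level=0.05,
                               min_speech_frames=3)
for amp in [0.01, 0.01, 0.25, 0.25, 0.25]:
    est.update(amp)
print(est.gain())  # 2.0
```

As with the peak-based sketch, no gain is available until `min_speech_frames` speech frames have been accumulated.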
In the technique that sets the input gain in accordance with a learned statistical distribution of the peak values of input audio signals, the input gain cannot be set appropriately until peak values have been acquired from a statistically significant number of audio signals. Similarly, in the technique that sets the input gain in accordance with the mean amplitude distribution of spoken voice components and the previously calculated amplitude distribution of noise, the input gain cannot be set appropriately until a statistically significant number of audio signals have been acquired and the mean amplitude distribution of spoken voice components has been calculated from them.
Consequently, with these techniques, speech recognition may fail frequently immediately after the user starts using the speech recognition system, which may discourage the user from continuing to use it.
Additionally, these techniques have the disadvantage of requiring a relatively complex configuration, such as a configuration for learning a statistical distribution of the peak values of audio signals, or a configuration for repeatedly calculating the amplitude distribution of noise while calculating the mean amplitude distribution of the spoken voice components contained in audio signals.