In various applications, such as speech recognition and automatic teleconferencing, speech signals may be corrupted by noise, which can include Gaussian noise, speech noise (unrelated conversations), and reverberation. Automatic Speech Recognition (ASR) systems are known for recognizing spoken words in audio signals. ASR technology enables microphone-equipped computing devices to interpret speech and thereby provide an alternative to human-to-computer input devices such as keyboards or keypads.
ASR accuracy degrades in noisy conditions. For instance, if a radio is playing or people are talking in the background while the user speaks to the machine, the output of the automatic speech recognizer contains many more errors than the output obtained with a silent background. In such environments, speech recognition is difficult because the signal-to-noise ratio can be insufficient. Moreover, the noise model of the environment is not known, and it can change depending on environmental conditions, e.g., wind, music, competing background conversations, etc.
Noise reduction algorithms increase ASR accuracy in noisy environments by processing the audio signal before it is passed to the speech recognizer. Many different noise reduction algorithms have been proposed. One algorithm used for speech enhancement is based on phase differences between the channels of a microphone array. However, this method needs to be tuned for a specific application and works well only for a limited range of noise levels and types.
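By way of illustration only, the following Python sketch shows one simple way a phase-difference approach could operate on a single frame from a two-microphone array. It is not the specific algorithm referred to above. It assumes that the desired talker is equidistant from both microphones, so that target-dominated frequency bins show an inter-channel phase difference near zero, and that each channel has already been transformed into complex frequency bins. The function names and the parameters `max_phase_diff` and `floor` are hypothetical; the two parameters stand in for the kind of values that would need to be tuned for a given application.

```python
import cmath


def phase_difference_gains(left_bins, right_bins, max_phase_diff=0.5, floor=0.1):
    """Return one gain per frequency bin for a two-microphone frame.

    left_bins, right_bins: complex spectrum values of the same frame.
    max_phase_diff: hypothetical threshold in radians (needs tuning).
    floor: hypothetical gain applied to bins judged to be noise.
    """
    if len(left_bins) != len(right_bins):
        raise ValueError("both channels must have the same number of bins")
    gains = []
    for left, right in zip(left_bins, right_bins):
        # The phase of left * conj(right) is the inter-channel phase
        # difference, wrapped to the range [-pi, pi].
        diff = cmath.phase(left * right.conjugate())
        # Keep bins whose phase difference is small; attenuate the rest.
        gains.append(1.0 if abs(diff) <= max_phase_diff else floor)
    return gains


def enhance_frame(left_bins, right_bins, **kwargs):
    """Average the two channels and apply the per-bin gains."""
    gains = phase_difference_gains(left_bins, right_bins, **kwargs)
    return [g * 0.5 * (l + r) for g, l, r in zip(gains, left_bins, right_bins)]
```

In this sketch, a bin whose channels differ in phase by more than `max_phase_diff` is treated as noise and scaled by `floor`. A fixed threshold of this kind suits only some noise levels and types, which is consistent with the tuning limitation noted above.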
Accordingly, when a speech recognizer is employed in a mobile device, such as a smartphone or laptop, speech enhancement must overcome a highly variable acoustic environment. At the same time, manual tuning of a noise reduction algorithm for each noise condition is not practical.