Many communication devices configured to obtain audio data of user utterances include both a loudspeaker and a microphone. The loudspeaker is used to play audio signals, such as speech from a remote source during a telephone call, or audio content played from local storage or streamed from a network. The microphone is used to capture audio signals from a local source, such as a user speaking voice commands or other utterances. An acoustic echo occurs when the remote signal emitted by the loudspeaker is captured by the microphone after undergoing reflections in the local environment.
An acoustic echo canceller (“AEC”) may be used to remove acoustic echo from an audio signal captured by a microphone in order to facilitate improved communication. For example, the AEC may filter the microphone signal by determining an estimate of the acoustic echo (e.g., the remote audio signal emitted from the loudspeaker and reflected in the local environment). The AEC can then subtract the estimate from the microphone signal to produce an approximation of the true local signal (e.g., the user's utterance). The estimate can be obtained by applying a transformation to a reference signal that corresponds to the remote (far-end) signal emitted from the loudspeaker. In addition, the transformation can be implemented using an adaptive algorithm. An adaptive transformation relies on a feedback loop, which continuously adjusts a set of coefficients that are used to calculate the estimated echo from the reference signal. Different environments produce different acoustic echoes from the same loudspeaker signal, and any change in the local environment may change the way that echoes are produced. By using a feedback loop to continuously adjust the coefficients, an AEC can adapt its echo estimates to the local environment in which it operates.
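The feedback loop described above can be illustrated with a minimal sketch. The text does not name a particular adaptive algorithm, so the normalized least-mean-squares (NLMS) update used here is an assumption, as are the function name `nlms_echo_cancel`, its parameters, and the simulated four-tap room response.

```python
import random

def nlms_echo_cancel(reference, mic, num_taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo from the microphone signal.

    NLMS is an assumed choice of adaptive algorithm, not one named in the text.
    Returns the echo-reduced output and the final filter coefficients.
    """
    coeffs = [0.0] * num_taps    # adaptive coefficients (the "transformation")
    history = [0.0] * num_taps   # most recent reference samples, newest first
    output = []
    for ref_sample, mic_sample in zip(reference, mic):
        history = [ref_sample] + history[:-1]
        # Estimated echo: the transformation applied to the reference signal.
        echo_estimate = sum(c * x for c, x in zip(coeffs, history))
        # Subtracting the estimate approximates the local signal; this value
        # also serves as the error that drives the feedback loop.
        error = mic_sample - echo_estimate
        energy = sum(x * x for x in history) + eps
        coeffs = [c + step * error * x / energy
                  for c, x in zip(coeffs, history)]
        output.append(error)
    return output, coeffs

# Hypothetical test signal: white-noise reference and a made-up room response.
random.seed(0)
reference = [random.uniform(-1.0, 1.0) for _ in range(4000)]
room = [0.6, 0.3, -0.2, 0.1]
mic = [sum(room[k] * reference[n - k] for k in range(len(room)) if n - k >= 0)
       for n in range(len(reference))]

output, coeffs = nlms_echo_cancel(reference, mic)
```

In this simulation the microphone signal contains only echo, so the output should decay toward zero as the coefficients approach the simulated room response. A practical AEC would also have to handle local speech and noise in the microphone signal, which this sketch does not address.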
Many communication devices also include a noise reduction (“NR”) module. In addition to user utterances and acoustic echo, background noise is typically present in any environment. The NR module can use a noise reduction algorithm to reduce the level of background noise present in an audio signal. Typically, the NR module reduces but does not entirely eliminate the level of noise in the audio signal.
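As a rough illustration of noise that is reduced but not eliminated, the sketch below applies a per-frame, Wiener-style gain with a lower bound. The text does not specify a noise reduction algorithm, so this approach, the names `frame_power` and `reduce_noise`, and the `gain_floor` value are all assumptions.

```python
def frame_power(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)

def reduce_noise(frames, noise_power, gain_floor=0.1):
    """Scale each frame by a Wiener-style gain, never below gain_floor.

    noise_power is an assumed, previously measured estimate of the background
    noise power. The floor keeps some noise in the output instead of muting it.
    """
    cleaned = []
    for frame in frames:
        power = frame_power(frame)
        if power > 0.0:
            gain = max(gain_floor, 1.0 - noise_power / power)
        else:
            gain = gain_floor
        cleaned.append([gain * s for s in frame])
    return cleaned

# Hypothetical frames: one holding only background noise, one holding speech.
noise_frame = [0.01, -0.01] * 80
speech_frame = [0.5, -0.5] * 80
noise_power = frame_power(noise_frame)

cleaned = reduce_noise([noise_frame, speech_frame], noise_power)
```

With these inputs the noise-only frame is attenuated to the gain floor while the speech frame passes almost unchanged, so a lower level of background noise remains in the output.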
In addition, communication devices may also use a residual echo suppressor (“RES”). Various factors, including nonlinearity and noise, can prevent an acoustic echo canceller from completely eliminating an echo. A residual echo suppressor may be used to further reduce the level of echo that remains after processing by an acoustic echo canceller. For example, residual echo suppressors may use non-linear processing to further reduce the echo level. In addition to echo, however, processing by a residual echo suppressor often eliminates noise as well. For example, a residual echo suppressor can receive an audio signal that already has reduced levels of noise after processing by the NR module and further process the signal so that the remaining noise is eliminated entirely.
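One simple form of non-linear processing is a center clipper, sketched below to show how suppressing low-level residual echo can also remove low-level noise. The text does not say which non-linear technique is used, so center clipping, the function name `center_clip`, the threshold, and the sample values are assumptions for illustration only.

```python
def center_clip(samples, threshold):
    """Zero every sample whose magnitude is at or below the threshold."""
    return [s if abs(s) > threshold else 0.0 for s in samples]

# Hypothetical signal after the AEC and NR stages: a speech-like segment,
# then a pause holding only faint residual echo and attenuated noise.
speech_segment = [0.40, -0.35, 0.30, -0.45]
pause_segment = [0.004, -0.003, 0.002, -0.004]

clipped_speech = center_clip(speech_segment, threshold=0.01)
clipped_pause = center_clip(pause_segment, threshold=0.01)
```

The speech-like samples pass through unchanged, while every sample in the pause falls below the threshold and is set to zero, removing the residual echo and the remaining noise together.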
This processing by the residual echo suppressor can have the undesirable effect of creating silence in the audio output signal. For example, when a user is speaking an utterance, the residual echo suppressor further reduces residual echo, but a level of background noise remains present in the output signal. However, when the user stops speaking, the residual echo suppressor can eliminate any residual echo as well as the background noise that was present. The abrupt transition between an audio output signal that includes some level of background noise and one that does not (e.g., silence) can cause a listener to mistakenly believe that the communication link is dead. In addition, frequent changes between a signal that includes some level of background noise and one that does not can distract a listener.