In instances where a microphone and a speaker are implemented in the same environment, it is often desirable to reduce or eliminate undesirable echo captured by the microphone due to sound emanating from the speaker. Furthermore, in some implementations, it is desirable to record at the microphone even while the speaker is outputting unrelated content. For example, while the speaker is playing music, far-end speech, or other content, it may be desirable for the microphone to obtain input audio data that may be further processed by speech recognition engines or other applications. However, obtaining useful input audio data in such contexts may be difficult because the microphone picks up the sound (i.e., echo) from the speaker.
In typical acoustic echo cancellation use cases, the usable dynamic range of an audio input device (e.g., a microphone) is limited by the level of the echo reaching the audio input device. For example, when implementing loud echo reduction on a 16-bit audio stream, the total microphone gain may need to be reduced to avoid overload and clipping. Such a reduction in microphone gain often renders the residual audio signal unusable for speech processing or similar applications because it decreases the effective bit depth. In such contexts, classic acoustic echo cancellation may post-process the audio input data received at the microphone to remove the undesirable echo. Such techniques improve the signal-to-noise ratio (SNR); however, they are unable to recover speech information lost due to the reduced microphone gain. These problems arise in cases where the total dynamic range of the input signal is larger than the range of the microphone and/or the bit-depth capacity of the processing pipeline. For example, under loud echo conditions, quiet speech signals are lost or occupy fewer effective bits. Such problems may lead to a worse SNR than under a no-echo condition, even with theoretically ideal acoustic echo cancellation.
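To make the bit-depth argument concrete, the following Python sketch quantizes a test tone to 16 bits with and without a gain reduction and estimates how many effective bits the tone occupies. It is an illustration under stated assumptions, not a description of any particular device: the sine tone standing in for quiet speech, the -40 dBFS level, the 24 dB gain reduction, and the function names are hypothetical choices. The estimate uses the standard sine-wave relation SQNR ≈ 6.02·N + 1.76 dB.

```python
import math

BIT_DEPTH = 16                           # 16-bit stream, as in the example above
FULL_SCALE = 2 ** (BIT_DEPTH - 1) - 1    # largest positive int16 code (32767)
DB_PER_BIT = 20.0 * math.log10(2.0)      # about 6.02 dB of level per bit


def quantize_int16(x: float) -> int:
    """Scale a sample in [-1, 1] to an int16 code, rounding and clipping."""
    code = round(x * FULL_SCALE)
    return max(-FULL_SCALE - 1, min(FULL_SCALE, code))


def measured_enob(peak_dbfs: float, gain_db: float = 0.0,
                  freq_hz: float = 997.0, rate_hz: int = 48_000) -> float:
    """Quantize a test tone and estimate the effective bits it occupies.

    Hypothetical test signal: a sine at `peak_dbfs` stands in for quiet
    speech. `gain_db` is the microphone gain change; a negative value models
    the gain reduction needed to keep a loud echo from clipping.
    """
    amp = 10.0 ** ((peak_dbfs + gain_db) / 20.0)
    sig_pow = 0.0
    err_pow = 0.0
    for n in range(rate_hz):  # one second of audio
        x = amp * math.sin(2.0 * math.pi * freq_hz * n / rate_hz)
        q = quantize_int16(x) / FULL_SCALE  # back to the [-1, 1] scale
        sig_pow += x * x
        err_pow += (x - q) ** 2
    sqnr_db = 10.0 * math.log10(sig_pow / err_pow)
    # Solve SQNR = 6.02 * N + 1.76 dB for N (bits spanned by the tone).
    return (sqnr_db - 1.76) / DB_PER_BIT


if __name__ == "__main__":
    full = measured_enob(-40.0)
    reduced = measured_enob(-40.0, gain_db=-24.0)
    print(f"no gain reduction:  {full:.2f} effective bits")
    print(f"24 dB gain reduced: {reduced:.2f} effective bits")
```

Under these assumptions, the -40 dBFS tone spans roughly 9.4 effective bits, and the 24 dB gain reduction leaves roughly 5.4, a loss of about 4 bits (24 dB at about 6.02 dB per bit). Echo cancellation applied after quantization operates on the reduced-resolution signal and cannot restore those bits.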
It may be advantageous to improve the input audio data received at an audio input device of a device in contexts where an echo, particularly from a speaker of the same device, is present. It is with respect to these and other considerations that the present improvements have been needed. Such improvements may become critical as the desire to implement devices capable of playing content through a speaker while recording speech becomes more widespread.