Consider a scenario with two microphones capturing audio at client “A” and transmitting to client “B” in stereo. User “B”, located at client B, now plays out the stereo signal through either stereo loudspeakers or a stereo headset. This is sometimes referred to as a “complete stereo” or “true stereo” transmission from client A to client B.
Continuing with the above scenario, assume that Acoustic Echo Cancellation (AEC) is enabled at client A. Applied to each microphone signal, the AEC consists of a linear filter part followed by Non-Linear Post-processing (NLP) to suppress the remaining residual echo. Echo cancellation on the left and right microphone signals at client A will never perform equally, since the signals captured by the two microphones are not identical. Small or large differences in delay, microphone quality, and location relative to the loudspeakers and the speaker (e.g., the talker or participant), among other factors, will all have an impact on performance. How well the NLP performs depends heavily on the quality of the linear filter part. Additionally, due to the differences described above, the amount of suppression applied to each signal will vary as well.
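The per-channel structure described above can be sketched as follows. This is a minimal illustration, assuming an NLMS adaptive filter for the linear part and a simple frame-wise suppression gain as a stand-in for the NLP; the filter lengths, echo paths, and function names are illustrative assumptions, not the actual system:

```python
import numpy as np

def nlms_cancel(far, mic, taps=32, mu=0.5, eps=1e-8):
    """Linear AEC stage: NLMS adaptive filter estimating the echo path."""
    w = np.zeros(taps)
    residual = np.copy(mic)
    for n in range(taps, len(mic)):
        x = far[n - taps:n][::-1]          # most recent far-end samples
        e = mic[n] - w @ x                 # error = mic minus echo estimate
        residual[n] = e
        w += mu * e * x / (x @ x + eps)    # normalized LMS update
    return residual

def nlp_gains(residual, mic, frame=256, floor=0.05):
    """Simplified NLP stage: one suppression gain per frame, here modeled
    as residual RMS over microphone RMS (a stand-in for the coherence or
    ERLE estimates a real NLP would use)."""
    gains = []
    for i in range(0, len(mic) - frame, frame):
        r = np.sqrt(np.mean(residual[i:i + frame] ** 2))
        m = np.sqrt(np.mean(mic[i:i + frame] ** 2)) + 1e-12
        gains.append(max(floor, min(1.0, r / m)))
    return np.array(gains)

rng = np.random.default_rng(0)
far = rng.standard_normal(8000)            # far-end (loudspeaker) signal

# Two slightly different echo paths (different delay and coupling gain),
# mimicking mismatched microphone placement at client A.
h_left = np.zeros(16); h_left[4] = 0.5
h_right = np.zeros(16); h_right[9] = 0.4
mic_left = np.convolve(far, h_left)[:len(far)]
mic_right = np.convolve(far, h_right)[:len(far)]

res_left = nlms_cancel(far, mic_left)
res_right = nlms_cancel(far, mic_right)
g_left = nlp_gains(res_left, mic_left)     # per-frame NLP gains, left
g_right = nlp_gains(res_right, mic_right)  # per-frame NLP gains, right
```

Because each channel adapts against a different echo path, the two gain sequences `g_left` and `g_right` do not match, illustrating why the left and right channels end up with different amounts of suppression.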
With an NLP that operates independently on each channel, user B will experience different levels of quality in the left and right channels. When a headset is being used, this difference in quality is quite audible, and fluctuations between the left and right channels can be perceived (e.g., heard) by the user, which is quite annoying. Therefore, instead of enhancing the audio experience, such approaches to NLP actually degrade audio quality.
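One way to see why independent per-channel suppression is audible is to look at the interchannel level difference (ILD) that the two NLP gains introduce. The gain values below are purely hypothetical illustrations, not measurements from any system:

```python
import math

# Hypothetical per-frame NLP suppression gains for the left and right
# channels (illustrative only; real gains come from each channel's NLP).
gains_left = [0.9, 0.4, 0.7, 0.2, 0.8]
gains_right = [0.8, 0.7, 0.3, 0.5, 0.6]

# Interchannel level difference (dB) introduced purely by the NLP stage.
# Because the gains are computed independently, the ILD swings from frame
# to frame, which is heard as the stereo image wandering between the ears.
ild_db = [20.0 * math.log10(gl / gr)
          for gl, gr in zip(gains_left, gains_right)]
```

With these example values the NLP alone shifts the level balance by several dB in alternating directions from frame to frame, well above the roughly 1 dB ILD change a listener on headphones can detect.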