1. Field of the Invention
The present invention relates to audio signal processing and, more specifically but not exclusively, to stereophonic acoustic echo cancellation (AEC).
2. Description of the Related Art
This section introduces aspects that may help facilitate a better understanding of the invention. Accordingly, the statements of this section are to be read in this light and are not to be understood as admissions about what is prior art or what is not prior art.
In two-way audio communication, undesirable acoustic echo can occur when sounds (i.e., acoustic signals) corresponding to (electronic) audio signals transmitted from a first side of the communication and rendered by loudspeakers at the second side are picked up by microphones at the second side. Those sounds are then included in the audio signals transmitted back to the first side, where they are rendered by loudspeakers and heard as acoustic echo by individuals located at the first side. Acoustic echo cancellation (AEC) refers to signal processing that attempts to estimate the audio signals corresponding to acoustic echo occurring at the second side of the two-way audio communication and to compensate the audio signals to be transmitted back to the first side accordingly, so as to reduce or even eliminate the contribution of that acoustic echo in those transmitted audio signals.
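As a rough illustration of the estimate-and-compensate idea described above, the following sketch shows a single-loudspeaker, single-microphone echo canceller. It assumes a normalized least-mean-squares (NLMS) adaptive filter, which is one common choice and is not named in this disclosure. The function names, the three-tap echo path, the step size, and the white-noise far-end signal are all hypothetical values chosen for the example.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo from the microphone signal.

    far_end: samples rendered by the loudspeaker at the second side.
    mic:     samples picked up by the microphone at the second side.
    Returns (residual, weights), where residual is the compensated signal
    that would be transmitted back to the first side.
    """
    weights = [0.0] * num_taps   # running estimate of the echo path
    history = [0.0] * num_taps   # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate                      # compensated sample
        norm = sum(h * h for h in history) + eps   # input power (normalization)
        weights = [w + step * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights


def simulate(num_samples=4000, seed=0):
    """Toy scenario: white-noise far-end signal, no near-end talker or noise."""
    rng = random.Random(seed)
    echo_path = [0.6, -0.3, 0.1]   # hypothetical loudspeaker-to-microphone response
    far_end = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    mic = [
        sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n >= k)
        for n in range(num_samples)
    ]
    residual, weights = nlms_echo_canceller(far_end, mic)
    return echo_path, weights, residual
```

In this idealized setting, the adapted weights approach the hypothetical echo path and the residual shrinks toward zero. A practical canceller would also have to handle near-end speech, noise, and much longer echo paths.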
Note that, as used in this disclosure, the term “loudspeaker” refers to any suitable transducer for converting electronic audio signals into acoustic signals (including headphones), while the term “microphone” refers to any suitable transducer for converting acoustic signals into electronic audio signals.
In a monophonic audio system, each side has only one microphone and only one loudspeaker. In a stereophonic audio system, each side has two microphones that generate left and right outgoing audio channels and two loudspeakers that render left and right incoming audio channels. In a multichannel audio system, each side has more than two microphones and more than two loudspeakers. Acoustic echo cancellation can be applied with varying sophistication and correspondingly variable success in each of these different audio architectures.
Both the stereophonic AEC (SAEC) and multichannel AEC (MAEC) problems differ from the straightforward monophonic AEC application because of the non-uniqueness problem, i.e., the underlying equations to be solved by the echo canceller system can be singular or ill-conditioned. The major effect is that, if no precaution is taken, the AEC has to reconverge whenever there is any acoustic change in the transmission room (also known as the far-end room, i.e., the first side of the two-way audio communication referred to previously). In the monophonic AEC case, it is not necessary to reconverge following transmission-room changes, since the solution is independent of such changes. Still, as in the monophonic AEC case, SAEC and MAEC modules have to manage normal echo-path changes in the receiving room (also known as the near-end room, i.e., the second side).
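The non-uniqueness problem can be made concrete with a deliberately simplified model, sketched below. It assumes that every acoustic path is a single gain rather than a full impulse response, so the two loudspeaker signals are x1 = a*s and x2 = b*s for one far-end source s. All names and numeric values are hypothetical and chosen only for illustration.

```python
def far_end_covariance(far_gains, source_power=1.0):
    """2x2 covariance of the two loudspeaker signals x1 = a*s, x2 = b*s."""
    a, b = far_gains
    return [[a * a * source_power, a * b * source_power],
            [a * b * source_power, b * b * source_power]]


def determinant(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def echo_residual(source, far_gains, true_paths, est_paths):
    """Residual echo when est_paths is used to cancel the echo of true_paths."""
    a, b = far_gains
    h1, h2 = true_paths
    w1, w2 = est_paths
    return [(h1 - w1) * a * s + (h2 - w2) * b * s for s in source]


source = [1.0, -2.0, 0.5, 3.0]   # arbitrary far-end talker samples
true_paths = (0.5, 0.25)         # hypothetical receiving-room echo paths
room_before = (1.0, 0.5)         # hypothetical transmission-room gains (a, b)
room_after = (0.5, 1.0)          # gains after, e.g., a different talker speaks

# Any estimate of the form (h1 + b*c, h2 - a*c) cancels the echo for room_before.
c = 2.0
wrong_but_valid = (true_paths[0] + room_before[1] * c,
                   true_paths[1] - room_before[0] * c)

# The covariance is singular, so the normal equations have no unique solution.
print(determinant(far_end_covariance(room_before)))
# A wrong estimate still cancels perfectly while the transmission room is unchanged.
print(echo_residual(source, room_before, true_paths, wrong_but_valid))
# The same estimate leaves residual echo once the transmission room changes.
print(echo_residual(source, room_after, true_paths, wrong_but_valid))
# The true receiving-room paths cancel the echo in either case.
print(echo_residual(source, room_after, true_paths, true_paths))
```

In the corresponding one-channel model, the residual is (h - w)*a*s, which vanishes only for w = h whatever the value of a. This is consistent with the statement that the monophonic solution does not depend on the transmission room.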
Transmission-room tracking is a serious concern because these changes can be very abrupt, e.g., one talker in the far-end room stops talking when another person in the same room starts talking. Given all of the practical issues already involved in controlling monophonic AECs, neglecting this additional fundamental problem can cause significant performance issues for any stereo or multichannel AEC implementation. The net effect of acoustic-path changes in both the transmission room and the receiving room can be seen in terms of the inter-channel cross-correlation of the receive channels. The acoustic paths in the transmission room also determine the inter-channel cross-correlation of the downlink (far-end) channels.
It is desirable to control and limit the inter-channel cross-correlation of the transmission (downlink) channels without causing objectionable stereo image distortion or spectral artifacts.
In the literature, there are various proposals for achieving inter-channel decorrelation. For details, see Refs [1], [2], [3], [4], [5], [6], and [7]. The methods presented in the last four of these references (Refs [4] through [7]) belong to the same category, i.e., they all introduce a time-varying phase shift in the stereo channels to achieve decorrelation. This approach is very effective, especially at higher frequencies (>1.5 kHz), but it has to be applied carefully at lower frequencies to avoid noticeable distortion. Frequencies below 1 kHz are more difficult to decorrelate than higher frequencies without introducing audible distortion, because human hearing is more sensitive to phase shifts at the lower frequencies (Ref [8]).
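The general idea of decorrelation by time-varying phase shift can be sketched as follows. This is a generic illustration and is not asserted to be the method of any of the cited references. It assumes a first-order all-pass filter whose coefficient is slowly modulated with opposite signs in the two channels, and it measures the zero-lag normalized cross-correlation before and after. The modulation depth and period are hypothetical values picked to make the effect visible, not tuned for audibility, and the sketch applies the same modulation at all frequencies rather than treating low frequencies more conservatively.

```python
import math
import random


def correlation(left, right):
    """Zero-lag normalized cross-correlation of two equal-length signals."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den


def time_varying_allpass(signal, depth, mod_period, sign):
    """First-order all-pass filter whose coefficient is slowly modulated.

    The coefficient a[n] = sign * depth * sin(2*pi*n / mod_period) changes the
    phase response over time while keeping the magnitude response roughly flat.
    """
    out = []
    x_prev = 0.0
    y_prev = 0.0
    for n, x in enumerate(signal):
        a = sign * depth * math.sin(2.0 * math.pi * n / mod_period)
        y = -a * x + x_prev + a * y_prev   # all-pass difference equation
        out.append(y)
        x_prev, y_prev = x, y
    return out


def decorrelate(left, right, depth=0.5, mod_period=8000):
    """Apply opposite-sign phase modulation to the two channels."""
    return (time_varying_allpass(left, depth, mod_period, +1.0),
            time_varying_allpass(right, depth, mod_period, -1.0))


rng = random.Random(1)
mono = [rng.gauss(0.0, 1.0) for _ in range(16000)]
left_in, right_in = mono, list(mono)   # worst case: identical channels
left_out, right_out = decorrelate(left_in, right_in)

print(correlation(left_in, right_in))    # identical channels: 1.0 up to rounding
print(correlation(left_out, right_out))  # lower after the phase modulation
```

A smaller modulation depth reduces the phase excursion and therefore also the amount of decorrelation, which reflects the trade-off between decorrelation and audible distortion noted above.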