Embodiments according to the present invention relate to an echo suppression unit and a method for suppressing an acoustic echo, which may be used, for instance, in hands-free telecommunication systems or other acoustic systems that include multichannel loudspeaker playback based on a parametric representation of spatial sound.
Acoustic echoes arise from acoustic coupling, or feedback, between the loudspeakers and microphones of telecommunication devices. This phenomenon is especially pronounced in hands-free operation. The acoustic feedback signal from the loudspeaker is transmitted back to the far-end subscriber, who hears a delayed version of his own speech. Echo signals are a very distracting disturbance and can even inhibit interactive, full-duplex communication. Additionally, acoustic echoes can cause howling and instability of the acoustic feedback loop. In a full-duplex hands-free telecommunication system, echo control is therefore advisable in order to cancel the effect of the coupling between loudspeakers and microphones.
FIG. 9 illustrates the general acoustic echo control problem. The far-end signal, emitted by a loudspeaker, travels to the microphone both directly and via reflected paths. Thus, the microphone captures not only the local near-end speech but also the echo, which is thereby fed back to the far-end user.
A loudspeaker signal x(n) is provided to a loudspeaker 100, which transforms the loudspeaker signal into an audible oscillation of the medium surrounding the loudspeaker 100. As indicated in FIG. 9 by a curved arrow, a microphone 110 may receive the sound emitted by the loudspeaker 100, wherein y(n) denotes the feedback signal from the loudspeaker 100 to the microphone 110.
Apart from the feedback signal y(n), the microphone 110 also records an additional sound signal w(n), which may, for instance, represent speech by a user. Both acoustic signals are recorded by the microphone 110 and provided, as a microphone signal z(n), to an echo removal unit 120. The echo removal unit 120 also receives the loudspeaker signal x(n). It outputs a signal in which, ideally, the contribution of the loudspeaker signal x(n) has been removed from the microphone signal z(n).
Hence, FIG. 9 illustrates the general setup of the acoustic echo control problem: the loudspeaker signal x(n) is fed back into the microphone signal z(n), and an echo removal process removes this echo while, ideally, letting the desired local near-end signal w(n) pass through.
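The signal model behind FIG. 9 can be sketched in a few lines. This is a minimal sketch that assumes a linear, time-invariant echo path, so that the feedback signal y(n) is the convolution of x(n) with an impulse response h(n) from the loudspeaker 100 to the microphone 110. The impulse response h(n) and the function name simulate_microphone are hypothetical; the text does not specify how the echo path is modelled.

```python
import numpy as np

def simulate_microphone(x, h, w):
    """Return (y, z) for the FIG. 9 setup (sketch).

    Assumption: the echo path is a hypothetical LTI impulse response h,
    so y(n) = (h * x)(n); the microphone signal is z(n) = y(n) + w(n).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if len(w) != len(x):
        raise ValueError("x and w must have the same length")
    # Direct path and reflections: linear convolution, truncated to len(x).
    y = np.convolve(x, h)[: len(x)]
    # Microphone picks up the echo plus the local near-end signal.
    z = y + w
    return y, z

# A single impulse from the loudspeaker, a direct path (delay 1, gain 0.5),
# one reflection (delay 3, gain 0.2), and near-end activity at the last sample.
y, z = simulate_microphone([1, 0, 0, 0, 0], [0, 0.5, 0, 0.2], [0, 0, 0, 0, 1])
```

An ideal echo removal unit would recover w(n) = z(n) - y(n) from z(n) and x(n) alone, without access to y(n).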
Acoustic echo control is a well-known problem, and various methods to remove acoustic echoes have been proposed [13]. Below, we briefly recall the acoustic echo suppression (AES) approaches presented, e.g., in [8, 9], since they are the most suitable in the context of spatial audio communication considered here.
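AES approaches of this kind commonly operate in the short-time frequency domain: they estimate the echo power from the loudspeaker spectrum and attenuate each frequency bin of the microphone spectrum with a real-valued gain, rather than subtracting an estimated echo waveform. The sketch below shows a generic gain rule of this type for a single frame. The power-subtraction formula, the names aes_gain, suppress_frame and echo_psd_factor, and the default values of beta and gain_floor are assumptions for illustration, not the specific formulation of [8, 9]. It also assumes the loudspeaker frame has already been delay-aligned with the echo.

```python
import numpy as np

def aes_gain(Z, X, echo_psd_factor, beta=2.0, gain_floor=0.1, eps=1e-12):
    """Per-bin suppression gain for one frame (generic sketch, not from [8, 9]).

    Z, X: complex spectra of the microphone frame and the delay-aligned
    loudspeaker frame. echo_psd_factor: assumed estimate of the echo power
    transfer per bin. beta: overestimation factor. gain_floor: lower gain limit.
    """
    Z = np.asarray(Z, dtype=complex)
    X = np.asarray(X, dtype=complex)
    echo_psd = np.asarray(echo_psd_factor) * np.abs(X) ** 2  # estimated echo power
    mic_psd = np.abs(Z) ** 2                                 # microphone power
    gain = 1.0 - beta * echo_psd / (mic_psd + eps)           # power subtraction
    return np.clip(gain, gain_floor, 1.0)                    # limit to [floor, 1]

def suppress_frame(Z, X, echo_psd_factor, **kwargs):
    """Apply the real-valued gain to the microphone spectrum."""
    return aes_gain(Z, X, echo_psd_factor, **kwargs) * np.asarray(Z, dtype=complex)

# Bin 0 contains only echo (echo power transfer 0.25); bin 1 only near-end speech.
X = np.array([1.0, 0.0])
Z = np.array([0.5, 1.0])
g = aes_gain(Z, X, np.array([0.25, 0.25]))
```

Echo-dominated bins are pushed down to the gain floor, while bins without loudspeaker energy pass unchanged; the floor trades residual echo against audible processing artifacts.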
When audio signals are transmitted or played back, multichannel systems are often used. In these systems, multiple loudspeakers are used to play back sound and/or multiple microphones are used to record spatial sound. Such multichannel systems are used, for instance, in spatial audio teleconferencing systems that not only transmit the audio signals of the different parties but also preserve the spatial information of the recording scenario [12]. In other systems, the spatial information can be provided artificially or changed interactively [5].
When spatial audio is used in telecommunication scenarios, an efficient representation of the multichannel audio signals should be employed while still ensuring high audio quality. Parametric spatial audio coding is a suitable approach to this challenge. Below, we present practical methods that follow the parametric spatial audio coding paradigm and are especially important in the context of communication.
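To illustrate why a parametric representation is efficient, the following sketch encodes a stereo signal as a single downmix plus one inter-channel level difference (ICLD) per frequency band, and re-spatializes the downmix at the decoder. It is a deliberately simplified illustration of the paradigm under stated assumptions (one FFT over the whole signal instead of a framed filterbank, uniform bands, no coherence or phase parameters); the functions encode and decode and the band layout are hypothetical and do not represent the specific methods presented below.

```python
import numpy as np

def encode(left, right, n_bands=8, eps=1e-12):
    """Sketch: mono downmix plus one ICLD (dB) per uniform frequency band."""
    L, R = np.fft.rfft(left), np.fft.rfft(right)
    downmix = 0.5 * (L + R)                       # the only audio signal sent
    edges = np.linspace(0, len(L), n_bands + 1).astype(int)
    icld = np.empty(n_bands)
    for i, (a, b) in enumerate(zip(edges[:-1], edges[1:])):
        pl = np.sum(np.abs(L[a:b]) ** 2) + eps
        pr = np.sum(np.abs(R[a:b]) ** 2) + eps
        icld[i] = 10 * np.log10(pl / pr)          # side information per band
    return downmix, icld, edges

def decode(downmix, icld, edges, n):
    """Scale the downmix per band so that each band has the sent level ratio."""
    L_hat = np.zeros_like(downmix)
    R_hat = np.zeros_like(downmix)
    for i, (a, b) in enumerate(zip(edges[:-1], edges[1:])):
        r = 10 ** (icld[i] / 10)                  # left/right power ratio
        g_left, g_right = np.sqrt(2 * r / (1 + r)), np.sqrt(2 / (1 + r))
        L_hat[a:b] = g_left * downmix[a:b]
        R_hat[a:b] = g_right * downmix[a:b]
    return np.fft.irfft(L_hat, n), np.fft.irfft(R_hat, n)

rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
dm, icld, edges = encode(s, 0.5 * s)              # right channel 6 dB quieter
l_hat, r_hat = decode(dm, icld, edges, len(s))
```

Instead of two full-band audio signals, only one downmix and eight numbers are transmitted. For two identical channels the decoder returns the input exactly; in general, only the per-band level ratio is restored, not the original waveforms.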
While multichannel systems, such as the previously mentioned parametric spatial audio coding, make it possible to transmit a plurality of audio signals in a very efficient, bandwidth-saving manner, a straightforward implementation of echo removal or echo suppression in such systems requires processing each microphone signal with respect to each loudspeaker signal output by the multichannel system. The computational complexity therefore grows with the product of the number of microphone signals and the number of loudspeaker signals, which becomes significant simply due to the high number of signals to be processed. Accordingly, this may entail additional costs due to higher energy consumption, greater processing power requirements and, possibly, a slightly increased delay.
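The growth in complexity can be made concrete by counting the echo paths that a straightforward implementation has to handle: one for every pair of microphone signal and loudspeaker signal. The helper below is hypothetical and only counts pairs; the cost per pair depends on the echo removal or suppression method actually used.

```python
def naive_echo_path_count(n_mics: int, n_loudspeakers: int) -> int:
    """Echo paths to handle when every microphone signal is processed
    against every loudspeaker signal (hypothetical helper)."""
    if n_mics < 0 or n_loudspeakers < 0:
        raise ValueError("channel counts must be non-negative")
    return n_mics * n_loudspeakers

for mics, speakers in [(1, 1), (1, 2), (2, 5), (4, 5), (4, 8)]:
    # Prints 1, 2, 10, 20 and 32 echo paths, respectively.
    print(f"{mics} mic(s) x {speakers} loudspeaker(s): "
          f"{naive_echo_path_count(mics, speakers)} echo path(s)")
```

Doubling both the number of microphones and the number of loudspeakers quadruples the count, so systems with many channels quickly become expensive to process in this way.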