In a communication system a communication network is provided, which can link together two communication terminals so that the terminals can send information to each other in a call or other communication event. Information may include audio, text, image or video data.
Modern communication systems are based on the transmission of digital signals. Analogue information such as speech is input into an analogue to digital converter at the transmitter of one terminal, hereinafter referred to as the near end terminal, and converted into a digital signal. The digital signal is then encoded and transmitted in data packets over a channel to the receiver of a destination terminal, hereinafter referred to as the far end terminal.
To transmit audio signals, such as speech, analogue audio data is input from a microphone at the near end terminal. The analogue audio data is then converted into digital data before it is transmitted to the far end terminal via the communication network.
A reply signal which is transmitted from the far end terminal, hereinafter referred to as the far end signal, is received at the near end terminal and output from a loudspeaker of the near end terminal.
A phenomenon commonly referred to as acoustic echo occurs when the far end signal output from the loudspeaker 20, as shown in FIG. 1, traverses an echo path 22 and is recorded by the microphone 10 of the near end terminal as an acoustic echo component in the near end signal. The echo component in the near end signal may in some cases cause the far end speaker to hear their own voice transmitted back from the near end terminal.
The echo path describes the effects of the acoustic paths travelled by the far end signal from the loudspeaker to the microphone. The far end signal may travel directly from the loudspeaker to the microphone, or it may be reflected from various surfaces in the environment of the near end terminal. The echo path may also describe any other effects that the far end signal has on the near end recording. For example the far end signal may cause mechanical vibration in the near end terminal, or cause electrical induction in the components of the near end terminal.
The echo path traversed by the far end signal output from the loudspeaker may be regarded as a system having a frequency response and a phase response, both of which may vary over time. If the echo component is considered as the output of the system and the far end signal as the input of the system, the frequency response of the echo path is a measure of the gain between the magnitudes of the output and the input of the system as a function of frequency.
In order to remove the acoustic echo from the signal recorded at the near end microphone it is necessary to estimate how the echo path changes the desired far-end loudspeaker output signal to an undesired echo component in the input signal. The effects of the echo path are estimated by calculating a mathematical representation of the relation between the signal output from the loudspeaker and the undesired echo input signal. The mathematical representation of the combined effects of the frequency and phase response which describes the echo path is hereinafter referred to as the echo path transfer function. When the echo path transfer function is accurately determined, the frequency response of the echo path transfer function will be equivalent to the frequency response of the actual echo path.
The echo path transfer function H(s) is the linear mapping of the Laplace transform X(s) of the far end signal to the Laplace transform Y(s) of the echo signal:

$$Y(s) = H(s)\,X(s) \qquad \text{Equation (1)}$$

or

$$H(s) = \frac{Y(s)}{X(s)} = \frac{\mathcal{L}\{y(t)\}}{\mathcal{L}\{x(t)\}} \qquad \text{Equation (2)}$$
The echo path transfer function H(s) is calculated by comparing the far end loudspeaker signal with the near end signal recorded by the microphone. When the near-end speaker is silent and the far-end speaker is active, only the echo provided by the far end signal is recorded by the near end microphone. In this case, the echo path transfer function can be adaptively calculated to model the way that the far-end signal changes when traversing the echo path.
In known acoustic echo cancellation (AEC) techniques the adaptively calculated echo transfer function is used to provide filter coefficients that filter the far end signal to generate an estimate of the echo component in the near end signal in accordance with the echo path transfer function. The estimated echo may then be subtracted from the near end signal. Other AEC techniques employ attenuation based filtering methods that attenuate the near end signal according to the calculated echo path transfer function to remove the echo component from the near end signal.
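The subtraction-based approach described above can be illustrated with a minimal sketch. It assumes the echo path has already been modelled as a short set of FIR filter coefficients; the function names `estimate_echo` and `cancel_echo` and the coefficient values are hypothetical and are not taken from any particular prior art system.

```python
def estimate_echo(far_end, coeffs):
    """Filter the far end samples with FIR coefficients that model the echo path."""
    echo = []
    for n in range(len(far_end)):
        acc = 0.0
        for k, h in enumerate(coeffs):
            if n - k >= 0:  # samples before the start of the signal are treated as zero
                acc += h * far_end[n - k]
        echo.append(acc)
    return echo


def cancel_echo(near_end, far_end, coeffs):
    """Subtract the estimated echo from the near end signal, sample by sample."""
    echo = estimate_echo(far_end, coeffs)
    return [y - e for y, e in zip(near_end, echo)]


# Illustrative values only (assumed, not from the source).
far = [1.0, 0.0, 0.0, 2.0]
path = [0.5, 0.25]                      # assumed echo path coefficients
speech = [0.1, 0.2, 0.3, 0.4]           # assumed near end speech
near = [s + e for s, e in zip(speech, estimate_echo(far, path))]
cleaned = cancel_echo(near, far, path)  # approximately recovers `speech`
```

When the coefficients match the actual echo path, the residual equals the near end speech; any mismatch leaves residual echo in the output.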
FIG. 2 is a diagram of a known echo canceller according to the prior art. The echo canceller comprises far end and near end Short Time Fourier Transform (STFT) blocks 8 and 9 arranged to transform the far end and near end signals into far end and near end frequency domain signals respectively. Far end and near end energy blocks 10 and 11 are arranged to convert the far end and near end frequency domain signals into far end and near end power spectrums respectively.
A ratio block 1 is arranged to calculate the echo path transfer function by comparing the far end power spectrum and the near end power spectrum when the near end signal only contains the echo component. As shown, the calculated echo path transfer function gains are multiplied with the far end power spectrum using a mixer 2, to generate a modified far end power spectrum that represents a power spectrum of the echo component in the near end signal.
By comparing the power spectrum of the echo component and the power spectrum of the near end component, attenuation gains are computed in block 3. The attenuation gains are then applied to the near end signal in block 4 to attenuate the echo component from the near end power spectrum.
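The chain formed by ratio block 1, mixer 2 and gain block 3 might be sketched per frequency bin as follows. The source does not state the exact gain rule, so the spectral-subtraction style formula used here, the clamping to the range 0 to 1, and the names `echo_path_gains` and `attenuation_gains` are assumptions made for illustration.

```python
def echo_path_gains(far_power, near_power, floor=1e-12):
    """Ratio block: per-bin power gain of the echo path, valid when near end is echo only."""
    return [n / max(f, floor) for f, n in zip(far_power, near_power)]


def attenuation_gains(far_power, near_power, path_gains, floor=1e-12):
    """Mixer plus gain block: estimate echo power, then derive a gain in [0, 1] per bin."""
    gains = []
    for f, n, h in zip(far_power, near_power, path_gains):
        echo_power = h * f                      # modified far end power spectrum
        g = (n - echo_power) / max(n, floor)    # assumed spectral-subtraction rule
        gains.append(min(1.0, max(0.0, g)))
    return gains


# Echo-only frame used to learn the path gains (illustrative values).
h = echo_path_gains([4.0, 2.0], [1.0, 1.0])
# Later frame: bin 0 contains near end speech plus echo, bin 1 contains echo only.
g = attenuation_gains([8.0, 2.0], [4.0, 1.0], h)
```

Under this rule, a bin dominated by echo receives a gain near zero, while a bin in which the near end power exceeds the estimated echo power is attenuated only partially.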
In order to calculate the echo path transfer function when the near end signal only contains the echo component, a voice activity detector 5 is arranged to compare the voice activity on the far end and near end signals and to control the update rate of a far end smoothing filter 6 and a near end smoothing filter 7 accordingly. When speech is detected in the far end signal and only echo is detected in the near end signal the update rate of the smoothing filters 6 and 7 is controlled to be high. In all other cases the update rate is controlled to be low. As a result the outputs of the smoothing filters 6 and 7 are determined by the input signals that exist when the near end signal contains only echo.
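The effect of a controlled update rate can be sketched with a first-order recursive smoother. The recursive form, the name `smooth_update` and the rate values 0.5 and 0.01 are assumptions; the source specifies only that the rate is high in the echo-only case and low otherwise.

```python
def smooth_update(state, new_spectrum, echo_only, fast_rate=0.5, slow_rate=0.01):
    """Move each bin of the smoothed spectrum towards the new spectrum.

    The step size is large when the near end signal is judged to contain
    only echo, and small in all other cases (assumed rate values).
    """
    rate = fast_rate if echo_only else slow_rate
    return [s + rate * (x - s) for s, x in zip(state, new_spectrum)]


state = [1.0, 1.0]
tracked = smooth_update(state, [3.0, 5.0], echo_only=True)    # follows the input quickly
held = smooth_update(state, [3.0, 5.0], echo_only=False)      # barely changes
```

With a small rate outside the echo-only case, frames containing near end speech contribute little to the smoothed spectra, which is why the filter outputs reflect mainly the echo-only condition.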
The voice activity detector 5 is arranged to compare voice activity on the far end and near end signals by comparing calculated Signal to Noise Ratios (SNR) of the modified far end power spectrum and of the near end power spectrum. When the SNR value of the modified far end signal is high and the SNR value of the near end signal is not higher than the SNR value of the modified far end signal it may be determined by the voice activity detector that only echo is present on the near end signal.
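The comparison performed by the voice activity detector might look like the following sketch. The 10 dB threshold used to define a "high" SNR and the names `snr_db` and `near_end_is_echo_only` are assumptions; the source does not give numeric values.

```python
import math


def snr_db(signal_power, noise_power, floor=1e-12):
    """Signal to noise ratio in decibels, guarded against division by zero."""
    return 10.0 * math.log10(max(signal_power, floor) / max(noise_power, floor))


def near_end_is_echo_only(modified_far_snr, near_snr, high_snr_threshold=10.0):
    """Echo only: the modified far end SNR is high and the near end SNR does not exceed it."""
    return modified_far_snr >= high_snr_threshold and near_snr <= modified_far_snr


far_snr = snr_db(100.0, 1.0)                       # 20 dB
echo_only = near_end_is_echo_only(far_snr, 15.0)   # near end SNR below far end SNR
other = near_end_is_echo_only(far_snr, 25.0)       # near end SNR higher than far end SNR
```

A decision of this kind rests entirely on relative SNR levels, so near end speech that is quieter than the echo satisfies the same condition as the echo-only case.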
The inventors of the present invention have identified that current acoustic echo cancellation methods, such as that described with reference to FIG. 2, do not reliably differentiate between the case where only echo is present on the near end signal and the case where the near end signal comprises both echo and a signal from the near end speaker.
For an AEC to effectively remove echo signals without causing distortions to the near end voice signal transmitted from the terminal, it is important that the smoothing filters are only updated when the near end signal contains only echo. A common problem with AECs is the updating of filters when both the near end speaker 23 and the far end speaker are active, hereinafter referred to as double talk. Updating the smoothing filters during double talk leads to the deterioration of the filter outputs, resulting in poor echo cancellation. Significant efforts have been made in the field to develop reliable double-talk detectors to solve this problem, with limited success. The difficulty is that in both the case where only echo is present on the near end signal and the case where double talk occurs, the far end and near end signals both contain active speech.
It is an aim of the present invention to provide an echo canceller with an improved method for detecting when the near end signal contains only echo, and thereby to provide more effective echo cancellation than currently known echo cancellation methods.