The present invention relates to the control of echo cancellation filters.
Many communication systems and devices suffer from echo, that is to say situations in which an acoustic signal is emitted, simultaneously acquired in whole or in part, and then played back to its original emitter in the form of an echo.
This type of situation occurs during communications using equipment that comprises a loudspeaker for emitting an acoustic signal and a microphone situated within range of the loudspeaker, such as compact or “hands-free” type equipment. Because the loudspeaker and the microphone are close together, the microphone is liable to acquire the signal emitted by the loudspeaker. Thus, a distant talker hears his own voice, delayed by the lag introduced by the communication chain.
In order to alleviate this problem, terminals use echo cancellation filters. In general, an echo is estimated on the basis of the emitted signal and is subtracted from the microphone signal.
In practice, this is often carried out by adaptive filters applied to the microphone signal. In general, adaptive filtering consists in specifying a rule for the evolution of the filter coefficients over time, this rule having to meet a convergence criterion. Several algorithms are used in echo cancellation, for example the so-called LMS (Least Mean Squares) or NLMS (Normalized LMS) algorithms, or other algorithms that are well known to a person skilled in the art and described in particular in the document by Simon S. Haykin, “Adaptive Filter Theory”, Prentice Hall (September 2001).
In order to filter the echo properly without introducing distortion into the signal played back, the echo cancellation filters must be controlled differently depending on whether or not an echo is present. More precisely, the parameters of the filters should be modified only during echo-only periods. They should not be modified when there is no echo, nor in so-called double-talk situations, that is to say cases where the microphone signal comprises both an echo component and a useful signal component.
Discriminating between these situations is a complex problem. Periods without echo are relatively simple to detect, since no signal is present on the loudspeaker, but it is very difficult to distinguish echo-only situations from double-talk situations. Yet the performance of adaptive echo cancellation algorithms depends very strongly on the ability to distinguish these phases.
Existing solutions are based on comparing properties of the emitted signal with the same properties evaluated on the microphone signal.
An example of a conventional system is shown in FIG. 1, in which a terminal 2 is represented schematically. Acoustic signals are conveyed to this terminal in a conventional manner, for example by radio waves or with the aid of any appropriate communication network.
The terminal receives a signal x(n) from the network such as a speech signal. This signal x(n) is broadcast on a loudspeaker 6. The signal emitted by the loudspeaker 6 is transformed by the acoustic channel H corresponding to the environment of the terminal 2.
In the terminal 2, a microphone 8 records the local signal y(n), composed of a useful signal pu(n), corresponding for example to the speech signal emitted by a local talker, to which is added a part of the sound emitted by the loudspeaker: the acoustic echo. This echo is the result of the convolution of the signal broadcast by the loudspeaker 6 with the acoustic channel H and depends on the dimensions of the terminal, the materials used, the environment and other parameters.
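The microphone signal model described above can be sketched in a few lines of Python. This is an illustration only: the function name microphone_signal, the short finite impulse response h standing in for the acoustic channel H, and the sample values are assumptions and do not come from the source.

```python
def microphone_signal(x, h, pu):
    """Return y(n) = pu(n) + sum_i h[i] * x(n - i) for n = 0 .. len(x) - 1.

    x  : samples sent to the loudspeaker
    h  : hypothetical finite impulse response standing in for the channel H
    pu : useful (local talker) signal, same length as x
    """
    y = []
    for n in range(len(x)):
        # Acoustic echo: convolution of the loudspeaker signal with the channel.
        echo = sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
        y.append(pu[n] + echo)
    return y


# Illustrative values: a unit impulse on the loudspeaker and one sample of
# local speech at n = 2.
x = [1.0, 0.0, 0.0, 0.0]
h = [0.5, 0.25]
pu = [0.0, 0.0, 1.0, 0.0]
y = microphone_signal(x, h, pu)
```

With these values the first two samples of y contain only echo (the channel response itself) and the third contains only local speech.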
The signal y(n) acquired by the microphone 8 is then returned to an adaptive echo cancellation filter 10. This filter 10 is used to generate an estimated echo ẑ(n) which is subtracted from the microphone signal in a mixer 12.
In the example described, the terminal 2 comprises a conventional feedback loop from the mixer 12 so that the coefficients of the filter 10 are modified in such a way as to decrease the difference between the estimated echo and the microphone signal.
The adaptive filter 10 is denoted ĤL and is a filter of length L, whose coefficients {ĥi(n)}, i=0, . . . , L-1, are adapted over time and indexed by the temporal index n. This filter generates the pseudo-echo ẑ(n). The residual echo e(n) results from subtracting ẑ(n) from the microphone signal y(n) in the mixer 12. We then have the following expressions:
$$\hat{z}(n) = \sum_{i=0}^{L-1} \hat{h}_i(n)\, x(n-i)$$
$$e(n) = y(n) - \hat{z}(n)$$
In the example, a so-called LMS algorithm is used, the criterion being the minimization of the power of the residual echo, according to the following equation:
$$\hat{H}_L(n) = \hat{H}_L(n-1) + \mu \cdot e(n) \cdot X(n)$$
In this equation ĤL(n)=[ĥ0(n), ĥ1(n), . . . , ĥL-1(n)]T is the vector of the L coefficients of the adaptive filter at the instant n, and X(n)=[x(n), x(n−1), . . . , x(n−L+1)]T is the vector of the last L samples of the signal emitted to the loudspeaker 6. The term μ is a factor called the “adaptation step size” which controls the speed of convergence.
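A minimal sketch of this LMS update in Python may help to fix ideas. It is not the implementation of the terminal 2: the function name lms_echo_canceller, the optional adapt flags used to freeze the adaptation, the three-tap channel h_true and the step size 0.05 are all assumptions chosen for illustration. The sketch also assumes that the residual e(n) is computed with the coefficients available before the update at instant n.

```python
import random


def lms_echo_canceller(x, y, L, mu, adapt=None):
    """Run an LMS adaptive filter of length L; return (residuals, final coefficients).

    adapt is an optional list of booleans (hypothetical control input):
    when adapt[n] is False the coefficients are frozen at instant n.
    """
    h_hat = [0.0] * L
    residuals = []
    for n in range(len(x)):
        # X(n) = [x(n), x(n-1), ..., x(n-L+1)], zero before the first sample.
        X = [x[n - i] if n - i >= 0 else 0.0 for i in range(L)]
        z_hat = sum(hi * xi for hi, xi in zip(h_hat, X))  # pseudo-echo
        err = y[n] - z_hat                                # residual echo e(n)
        residuals.append(err)
        if adapt is None or adapt[n]:
            # H(n) = H(n-1) + mu * e(n) * X(n)
            h_hat = [hi + mu * err * xi for hi, xi in zip(h_hat, X)]
    return residuals, h_hat


# Echo-only test case with an assumed three-tap channel and no local speech.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
h_true = [0.5, -0.3, 0.1]
y = [sum(h_true[i] * x[n - i] for i in range(3) if n - i >= 0)
     for n in range(len(x))]
e, h_est = lms_echo_canceller(x, y, L=3, mu=0.05)
```

In this noise-free, echo-only case the estimated coefficients h_est approach h_true. Passing adapt flags that are all False leaves the coefficients at their initial values, which corresponds to freezing the filter.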
The role of μ is important in controlling the stability of the filter. In the echo-only situations, the filter may be adapted in such a way as to converge speedily. In the absence of an echo, the adaptation of the coefficients is not desirable since this may lead to mismatch of the adaptive filter, and finally to perceptible rises in echo. Likewise, as soon as the local talker is active, whether it be in a speech-only or double-talk situation, it is appropriate to freeze the adaptation of the echo cancellation filter 10.
Otherwise, the filter 10 seeks to suppress the useful speech and becomes maladapted. In addition to the risk of filter divergence, this leads to strong degradation of the useful signal and to the reappearance of echo, or even to its amplification.
The terminal 2 also comprises a module 14 for controlling the filter 10, also called the double-talk detection module or DTD. This module 14 analyses the signals x(n) and y(n) so as to extract a decision which makes it possible to freeze the adaptation of the filter 10, in particular in a period of double-talk.
The system described with reference to FIG. 1 uses a direct comparison of the signals emitted and received. This does not however allow optimal control on account of the modifications induced by the acoustic channel H.
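One well-known detector based on a direct comparison of the two signals is the Geigel detector. It is not named in the source and is shown here only as an illustration of the principle: double-talk is declared when the magnitude of the microphone sample exceeds a fixed fraction of the recent peak of the loudspeaker signal. The function name geigel_dtd, the window length and the threshold value 0.5 are assumptions.

```python
def geigel_dtd(x, y, window, threshold=0.5):
    """Return one boolean per sample, True where double-talk is declared.

    Double-talk is declared at instant n when
    |y(n)| > threshold * max(|x(n)|, ..., |x(n - window + 1)|).
    """
    flags = []
    for n in range(len(y)):
        start = max(0, n - window + 1)
        recent_peak = max(abs(v) for v in x[start:n + 1])
        flags.append(abs(y[n]) > threshold * recent_peak)
    return flags


# Illustrative values: the microphone level jumps at n = 2.
x = [1.0, -1.0, 1.0, -1.0]
y = [0.3, -0.3, 0.9, -0.3]
flags = geigel_dtd(x, y, window=2)
```

The fixed threshold implicitly assumes a known attenuation between the loudspeaker and the microphone, which is one way to see why a direct comparison is sensitive to the modifications induced by the acoustic channel H.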
In order to improve the detection of double-talk situations, certain methods of controlling adaptive echo cancellation filters analyse the properties of the channel. Such is the case in particular for the document P. Ahgren, “On system identification and acoustic echo cancellation”, Thesis UPPSALA Universitet (April 2004) which uses two filters ĤL1 and ĤL2. A diagram of such a system is represented in FIG. 2.
In this figure, the elements similar to those described with reference to FIG. 1 bear the same reference numerals. Depicted are the terminal 2 with the adaptive filter 10 and the mixer 12 as well as the loudspeaker 6 and the microphone 8, separated by the acoustic channel H.
In this embodiment, the double-talk detection module 14 is also depicted. The terminal 2 comprises however a second adaptive filter 16. The filter 10 is situated upstream of the double-talk detection module 14 whereas the filter 16 is situated downstream of the module 14, with respect to the direction of processing of the microphone signal.
The filter 10 is continuously adapted by virtue of the use of a negative feedback loop implemented in a conventional manner to reduce the residual calculated by the mixer 12 between the pseudo-echo and the microphone signal.
The filter 16 is also adapted according to a feedback loop, this adaptation being driven by the decision of the double-talk detection module 14. If the module 14 detects the presence of local speech, it may be decided, for example, to freeze the filter 16, or to apply any other soft decision that slows down the adaptation according to the probability that local speech is present. It is the filter 16 which serves to estimate the echo ẑ2(n), which is then subtracted from the microphone signal by a mixer 18.
In an echo-only period, when the acoustic channel H does not vary abruptly, the evolution of the coefficients of the filter 10 slows down in tandem with the convergence of the coefficients. As soon as double-talk is present, the coefficients of the filter 10, which is continuously adapted, are greatly modified by the presence of useful speech.
When these coefficients vary quickly and strongly, the probability of being in a double-talk situation is considerable.
For reasons of simplicity of implementation, the variance is calculated only for the largest value of the coefficients of the adaptive filter ĤL1:
$$\gamma = \max_{h_i^1}\left(h_i^1, \ldots, h_L^1\right)$$
where the superscript 1 in h_i^1 signifies that these are the coefficients of the continuously adapted filter 10. This document proposes to compare the variance of γ with a fixed threshold. Thus, in the presence of echo, a strong variance signals the presence of a useful speech signal, and consequently, a potential double-talk period. Therefore, the coefficients h_i^1 of the filter 10 are no longer copied over to the second filter 16 whose evolution is frozen.
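The variance test described above can be sketched as follows. The source does not say how the variance of γ is estimated, so the sliding window used here is an assumption, as are the function name double_talk_flags, the window length, the threshold and the coefficient values.

```python
from statistics import pvariance


def double_talk_flags(coeff_history, window, threshold):
    """Flag potential double-talk from the history of the filter coefficients.

    coeff_history : one list of coefficients of the continuously adapted
                    filter per time instant
    window        : number of recent gamma values used for the variance
                    (assumed; not specified in the source)
    threshold     : fixed variance threshold
    """
    # gamma(n): largest coefficient value of the filter at instant n.
    gammas = [max(h) for h in coeff_history]
    flags = []
    for n in range(len(gammas)):
        recent = gammas[max(0, n - window + 1):n + 1]
        flags.append(pvariance(recent) > threshold)
    return gammas, flags


# Illustrative history: stable coefficients, then an abrupt change at n = 3.
coeff_history = [[0.5, 0.1], [0.5, 0.1], [0.5, 0.1], [0.9, 0.1], [0.2, 0.1]]
gammas, flags = double_talk_flags(coeff_history, window=3, threshold=0.01)
```

While a flag is raised, a control loop built on this sketch would stop copying the coefficients of the continuously adapted filter into the second filter. The same flag would also be raised by an abrupt change of the acoustic channel, since the test only observes the coefficients.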
Such a system does not however make it possible to differentiate between a variation of the acoustic channel and the appearance of a double-talk situation. These phenomena have the same impact on the evolution of the coefficients of the adaptive filter 16 that is used to calculate the pseudo-echo which is subtracted from the microphone signal.
Thus, the existing methods and systems are not entirely satisfactory as regards the control of echo cancellation filters, in particular owing to the imperfect detection of double-talk situations.
An object of the invention is to improve this situation by virtue of a method and a device for controlling echo cancellation filters.