The present invention relates to acoustic echo cancellers which process stereophonic input signals in the frequency domain.
Acoustic echo occurs whenever there is strong coupling between a microphone and a loudspeaker. The microphone then picks up a delayed and attenuated version of the input signal broadcast in the acoustic space by the loudspeaker. The echo is said to be stereophonic, or more generally multi-channel, when the microphone simultaneously picks up echoes from several loudspeakers.
Acoustic echo cancellers generally model the acoustic path between each loudspeaker and the microphone by means of an adaptive filter whose coefficients are updated by stochastic gradient algorithms: NLMS (“Normalised Least Mean Squares”), APA (“Affine Projection Algorithm”), FDAF (“Frequency Domain Adaptive Filter”), etc., or exact least squares algorithms: RLS (“Recursive Least Squares”).
It is commonly acknowledged that the performance of adaptive filtering algorithms deteriorates if multi-channel acoustic echo cancellation systems are implemented in the presence of highly correlated input signals (see F. Amand et al., “Multi-channel acoustic echo cancellation”, Proc. 4th International Workshop on Acoustic Echo and Noise Control, Roros, June 1995, pages 57-60; M. Mohan Sondhi et al., “Stereophonic Acoustic Echo Cancellation - An Overview of the Fundamental Problem”, IEEE Signal Processing Letters, Vol. 2, No. 8, August 1995, pages 148-151). Various solutions have been proposed in an attempt to overcome this problem:
using monophonic filters (A. Hirano et al., “A Compact Multi-Channel Echo Canceller with a single Adaptive Filter per Channel”, Proc. ICASSP 1992, pages 1922-1925; S. Minami, “A Stereophonic Echo Canceller Using Single Adaptative Filter”, Proc. ICASSP 1995, pages 3027-3030);
modifying time gradient algorithms (F. Amand et al., “Un algorithme d'annulation d'écho stéréo de type LMS prenant en compte l'inter-corrélation des entrées”, Fifteenth GRETSI conference, Juan-les-Pins, September 1995, pages 407-410; J. Benesty et al., “Un algorithme de projection à deux voies avec contraintes - Application à l'annulation d'écho acoustique stéréophonique”, Fifteenth GRETSI conference, Juan-les-Pins, September 1995, pages 387-390);
de-correlating signals before broadcasting them (J. Benesty et al., “A Hybrid Mono/Stereo Acoustic Echo Canceller”, IEEE Workshop on application of signal processing and acoustics (WASPAA'97)).
The echo cancellers according to the invention find applications in multi-channel communication systems, in particular in video-conferencing systems (see P. Heitkamper et al., “Stereophonic and multichannel Hands-Free Speaking”, Proc. 4th International Workshop on Acoustic Echo and Noise Control, Roros, June 1995, pages 53-56; Y. Mahieux et al., “Annulation d'écho en téléconférence stéréophonique”, 14th GRETSI, Juan-les-Pins, September 1993, pages 515-518), in hands-free telephones and in speech recognition systems (see M. Glanz et al., “Speech Recognition In Cars With Noise Suppression and Car Radio Compensation”, 22nd ISATA, Florence, May 1990, pages 509-516; F. Berthault et al., “Stereophonic Acoustic Echo Cancellation - Application to speech recognition: Some experimental results”, 5th International Workshop on Acoustic Echo and Noise Control, London, September 1997, pages 96-99).
Frequency domain stereophonic echo cancellers implement a method wherein first and second input signals (x1, x2) are applied to an echo generator system and an observation signal (z) is picked up at an output of said system, the input signals being digitally sampled and processed in successive blocks of 2N samples with frequency domain transformation according to a set of 2N frequencies. In accordance with this method, the processing of a block of 2N samples comprises the steps of:
transforming the first input signal from the time domain to the frequency domain to obtain a vector X1 having 2N complex components relating to the set of 2N frequencies, including spectral components of the first input signal relating to a sub-set of the set of 2N frequencies;
transforming the second input signal from the time domain to the frequency domain to obtain a vector X2 having 2N complex components relating to the set of 2N frequencies, including spectral components of the second input signal relating to said sub-set of frequencies;
multiplying term by term the vector X1 by a vector H1 of 2N complex coefficients to produce first estimated spectral echo components relating to the frequencies of the sub-set;
multiplying term by term the vector X2 by a vector H2 of 2N complex coefficients to produce second estimated spectral echo components relating to the frequencies of the sub-set;
adding the first and second estimated spectral echo components relating to each frequency of the sub-set to obtain a spectral component belonging to a vector of 2N estimated spectral total echo components;
transforming the vector of 2N estimated spectral total echo components from the frequency domain to the time domain to obtain an estimated total echo;
subtracting the estimated total echo from the observation signal to produce an error signal;
transforming the error signal from the time domain to the frequency domain to obtain a vector E of 2N spectral components of the error signal relating to the set of 2N frequencies; and
updating the vectors H1 and H2 for the processing of the next block, on the basis of the vectors X1, X2 and E.
In the known systems, said sub-set of frequencies represents the entire set of 2N frequencies.
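The filtering and error steps listed above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: it assumes an overlap-save layout (each block of 2N input samples holds the N previous and the N current samples, and only the last N output samples are kept), uses a naive DFT as a stand-in for an FFT, and the names `dft` and `process_block` are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, used here as a stand-in for an FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def process_block(x1_blk, x2_blk, z_new, H1, H2):
    """Filter one block and return (X1, X2, E, e).

    x1_blk, x2_blk: 2N input samples (assumed: N previous + N current).
    z_new:          the N current samples of the observation signal.
    H1, H2:         vectors of 2N complex filter coefficients.
    """
    n = len(z_new)
    X1 = dft(x1_blk)                      # input spectra over the 2N frequencies
    X2 = dft(x2_blk)
    # term-by-term products, summed per frequency: estimated total echo spectrum
    Y = [h1 * a + h2 * b for h1, a, h2, b in zip(H1, X1, H2, X2)]
    y = dft(Y, inverse=True)              # back to the time domain
    # assumption: only the last N samples are free of circular wrap-around
    e = [z_new[t] - y[n + t].real for t in range(n)]
    E = dft([0.0] * n + e)                # error spectrum, zero-padded to 2N samples
    return X1, X2, E, e
```

When H1 and H2 are the transforms of the true zero-padded echo paths, the error returned by this sketch vanishes up to rounding, which is a convenient sanity check before adding the adaptation step.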
Usually (stereophonic FDAF algorithm), the updating of the vectors H1 and H2 for the processing of the next block takes account of the energy gradient of the error signal, estimated by $\nabla_i = X_i^* \otimes E$ for the vector Hi (i=1 or 2), where $\otimes$ denotes the term-by-term product of two vectors and (*) denotes complex conjugation.
The gradient is generally normalised: $\nabla^N_i = B_i \otimes \nabla_i$ for i=1 or 2, where Bi is a vector of size 2N whose term corresponding to a frequency f is the inverse of the spectral energy Pii(f) of the i-th input signal evaluated at the frequency f (in other words, $P_{ii}(f) = \langle X_i(f) \cdot X_i(f)^* \rangle$ is a running average of $|X_i(f)|^2 = X_i(f) \cdot X_i(f)^*$, where Xi(f) is the component of the vector Xi relating to the frequency f).
In addition, a constraint is often placed on the normalised gradient in order to retain only the linear convolution terms in the frequency calculation of the gradients: $\nabla^C_i = C \cdot \nabla^N_i$ for i=1 or 2, where C denotes a constant constraint matrix.
The echo estimation filters are finally adapted by $H_i(k+1) = H_i(k) + \mu \cdot \nabla^C_i(k)$ for i=1 or 2, the index k numbering the successive analysis blocks. The coefficient μ, lying between 0 and 1, is the adaptation step.
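The conventional per-channel update described above (gradient, normalisation, constraint, adaptation) can be sketched as below. Several details are assumptions not stated in the text: the running average uses an exponential forgetting factor `lam`, the constraint matrix C is taken to be the usual "keep the first N time-domain taps" projection, and a small constant guards the division. The names `constrain` and `fdaf_update` are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, used here as a stand-in for an FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def constrain(grad):
    """Assumed form of the constraint C: zero the last N time-domain taps."""
    n = len(grad) // 2
    g = dft(grad, inverse=True)
    return dft(g[:n] + [0j] * n)

def fdaf_update(H, X, E, P, mu=0.5, lam=0.9):
    """One update of a single channel; returns the new (H, P).

    P is the running spectral energy Pii(f); lam is an assumed forgetting factor.
    """
    P = [lam * p + (1 - lam) * abs(x) ** 2 for p, x in zip(P, X)]
    grad = [x.conjugate() * e for x, e in zip(X, E)]      # gradient X* (x) E
    grad_n = [g / (p + 1e-12) for g, p in zip(grad, P)]   # normalisation by 1/Pii(f)
    grad_c = constrain(grad_n)                            # constrained gradient
    H = [h + mu * g for h, g in zip(H, grad_c)]           # H(k+1) = H(k) + mu * gradient
    return H, P
```

Each channel is updated by calling `fdaf_update` with its own X and P but the same E, which is exactly the separate per-channel adaptation the text goes on to discuss.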
It is noted that each processing channel is subjected to a separate adaptation determined by the error signal and the input signal relating to this channel. This explains the identification errors which might be made by the algorithm in the presence of correlated input signals: two estimation errors for the vectors H1 and H2 can compensate for one another in the error signal while the algorithm is unable to correct them.
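The compensation effect can be illustrated at a single frequency. The sketch below assumes the extreme case of fully correlated inputs, X2 = c·X1 for some complex constant c; the values and the function names `total_echo` and `compensating_pair` are illustrative only.

```python
def total_echo(H1, H2, X1, X2):
    """Estimated total echo component at one frequency."""
    return H1 * X1 + H2 * X2

def compensating_pair(H1, H2, c, d):
    """Return a different filter pair that yields the same echo whenever X2 = c * X1.

    An error d on H1 is exactly cancelled by an error -d / c on H2.
    """
    return H1 + d, H2 - d / c

X1 = 0.8 - 0.3j
c = 0.5 + 0.2j
X2 = c * X1                      # assumed: fully correlated inputs at this frequency
H1_true, H2_true = 1.0 + 0.5j, -0.4 + 0.1j
H1_bad, H2_bad = compensating_pair(H1_true, H2_true, c, d=0.7 - 0.2j)
residual = total_echo(H1_true, H2_true, X1, X2) - total_echo(H1_bad, H2_bad, X1, X2)
```

Here `residual` is zero up to rounding even though both filters are wrong, so the error signal carries no information that would let the update correct them.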
An object of the present invention is to propose another method of adapting stereophonic frequency filters which allows a certain degree of correlation between the signals to be taken into account.
Accordingly, the invention proposes a method as outlined above, further comprising computing a spectral energy P11(f) of the first input signal, a spectral energy P22(f) of the second input signal, an inter-spectral energy P12(f) of the first and second input signals and a coherence value Γ(f) for each of the frequencies f in the set of 2N frequencies. According to that method, the updating of the vector H1 for the processing of the next block is performed on the basis of a modified gradient in the form:
$\nabla^M_1 = A_{11} \otimes \nabla_1 + A_{12} \otimes \nabla_2 = (A_{11} \otimes X_1^* + A_{12} \otimes X_2^*) \otimes E$   (1)
where A11 denotes a vector of size 2N whose component relating to a frequency f is $\dfrac{1}{P_{11}(f) \cdot (1 - G(f))}$
and A12 denotes a vector of size 2N whose component relating to a frequency f is $\dfrac{-G(f)}{P_{12}(f) \cdot (1 - G(f))}$,
G(f) being an increasing real function of the coherence value Γ(f) such that 0 ≤ G(f) < 1. On the other hand, the updating of the vector H2 for the processing of the next block is performed on the basis of a modified gradient in the form:
$\nabla^M_2 = A_{22} \otimes \nabla_2 + A_{12}^* \otimes \nabla_1 = (A_{22} \otimes X_2^* + A_{12}^* \otimes X_1^*) \otimes E$   (2)
where A22 denotes a vector of size 2N whose component relating to a frequency f is $\dfrac{1}{P_{22}(f) \cdot (1 - G(f))}$.
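Equations (1) and (2) can be evaluated frequency by frequency as in the following sketch. The function name `modified_gradients` is hypothetical, and the small constant `eps` and the rule of dropping the crossed term when G(f) is zero or P12(f) is negligible are added numerical guards, not part of the method as stated.

```python
def modified_gradients(X1, X2, E, P11, P22, P12, G, eps=1e-12):
    """Return the modified gradients of equations (1) and (2) as two lists.

    All arguments are sequences with one entry per frequency; G holds the
    values G(f), assumed to satisfy 0 <= G(f) < 1.
    """
    grad1, grad2 = [], []
    for x1, x2, e, p11, p22, p12, g in zip(X1, X2, E, P11, P22, P12, G):
        a11 = 1.0 / (p11 * (1.0 - g) + eps)
        a22 = 1.0 / (p22 * (1.0 - g) + eps)
        # guard (assumption): no crossed term when G(f) = 0 or P12(f) is negligible
        a12 = -g / (p12 * (1.0 - g)) if g > 0 and abs(p12) > eps else 0j
        x1c, x2c = x1.conjugate(), x2.conjugate()
        grad1.append((a11 * x1c + a12 * x2c) * e)               # equation (1)
        grad2.append((a22 * x2c + a12.conjugate() * x1c) * e)   # equation (2)
    return grad1, grad2
```

With G(f) = 0 the crossed terms vanish and each gradient reduces to the normalised gradient of the conventional stereophonic FDAF algorithm, which gives a simple check of the sketch.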
Another aspect of the present invention relates to a stereophonic echo canceller arranged to implement a method as defined above.
Normalisation of the gradient takes account of crossed terms between the two channels, with a weighting determined by the coherence between the two input signals. This approach to adapting the filter coefficients represented by the components of H1 and H2 may be derived from a xe2x80x9cblock Newtonxe2x80x9d type of algorithm, expressed in the frequency domain, with limited hypotheses on the form of the signal correlation matrices.
In a preferred embodiment of the method, the spectral energies P11(f) and P22(f) are respectively averages of |X1(f)|² and |X2(f)|² and the inter-spectral energy P12(f) is an average of ρ·X1(f)·X2(f)*, where X1(f) and X2(f) respectively represent the components of the vectors X1 and X2 relating to the frequency f and ρ is a real coefficient such that 0 < ρ ≤ 1. This coefficient ρ allows crossed terms to be accounted for to a greater or lesser degree when adapting the filters. If ρ approaches 0, one tends to the conventional stereophonic FDAF algorithm. If ρ=1, any correlation of the input signals is fully taken into account.
A priori, the function G(f) may be equal to the coherence value Γ(f). However, it is preferable to choose a non-linear function of this coherence value.
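The averages can be sketched as exponentially weighted running averages. Two points are assumptions rather than statements of the text: the forgetting factor `lam`, and the definition of the coherence as the magnitude-squared ratio |P12(f)|² / (P11(f)·P22(f)), since this passage does not spell out how the coherence value is computed. The function names are hypothetical.

```python
def update_spectra(P11, P22, P12, X1, X2, rho=1.0, lam=0.9):
    """Update the running averages with the spectra of the current block."""
    P11 = [lam * p + (1 - lam) * abs(a) ** 2 for p, a in zip(P11, X1)]
    P22 = [lam * p + (1 - lam) * abs(b) ** 2 for p, b in zip(P22, X2)]
    # inter-spectral energy: average of rho * X1(f) * X2(f)*
    P12 = [lam * p + (1 - lam) * rho * a * b.conjugate()
           for p, a, b in zip(P12, X1, X2)]
    return P11, P22, P12

def coherence(P11, P22, P12, eps=1e-12):
    """Assumed definition: magnitude-squared coherence, between 0 and 1."""
    return [abs(p12) ** 2 / (p11 * p22 + eps)
            for p11, p22, p12 in zip(P11, P22, P12)]
```

Under this assumed definition, fully correlated inputs give a coherence close to 1 when the coefficient is 1, and a smaller coefficient scales the coherence down, which weakens the crossed terms.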
Said sub-set of frequencies for which echo cancellation is applied by the above method may represent the full set of 2N frequencies.
In an advantageous variant of the method, this sub-set of frequencies consists of the frequencies for which the first and second input signals satisfy a correlation criterion.
Accordingly, a hybrid mono/stereophonic echo canceller can be provided, which will decide dynamically whether each frequency must be put through monophonic processing (relatively correlated signals) or stereophonic processing (not so highly correlated signals).
To this end, the components of the vectors X1 and X2 relating to the frequencies in the set of 2N frequencies which do not belong to said sub-set are zero, and vectors X′1 and X′2 of 2N complex components are formed, the components of said vectors X′1 and X′2 being zero for the frequencies of said sub-set and respectively equal to the spectral components of the first and second input signals for the frequencies of the set of 2N frequencies which do not belong to said sub-set. One of the vectors X′1 and X′2 is selected and multiplied term by term by a vector H′ to obtain estimated spectral monophonic echo components for the frequencies in the set of 2N frequencies not belonging to said sub-set, and the vector of 2N estimated spectral total echo components is completed with said estimated spectral monophonic echo components. The vector H′ is typically updated on the basis of a gradient term proportional to the term-by-term product of the vector E and the selected vector X′1 or X′2.
The selected vector X′1 or X′2 is preferably the one which is ahead of the other in time, so as to satisfy the causality condition.
The correlation criterion used may be evaluated by means of the coherence value computed for each of the frequencies in the set of 2N frequencies.
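The frequency split of the hybrid scheme can be sketched as follows. The threshold value and the function names are hypothetical, and the choice of which monophonic vector to use (here passed in by the caller) is left open, since the text only states a preference for the signal that leads in time.

```python
def split_by_coherence(X1, X2, coh, threshold=0.9):
    """Split the spectra into a stereophonic part and a monophonic part.

    Frequencies whose coherence is below the (hypothetical) threshold go to
    stereophonic processing; the others go to monophonic processing.
    Returns (X1_stereo, X2_stereo, X1_mono, X2_mono), each of full length.
    """
    X1s, X2s, X1m, X2m = [], [], [], []
    for a, b, c in zip(X1, X2, coh):
        stereo = c < threshold
        X1s.append(a if stereo else 0j)
        X2s.append(b if stereo else 0j)
        X1m.append(0j if stereo else a)
        X2m.append(0j if stereo else b)
    return X1s, X2s, X1m, X2m

def hybrid_echo(H1, H2, Hm, X1s, X2s, Xm):
    """Total echo spectrum: stereophonic terms completed with the monophonic term."""
    return [h1 * a + h2 * b + hm * m
            for h1, a, h2, b, hm, m in zip(H1, X1s, H2, X2s, Hm, Xm)]
```

Because the stereophonic and monophonic vectors are zero on complementary frequencies, their sum restores the original spectrum, and each frequency contributes to the total echo through exactly one of the two schemes.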
It should be noted that the latter embodiment may generally be applied to any stereophonic echo cancellation scheme in the frequency domain but also in the time domain (in full band or with a sub-band decomposition). After a frequency analysis of the correlations between the input signals, an echo cancellation of the stereophonic type can be applied to the frequencies for which the analysis reveals a weak correlation and an echo cancellation of the monophonic type to the frequencies for which the analysis reveals a strong correlation. Such a hybrid echo canceller offers greater flexibility than hybrid echo cancellers which use a fixed frequency limit with a stereophonic scheme at the low frequencies and a monophonic scheme at the high frequencies (see J. Benesty et al., “A Hybrid Mono/Stereo Acoustic Echo Canceller”, IEEE Workshop on application of signal processing and acoustics (WASPAA'97)).
In the case of a full band correlation analysis, the proposed hybrid technique allows an appropriate echo cancellation scheme (mono or stereo) to be adopted on a dynamic basis. This will therefore get round the problems encountered with stereophonic systems if the input signals are correlated (a monophonic scheme will then be applied).
According to another aspect, the invention proposes a stereophonic echo canceller wherein first and second input signals are applied to an echo generator system and an observation signal is obtained at an output of said system, comprising means for processing said signals in digitally sampled form including:
means for stereophonically filtering two signals respectively obtained from the first and second input signals;
means for monophonically filtering a signal obtained from the first and second input signals;
means for obtaining an estimated echo of the input signals from the outputs of the stereophonic and monophonic filtering means;
means for subtracting the estimated echo from the observation signal and producing an error signal;
adaptation means for updating the stereophonic and monophonic filtering means on the basis of the input signals and the error signal; and
means for analysing correlations between the first and second input signals in order to identify first portions of the two input signals in which they are relatively de-correlated and second portions of the two input signals in which they are more correlated than in the first portions in order to apply two signals constructed respectively from the first portions of the two input signals to the stereophonic filtering means and in order to apply a signal constructed from the second portions of one of the two input signals to the monophonic filtering means.
Said first and second portions of the input signals are understood to be time portions and/or frequency portions. The stereophonic and monophonic filtering means preferably operate in the frequency domain but may also operate in the time domain.