The present invention concerns a method for cancelling multi-channel acoustic echo, as well as a multi-channel acoustic echo canceller.
In the field of sound signal transmission, in applications such as "hands-free" telephony and teleconferencing, acoustic echo is a source of considerable inconvenience. Known devices for counteracting acoustic echo usually comprise adaptive filters, whose function is to identify and model the impulse response of the acoustic coupling path between the loudspeaker(s) and the microphone(s) of the sound signal transmission system considered.
FIG. 1 illustrates the general structure of a conventional acoustic echo canceller. It is associated with a loudspeaker 10 and a microphone 12 between which there exists an acoustic coupling path or echo path 14. References 16 and 18 designate respectively the received sound signal and the transmitted sound signal. The echo canceller of FIG. 1 includes an adaptive filter 20 receiving at its input the received sound signal 16. The object of the adaptive filter 20 is to estimate through its coefficients the impulse response of the echo path 14, in order to subtract the echo from the signal received by the microphone 12. To this end, the output of the adaptive filter 20 is connected to a subtractor 22 which subtracts the signal output by the adaptive filter 20 from the signal received by the microphone 12. The difference signal obtained at the output from the subtractor 22 supplies an estimation error 24. The coefficients of the adaptive filter 20 are adjusted over time by an appropriate algorithm which uses estimation error information.
The choice of this algorithm is the determining factor in the performance of the echo canceller. Known echo cancellation methods and devices currently use the normalised stochastic gradient algorithm, usually designated by the acronym NLMS. A disadvantage of this algorithm is that its convergence speed depends upon the spectral characteristics of the received sound signal. An object of the present invention is to reduce this dependence.
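The adaptation loop of FIG. 1 with an NLMS update can be sketched as follows. This is a minimal illustration under stated assumptions, not the canceller claimed here: the function name `nlms_echo_canceller`, the step size `mu`, the regularisation constant `eps` and the filter length are hypothetical choices, and the echo path is simulated by a known FIR filter so that convergence can be checked.

```python
import numpy as np

def nlms_echo_canceller(x, d, num_taps=64, mu=0.5, eps=1e-6):
    """Single-channel NLMS echo canceller (illustrative sketch).

    x: received signal sent to the loudspeaker
    d: microphone signal containing the echo
    Returns the estimation error e and the final filter coefficients w.
    """
    w = np.zeros(num_taps)        # adaptive filter: estimate of the echo path
    buf = np.zeros(num_taps)      # most recent input samples, newest first
    e = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        y = w @ buf               # adaptive filter output (echo estimate)
        e[n] = d[n] - y           # subtractor output = estimation error
        # normalised stochastic-gradient (NLMS) coefficient update
        w += mu * e[n] * buf / (eps + buf @ buf)
    return e, w
```

With a white-noise input the coefficients approach the simulated echo path; with a coloured input the same loop converges more slowly, which is the dependence on the spectrum of the received signal mentioned above.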
The problem of acoustic echo of course also arises in multi-channel sound signal transmission systems, i.e. in systems with several loudspeakers and several microphones, for example two loudspeakers and two microphones in the case of stereophony.
Known multi-channel echo cancellation methods and devices are based upon the same principle as single-channel ones. FIG. 2 shows, by way of example, a partial block diagram of a conventional stereophonic acoustic echo canceller in which, for clarity, only one of the two microphone channels is shown. This structure is easily generalised to an acoustic echo canceller with N sound signal channels, where N is any integer; only the case N=2 is described below. As with the echo canceller of FIG. 1, the stereophonic echo canceller is applied to received sound signal channels 161, 162 and to transmitted sound signal channels, only one of which, designated by the reference number 181, is shown. The echo canceller is associated with two loudspeakers 101, 102 and two microphones, only one of which, designated by the reference number 121, is shown. There are four echo channels: two (141, 142) between the two loudspeakers 101, 102 and the first microphone 121, and two others (not shown) between the two loudspeakers 101, 102 and the second microphone. In order to estimate the impulse responses of the various echo channels, an adaptive filter is provided between each loudspeaker channel and each microphone channel. Thus, the microphone channel 121 is provided with two adaptive filters 201, 202, which receive as input the received sound signals 161 and 162 respectively. The outputs of the adaptive filters 201, 202 are supplied to an adder 261; in an echo canceller with N channels, such an adder is provided on each microphone channel. The sum of the adaptive filter outputs supplied by the adder 261 is subtracted, by a subtractor 221, from the signal received by the microphone 121. The same operation is carried out on each microphone channel.
The difference signal obtained at the output of the subtractor 221 supplies an estimation error 241 that is common to all the adaptive filters of the microphone channel considered; in the example of FIG. 2, the estimation error 241 is common to the adaptive filters 201 and 202. As in mono-channel echo cancellation, the coefficients of the adaptive filters on each microphone channel are modified iteratively by an appropriate algorithm, using the estimation error obtained.
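One microphone channel of FIG. 2 (N adaptive filters, an adder, a subtractor and a common estimation error) can be sketched as below. The function name `multichannel_nlms` and the choice of normalising every update by the total input energy of all N channels are assumptions for illustration; the text above only requires "an appropriate algorithm".

```python
import numpy as np

def multichannel_nlms(xs, d, num_taps=32, mu=0.5, eps=1e-6):
    """One microphone channel of an N-channel echo canceller (sketch).

    xs: array of shape (N, T), received signals fed to the N loudspeakers
    d:  signal of the microphone considered, shape (T,)
    Returns the common estimation error and the N coefficient vectors.
    """
    n_ch, T = xs.shape
    W = np.zeros((n_ch, num_taps))     # one adaptive filter per loudspeaker
    buf = np.zeros((n_ch, num_taps))   # input delay lines, newest sample first
    e = np.zeros(T)
    for n in range(T):
        buf = np.roll(buf, 1, axis=1)
        buf[:, 0] = xs[:, n]
        y = np.sum(W * buf)            # adder: sum of the N filter outputs
        e[n] = d[n] - y                # subtractor: common estimation error
        # all N filters are corrected with the same error, normalised by
        # the total input energy of all channels (assumed normalisation)
        W += mu * e[n] * buf / (eps + np.sum(buf * buf))
    return e, W
```

Because every filter is driven by the same error, this update behaves like a single NLMS filter acting on the stacked input vector of all channels; when the loudspeaker signals are strongly correlated, that stacked input is poorly conditioned, which is one way to view the slow convergence discussed next.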
It has been observed that, in multi-channel echo cancellation, and in particular in stereophonic echo cancellation, the algorithms adapting the coefficients of the adaptive filters converge more slowly than they do in mono-channel echo cancellation. It has been shown that this slower convergence results from the mutual correlation of the sound signals received by the loudspeakers, designated by the reference numbers 161 and 162 in FIG. 2.
This slower convergence causes several disadvantages. In particular, in a teleconference system, the speakers located in the distant room hear the echo of their own speech for longer during start-up of the system or after an acoustic change (for example, movement of the listeners) in the room where the echo canceller is located. Moreover, in multi-channel echo cancellation, every acoustic change in the distant room disturbs the convergence of the adaptive filters of the echo canceller, because of the previously mentioned mutual correlation between the received speech signals, which causes the echo to reappear or its level to increase.
On the other hand, it has been observed in practice that when mutually uncorrelated components are present on each microphone signal, they tend to accelerate the convergence of multi-channel echo cancellers. An object of the present invention is to exploit this property of mutually uncorrelated components to improve the performance of multi-channel echo cancellers and, consequently, the quality of communication in the sound signal transmission systems that implement them. To this end, the general principle of the present invention consists in adding mutually uncorrelated auxiliary signals to the received sound signal channels, these auxiliary signals being made inaudible by exploiting certain properties of human hearing.
More precisely, the present invention proposes an echo cancellation method for N sound signal channels, each having a loudspeaker and an associated microphone, N being an integer greater than or equal to 1, in which, on each of the N channels:
(a) a synthetic signal is created having the spectral characteristics of a white noise, the spectrum of this signal extending over several adjacent frequency bands, and this synthetic signal being uncorrelated with the synthetic signals created on the other channels;
(b) for each frequency band, a frequency masking threshold corresponding to the signal associated with the loudspeaker of the channel considered is computed using properties of human auditory perception;
(c) in each frequency band, the level of the synthetic signal is brought to the value of the associated frequency masking threshold, so as to obtain an auxiliary signal;
(d) the auxiliary signal is added to the signal associated with the loudspeaker of the channel considered, the frequency masking thresholds having been previously computed so as to make the auxiliary signal inaudible, and the auxiliary signals of the N channels being mutually uncorrelated;
(e) the signal thus obtained is supplied as the input to an adaptive filter whose coefficients form an estimation of the impulse response of the acoustic coupling path between the loudspeaker and the microphone associated with the sound signal channel considered;
(f) the signals obtained at the output of the N adaptive filters associated with each microphone channel are added, and the resulting signal is subtracted from the signal received by the microphone associated with this channel;
(g) an estimation error is calculated from the difference obtained by this subtraction;
(h) the coefficients of the adaptive filters associated with the considered microphone are corrected as a function of the associated estimation error.
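Steps (a) and (c) can be illustrated in the frequency domain: a white-noise spectrum is generated and, band by band, rescaled so that its energy equals the masking threshold Ti. In this sketch the function name `make_auxiliary_spectrum`, the use of complex Gaussian noise as the white noise, the representation of bands as one-sided FFT bin ranges, and the use of independently seeded generators to obtain mutually uncorrelated auxiliary signals are all assumptions.

```python
import numpy as np

def make_auxiliary_spectrum(thresholds, band_bins, n_bins, rng):
    """Steps (a) and (c): white-noise spectrum shaped band by band (sketch).

    thresholds: masking threshold T_i (linear energy) for each band
    band_bins:  list of (lo, hi) one-sided FFT bin ranges, hi exclusive
    n_bins:     number of one-sided FFT bins
    rng:        numpy Generator for this channel; independent seeds per
                channel are assumed to give mutually uncorrelated noises
    """
    # (a) synthetic white noise: flat expected spectrum, random phase
    noise = rng.standard_normal(n_bins) + 1j * rng.standard_normal(n_bins)
    aux = np.zeros(n_bins, dtype=complex)
    for T_i, (lo, hi) in zip(thresholds, band_bins):
        band = noise[lo:hi]
        energy = np.sum(np.abs(band) ** 2)
        # (c) rescale so that the band energy equals the threshold T_i
        aux[lo:hi] = band * np.sqrt(T_i / energy)
    return aux
```

For step (d), the resulting spectrum (after an inverse transform) would be added to the loudspeaker signal of the same channel, with each of the N channels drawing its noise from its own generator, for example `np.random.default_rng(seed)` with a different seed per channel.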
In a particular embodiment of the method, to compute each frequency masking threshold:
(b1) blocks are formed, each containing a pre-specified number of sound signal samples, two successive blocks overlapping by a pre-specified number of samples;
(b2) the samples of each block are weighted by an apodisation window;
(b3) the Fourier transform of each block is computed;
(b4) the frequency domain is divided into several adjacent critical bands having specific lower and upper frequencies;
then, for each block and in each critical band:
(b5) the energies of the spectral lines belonging to the critical band are added, so as to obtain the value of an energy distribution function for this critical band;
(b6) the convolution product of the energy distribution function with a basilar spreading function, obtained from a look-up table, is computed, so as to obtain a spread spectrum Ei;
(b7) a tonality index αi is computed from the energies of the spectral lines belonging to the different critical bands;
(b8) a correction factor Oi is computed from the previously computed tonality index;
(b9) a frequency masking threshold Ti is computed from the spread spectrum Ei and from the correction factor Oi, the frequency masking threshold Ti being defined as follows:
$$10\log_{10} T_i = 10\log_{10} E_i - O_i - C_i$$
where Ci is an additional threshold correction parameter computed from several characteristics of the sound signals of the N channels.
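Steps (b1) to (b6) and (b9) can be sketched as below, with Oi and Ci passed in as already computed values in dB. Several substitutions are assumptions, not taken from the text: a Hann window as the apodisation window, Zwicker's critical-band edges in place of the band look-up table, the band index used as an approximate Bark value, and Schroeder's spreading function (a common choice in perceptual audio coders) in place of the basilar spreading look-up table. The names `frames`, `schroeder_spreading_db`, `masking_thresholds` and `ZWICKER_EDGES_HZ` are hypothetical.

```python
import numpy as np

# Assumed critical-band edges in Hz (Zwicker's table), truncated to fs/2.
ZWICKER_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                    1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300,
                    6400, 7700, 9500, 12000, 15500]

def frames(x, block_len, overlap):
    """(b1) Split x into blocks of block_len samples overlapping by `overlap`."""
    hop = block_len - overlap
    starts = range(0, len(x) - block_len + 1, hop)
    return np.array([x[s:s + block_len] for s in starts])

def schroeder_spreading_db(dz):
    """Spreading in dB for a Bark distance dz (maskee minus masker)."""
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * np.sqrt(1 + (dz + 0.474) ** 2)

def masking_thresholds(block, fs, O, C):
    """(b2)-(b6) and (b9) for one block; O and C are offsets in dB."""
    n = len(block)
    power = np.abs(np.fft.rfft(block * np.hanning(n))) ** 2   # (b2), (b3)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    edges = [f for f in ZWICKER_EDGES_HZ if f < fs / 2] + [np.inf]  # (b4)
    nb = len(edges) - 1
    # (b5) energy distribution function: sum of line energies per band
    B = np.array([power[(freqs >= edges[i]) & (freqs < edges[i + 1])].sum()
                  for i in range(nb)])
    # (b6) spread spectrum E_i = sum_j SF(i - j) * B_j
    idx = np.arange(nb)
    SF = 10 ** (schroeder_spreading_db(idx[:, None] - idx[None, :]) / 10)
    E = SF @ B
    # (b9) 10 log10 T_i = 10 log10 E_i - O_i - C_i
    T = 10 ** ((10 * np.log10(E + 1e-12) - O - C) / 10)
    return E, T
```

For a 1 kHz tone sampled at 8 kHz, the spread spectrum peaks in the 920 to 1080 Hz band and decays more slowly towards higher bands than towards lower ones, reflecting the asymmetry of the assumed spreading function.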
In a particular embodiment of the method, the characteristics used to compute the additional threshold correction parameter Ci include the respective levels of the signals of the N channels and a set of values of the inter-correlation function of at least some pairs of these signals, the additional correction parameter Ci being an increasing function of the values of this inter-correlation function.
As a variant, the characteristics used to compute the additional threshold correction parameter Ci can include the sum of the energies contained in the various critical bands of each channel and the ratio of the energies per critical band for at least some pairs of channels; the greater the sum of the energies contained in the various critical bands, and the less the per-critical-band energy ratio of a pair of channels varies from one critical band to another, the greater the additional correction parameter Ci.
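The text fixes only the direction in which Ci varies, not its formula. The two functions below are hypothetical rules that respect those monotonicity requirements, one per embodiment; the names, the linear mapping, the 0 to 10 dB range and the reference energy `e_ref` are all assumptions chosen for illustration.

```python
import numpy as np

def correction_from_correlation(x1, x2, max_lag=32, c_min=0.0, c_max=10.0):
    """Hypothetical C_i rule: increases with the peak normalised
    inter-correlation of two channel blocks over lags |k| <= max_lag."""
    x1 = x1 - x1.mean()
    x2 = x2 - x2.mean()
    denom = np.sqrt(np.dot(x1, x1) * np.dot(x2, x2))
    if denom == 0.0:
        return c_min                  # a silent channel: no correlation
    full = np.correlate(x1, x2, mode="full") / denom
    mid = len(x2) - 1                 # index of zero lag in the "full" output
    rho = np.max(np.abs(full[max(0, mid - max_lag): mid + max_lag + 1]))
    return c_min + (c_max - c_min) * min(rho, 1.0)

def correction_from_band_ratios(B1, B2, c_max=10.0, e_ref=1.0):
    """Hypothetical rule for the variant: grows with the total band energy
    and with the uniformity, across bands, of the energy ratio B1/B2."""
    total = B1.sum() + B2.sum()
    ratio_db = 10 * np.log10((B1 + 1e-12) / (B2 + 1e-12))
    spread = np.std(ratio_db)         # 0 when the ratio is equal in every band
    loudness = total / (total + e_ref)    # increases with energy, in [0, 1)
    similarity = 1.0 / (1.0 + spread)     # decreases with ratio spread, in (0, 1]
    return c_max * loudness * similarity
```

Identical channel signals give the largest correction in the first rule, while independent noise blocks give a value close to `c_min`.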
The present invention also proposes an echo canceller for N sound signal channels, each having a loudspeaker and an associated microphone, N being an integer greater than or equal to 1, including:
auxiliary signal computing means, comprising:
block formation means, for forming blocks each containing a pre-specified number of sound signal samples, two successive blocks overlapping by a pre-specified number of samples;
means for weighting the samples of each block by an apodisation window, located at the output of the blocks formation means;
means for computing a Fourier transform, located at the output of the weighting means;
means for computing the lower and upper frequency values of adjacent critical bands, or a look-up table containing these values;
means for computing an energy distribution function on the critical bands, the value, in a given critical band, of the energy distribution function being computed by adding the energy of the different spectrum lines belonging to this critical band;
a second look-up table, containing values representing the basilar spreading functions each associated with a given critical band;
means for computing a spread spectrum Ei, by computing, for a given critical band, the convolution product of the energy distribution function and the spreading function associated with this critical band;
means for computing a correction factor Oi associated with a given critical band;
means for computing a frequency masking threshold Ti associated with a given critical band, from the spread spectrum Ei and the correction factor Oi associated with this critical band, the frequency masking threshold Ti being defined as follows:
$$10\log_{10} T_i = 10\log_{10} E_i - O_i - C_i$$
where Ci is an additional threshold correction parameter;
means for generating synthetic signals having the spectral characteristics of a white noise;
control means, connected to the N sound signal channels, for computing for each critical band the additional threshold correction parameter Ci from several sound characteristics of the N channels;
means for bringing the level of a synthetic signal to the value of the associated frequency masking threshold so as to obtain an auxiliary signal;
means for computing an inverse Fourier transform;
memory means for storing the result of the processing of the successive blocks;
means for combining the result of the processing of a block with the result of the processing of the previous block;
means for sequential reading, connected to the memory means, in order to convert the data rate from the block rate to the sample rate;
N first adder means respectively placed on the N sound signal channels, for adding on each channel the received sound signal and the associated auxiliary signal;
N×N adaptive filtering means, including N adaptive filtering means associated with each of the N microphone channels, and each having coefficients which form an estimation of the impulse response of the acoustic coupling path between the loudspeaker and the microphone associated with one of the N sound signal channels;
second adder means, connected to the output of the N adaptive filtering means associated with each microphone channel, for adding the output signals from these N adaptive filtering means;
N subtractor means, each placed at the output of the second adder means and connected respectively to the microphones of the N sound signal channels, for subtracting on each channel the signal obtained at the output of the second adder means from the signal received by the microphone of this channel;
N means for computing on each channel an estimation error from the result supplied by the subtractor means connected to the microphone of this channel;
N means for correcting in an iterative manner the respective coefficients of the N adaptive filtering means associated with each of the N microphone channels, as a function of the estimation error associated with each microphone channel.
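The inverse Fourier transform, memory, combining and sequential-reading means listed above together amount to an overlap-add resynthesis of the block-processed auxiliary signal. A minimal sketch, assuming one-sided spectra and the hypothetical class name `OverlapAddSynthesiser`:

```python
import numpy as np

class OverlapAddSynthesiser:
    """Sketch of the inverse-FFT, memory, combining and reading means."""

    def __init__(self, block_len, overlap):
        self.block_len = block_len
        self.hop = block_len - overlap
        # memory means: overlapping part of the previous block's result
        self.tail = np.zeros(overlap)

    def process(self, spectrum):
        """Return the hop samples that are complete after this block."""
        block = np.fft.irfft(spectrum, n=self.block_len)   # inverse FFT
        block[:len(self.tail)] += self.tail   # combine with previous block
        self.tail = block[self.hop:].copy()   # store overlap for next block
        return block[:self.hop]               # read out at the sample rate
```

Each call consumes one block and releases `hop` finished samples, which can then be read out one at a time and added to the received signal by the first adder means.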
Implementations vary with the application; in particular, the number of loudspeakers in the sound signal transmission system considered may differ from the number of microphones.
In a particular embodiment of the device, the characteristics used to compute the additional threshold correction parameter Ci include the respective levels of the signals of the N channels and a set of values of the inter-correlation function of at least some pairs of these signals, the additional correction parameter Ci being an increasing function of the values of this inter-correlation function.
As a variant, the characteristics used to compute the additional threshold correction parameter Ci can include the sum of the energies contained in the various critical bands of each channel and the ratio of the energies per critical band for at least some pairs of channels; the greater the sum of the energies contained in the various critical bands, and the less the per-critical-band energy ratio of a pair of channels varies from one critical band to another, the greater the additional correction parameter Ci.
In a particular embodiment of the device, the echo canceller further comprises means for computing a tonality index αi from the energies of the spectral lines belonging to the different critical bands, the means for computing the correction factor Oi computing the correction factor Oi from the tonality index αi.
In a particular embodiment of the device, the means for computing the tonality index supply a constant tonality index, identical for all the critical bands and defined by:
$$\alpha_i = \alpha = \min\left(\frac{SFM}{SFM_{max}},\ 1\right)$$
where min(a, b) designates the smallest of the values a and b,
where SFMmax is a parameter of pre-specified value in dB associated with a pure sinusoidal signal and,
where $SFM = 10\log_{10}(G/A)$, $\log_{10}$ designating the base-10 logarithm, G the geometric mean of the energy over a pre-specified number of points of the Fourier transform, and A its arithmetic mean over the same number of points.
In a particular embodiment, SFMmax = -60 dB.
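The tonality index defined above can be computed directly from the FFT line energies of a block. In this sketch the function name `tonality_index` and the small constant `eps` (added to avoid taking the logarithm of zero) are assumptions; the default `sfm_max_db` is the -60 dB value of this embodiment.

```python
import numpy as np

def tonality_index(power, sfm_max_db=-60.0, eps=1e-12):
    """alpha = min(SFM / SFM_max, 1), with SFM = 10 log10(G / A) in dB."""
    p = np.asarray(power, dtype=float) + eps   # avoid log(0)
    G = np.exp(np.mean(np.log(p)))             # geometric mean
    A = np.mean(p)                             # arithmetic mean
    sfm_db = 10 * np.log10(G / A)              # <= 0 dB, 0 dB when flat
    return min(sfm_db / sfm_max_db, 1.0)
```

A flat spectrum gives SFM = 0 dB and therefore α = 0 (noise-like), whereas a spectrum concentrated in a single line gives an SFM far below -60 dB and therefore α = 1 (tonal).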
In a particular embodiment, the means for computing the correction factor Oi supply a correction factor Oi defined by:
$$O_i = \max\left(SO,\ \alpha_i (k_1 + B_i) + (1 - \alpha_i)\, k_2\right)$$
where max(a, b) designates the largest of the values a and b,
where SO, k1 and k2 are pre-specified parameter values in dB,
where αi is the tonality index associated with the critical band considered, and
where Bi designates the frequency of the critical band in Bark.
In a particular embodiment of the device, SO=24.5 dB, k1=14.5 dB and k2=5.5 dB.
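The correction factor formula with the parameter values of this embodiment can be written as a one-line function; the name `correction_factor` is hypothetical.

```python
def correction_factor(alpha, bark, so=24.5, k1=14.5, k2=5.5):
    """O_i = max(SO, alpha_i * (k1 + B_i) + (1 - alpha_i) * k2), in dB."""
    return max(so, alpha * (k1 + bark) + (1 - alpha) * k2)
```

As a worked example, a fully tonal band (αi = 1) at Bi = 15 Bark gives Oi = max(24.5, 29.5) = 29.5 dB, whereas a noise-like band (αi = 0) gives Oi = max(24.5, 5.5) = 24.5 dB; with these values, SO sets the lower bound of the correction factor for noise-like content.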