The invention relates to the field of acoustics in air and it applies in particular to systems for wiring halls and rooms for sound.
The present invention relates to an echo reduction method implemented with a multi-sensor sound pickup device constituting an antenna and a sound playback device. The invention also relates to echo reduction apparatus designed to operate with a sound playback device to implement the method of the invention. The frequency domain to which the invention applies is the speech domain and more particularly that of low frequency speech signals.
A most advantageous application of the invention lies in reducing the echo in a sound pickup device in a teleconferencing system, thereby making so-called “hands-free” communication possible without needing to use an offset microphone.
Acoustic echo is a major obstacle to proper operation of hands-free communications terminals. This acoustic echo is the result of the sensors of the sound pickup device capturing a portion of the signal emitted by the transducers of the sound playback device. The sensors and the transducers used for acoustics in air are respectively microphones and loudspeakers.
In a hands-free communications terminal, microphones are acoustically coupled, and possibly also mechanically coupled, with the sound playback system, either directly when the microphones and the playback system are in the same housing, or else indirectly as is the case for a video conference terminal placed on a TV monitor. The resulting echo comprises two portions:
the room response; and
direct coupling between the loudspeaker and the microphone.
The response of the room can generally be processed effectively by conventional echo controlling techniques such as echo cancelling or gain variation, and in particular because of the relatively low level of this portion of the echo. The same does not apply to the direct coupling which results both from the sound path through the air and also from vibration propagating through the shell of the communications terminal, and possibly also from any resonances of cavities or mechanical elements when set top boxes are used. This portion of the echo is generally at a level which is 6 dB to 20 dB higher than the level of the local speech picked up by the microphones, and it gives rise to a “howl-around” effect unless processing is applied.
If known echo cancelling techniques do not enable this unwanted effect to be processed in satisfactory manner, then a large variation in gain must be applied in order to prevent howl-around starting. This variation in gain must apply firstly during the stage in which the echo canceller is converging and secondly following any variation in the echo path or during moments of double talking. Because of this constraint, the communications terminal is not very interactive and the user can have the feeling that the hands-free function does not work well.
To mitigate that drawback, there exist several techniques for reducing direct coupling.
A first technique sets out to reduce mechanical coupling. The solution normally applied consists in using damping materials such as foams or rubber to isolate the microphone from the shell. Those materials have the effects of eliminating or greatly reducing loudspeaker-generated vibration in the shell, and also propagation thereof to the microphone.
The mechanical decoupling solution is effective in reducing coupling by vibration, but industrially speaking it is expensive. Furthermore, it does not reduce the acoustic coupling that can be large if the microphone is placed close to the loudspeaker, as is the present trend because of the desire for communications terminals to be small in size. Furthermore, when the playback system forms an integral portion of an assembly that also includes the sound pickup device, as is the case of a television monitor including loudspeakers and microphones or of a monitor having placed thereon a device that includes microphones, it is not possible to envisage isolating the loudspeakers from the shell of the assembly.
A second technique consists in using a digital compensation filter whose response is obtained by calculating the inverse of the impulse response of the mechanical and acoustic coupling between the microphone and the loudspeaker.
In theory, the signal from the loudspeaker due to the echo is cancelled from the output of the compensation filter. In practice, that technique is unsatisfactory as soon as the characteristics of the coupling change, even if only to a very small extent, for example due to the communications terminal being assembled and/or disassembled or due to a change in the characteristics of the microphones. Furthermore, that technique is unsuitable if the disturbances are of a non-linear nature, i.e. if the disturbances cannot be modelled by a convolution product between the signal coming from the loudspeaker and a filter impulse response. The behavior of transducers is in fact rarely linear, since they are generally subject to distortion and/or to saturation, both of which are typical examples of non-linear operation. Finally, in mass production, coupling is necessarily different from one communications terminal to another, for example because the loudspeakers and microphones that are used differ slightly from one another. Together, those limitations make the filter compensation technique relatively ineffective.
Other systems exist for weighting microphones based on uniformly distributing microphones around a loudspeaker and amplitude-weighting the microphones and/or phase-shifting the microphones. Such systems constitute the subject matter of French patent No. 93/020504. Those systems are for use in group communications terminals possessing unusual geometrical symmetry due to the way they pick up sound omnidirectionally. The effectiveness of those systems is highly sensitive to mismatches between the microphones or the acquisition channels. Consequently, those systems require components to be sorted and microphones to be calibrated accurately, and such matching can become lost over time. In addition, adding a display screen, a keypad, or a swivel-mounted camera can break the symmetry between the loudspeakers and the microphones. Furthermore, when the number N of microphones exceeds two, such devices are not optimal since N degrees of freedom are available for satisfying two constraints, namely fixing gain in a given direction and cancelling direct coupling. Finally, since the signal as picked up corresponds to all of the contributions from each of the microphones and not only the contribution from the wanted source, and since these contributions are mutually phase-shifted and possibly of different amplitudes, the spectrum of the resulting signal is degraded at high frequencies.
Another system intended more for individual applications, as disclosed in French patent No. 98/14321 for example, offers the advantage of taking account of the specific features of the final terminal, i.e. of the communications terminal after the system has been integrated therein. That system makes use of two microphones positioned at different distances from the loudspeaker. The contributions from the two microphones are combined by weighting so as to cancel the direct coupling wave. The different weightings of the two signals coming from the two microphones make it possible to pick up useful sources situated in a far field. The path filter applied to the second microphone can be calculated by inverting the response of a measurement of the coupling between the microphone and the loudspeaker. Such calculation is generally sufficient in most situations since the microphones are in the direct field of the loudspeaker. Nevertheless, that solution requires there to be an amplitude difference of sufficient magnitude in the coupling waves between the two microphones. To obtain such a difference, it is necessary to position the microphones sufficiently close to the loudspeaker which gives rise to conditions that are very unfavorable in terms of coupling. Although that condition is not troublesome for an individual system in which said disposition is unavoidable, the same does not apply to any systems such as video conference terminals of the set top box type, i.e. terminals of a shape suitable for placing on a TV monitor. In that configuration, the further the microphones are away from the loudspeaker, the greater the spacing necessary between the sensors. That puts a limit on passband and increases sensitivity to obstacles situated nearby. Finally, there can be amplification of incoherent noise which can be controlled only by changing the distance between the two microphones.
Finally, the directivity of those devices is controllable to some extent only and is little different from the directivity specific to the microphones.
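By way of illustration only, the amplitude-difference requirement of the two-microphone weighting described above can be sketched at a single frequency. The sketch assumes a free-field point-source model for the loudspeaker, a distant useful source reaching both microphones with the same amplitude and phase, and hypothetical distances and names (`coupling`, `two_mic_canceller`); none of these values come from the cited patent.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature (assumed)


def coupling(distance_m, freq_hz):
    """Free-field transfer function from a point source to a microphone."""
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND
    return cmath.exp(-1j * k * distance_m) / distance_m


def two_mic_canceller(d1_m, d2_m, freq_hz):
    """Return (residual echo, gain for a distant in-phase source)."""
    h1 = coupling(d1_m, freq_hz)
    h2 = coupling(d2_m, freq_hz)
    g = h1 / h2                   # weight on microphone 2 that nulls the coupling wave
    residual_echo = abs(h1 - g * h2)
    useful_gain = abs(1 - g)      # distant source: unit amplitude, same phase on both mics
    return residual_echo, useful_gain


# Same 5 cm microphone spacing, loudspeaker 5 cm away versus 1 m away, at 100 Hz.
echo_near, gain_near = two_mic_canceller(0.05, 0.10, 100.0)
echo_far, gain_far = two_mic_canceller(1.00, 1.05, 100.0)
print(f"loudspeaker at 5 cm: useful gain {20 * math.log10(gain_near):.1f} dB")
print(f"loudspeaker at 1 m:  useful gain {20 * math.log10(gain_far):.1f} dB")
```

In both cases the coupling wave is cancelled, but when the loudspeaker is moved away the two coupling amplitudes become nearly equal and the same weighting also attenuates the useful source, which is why a larger microphone spacing would then be needed.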
In parallel with those solutions, there exist multi-sensor techniques which seek to guard against acoustic coupling by making the system directional overall. When the sound source is situated in a noisy or reverberating environment, the directivity of a single sensor can be insufficient for extracting signal from noise.
To remedy that problem, one method consists in associating N sensors to form an acoustic antenna. The association consists in adding the signals output from the various sensors in phase coherence for a given direction (φ, θ). Such addition constitutes one of the techniques known as “channel-forming”. A sensor or a transducer is characterized by its directivity in three dimensions which is usually measured in two orthogonal planes and is represented in polar coordinates (r, φ, θ) in the form of two radiation patterns. The directivity of a sensor gives an “image” of the level at which a signal will be picked up by the sensor from a point sound source situated in a direction (φ, θ) and at a distance r from the center of the sensor. The directivity of the antenna obtained after channel-forming presents performance that is better than the directivity of a single sensor. The N microphones which are separated from one another by a distance d pick up the pressure of a localized volume and thus perform spatial sampling of the sound field. This method provides good effectiveness at high frequencies, but at low frequencies (f less than 1000 Hz), the difficulty remains, for several reasons:
lack of directivity due to the small size of the antenna;
reduced robustness, because of the high sensitivity at low frequencies to disparities between sensors and because of the effect of the TV screen, if any; and
the antenna has a directional radiation pattern for a source situated at the location of the loudspeakers that is different from its directional radiation pattern at 3 meters (m) since the loudspeakers are not located in the far field.
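The first of these limitations can be illustrated with a minimal sketch of channel-forming by in-phase summation. It assumes a uniform line of four microphones spaced 5 cm apart, far-field plane waves, and a hypothetical helper name (`delay_and_sum_response`); these are illustrative choices, not values taken from the invention.

```python
import cmath
import math


def delay_and_sum_response(n_mics, spacing_m, freq_hz, angle_deg, steer_deg=0.0, c=343.0):
    """Normalised magnitude response of a uniform line array to a far-field plane wave.

    Angles are measured from broadside (the normal to the array axis).
    """
    k = 2 * math.pi * freq_hz / c
    step = k * spacing_m * (math.sin(math.radians(angle_deg))
                            - math.sin(math.radians(steer_deg)))
    # Sum the sensor signals after aligning their phases for the steering direction.
    total = sum(cmath.exp(1j * m * step) for m in range(n_mics))
    return abs(total) / n_mics


on_axis = delay_and_sum_response(4, 0.05, 300.0, 0.0)
low = delay_and_sum_response(4, 0.05, 300.0, 60.0)    # 60 degrees off axis at 300 Hz
high = delay_and_sum_response(4, 0.05, 3000.0, 60.0)  # same direction at 3000 Hz
print(f"on axis: {on_axis:.2f}, 300 Hz off axis: {low:.2f}, 3000 Hz off axis: {high:.2f}")
```

Under these assumptions the off-axis wave is barely attenuated at 300 Hz, whereas it is clearly attenuated at 3000 Hz, which is the lack of low-frequency directivity noted for a small antenna.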
Thus, an object of the invention is to reduce the echo produced by acoustic coupling between the sound pickup device and the sound playback device without the drawbacks of known methods and apparatuses.
To this end, the invention provides an echo reduction method implemented with a multi-sensor sound pickup device forming an antenna and a sound playback device, the method consisting in submitting the output signals from the sensors to complex weights w(f), said weights w(f) being calculated by maximizing the directivity factor Fd(f) under low frequency and near field constraints, where the expression for the directivity factor is as follows:
\[ F_d(f) = \frac{1}{\dfrac{1}{4\pi}\, w^H(f)\, D_r(f)\, w(f)} \tag{1} \]
with
\[ D_r(f) = \int_{\theta=0}^{\pi} \int_{\varphi=0}^{2\pi} W(f,\varphi,\theta)\, H(f,r,\varphi,\theta)\, H^H(f,r,\varphi,\theta)\, \sin\theta \; d\theta \, d\varphi \tag{2} \]
the calculation being such that said weights w(f) satisfy a linear first constraint (3) on the modulus and the phase of the transfer function of the sound pickup device in given directions, the formulation of this first constraint at each frequency f being as follows:
\[ C^H(f)\, w(f) = s(f) \tag{3} \]
said method being characterized in that in addition said weights w(f) are calculated in such a manner as to satisfy a second constraint (4) determined on the basis of in-situ measurements of complex transfer functions of sound channels defined by the inputs of the loudspeakers of the sound playback device and the outputs of the sensors making up the sound pickup device, this second constraint at each frequency f being formulated as follows:
\[ M^H(f)\, w(f) = 0 \tag{4} \]
where formulas (1) to (4) are such that:
H is the propagation vector whose elements are the complex values of the free field transfer functions between a point source situated at the distance r from the center of the antenna in the direction defined by polar angles φ and θ, and each sensor of the antenna, as calculated at the frequency f;
Dr(f) is the directivity matrix which characterizes the spatial selectivity properties of the antenna at distance r;
W(f, φ, θ) are spatial weights that enable waves coming from the loudspeaker directions to be attenuated to a greater extent in order to reduce direct coupling with the loudspeakers;
w(f) is a vector of complex weights for the output signals from the sensors at the frequency f;
C(f) is a “constraint” matrix containing the theoretical propagation vectors calculated on the basis of a free field type propagation model or as measured under free field conditions;
s(f) are the desired complex gains in the given directions;
M(f) is a coupling constraint matrix containing the complex transfer functions as measured in-situ of the sound channels defined by the inputs to the loudspeakers of the sound playback device and by the outputs from the sensors making up the sound pickup device, referred to as “sound coupling” channels; and
0 is a zero vector.
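By way of illustration only, the constrained calculation defined by (1) to (4) can be sketched numerically at one frequency. Since Fd(f) is inversely proportional to w^H D_r w, maximizing it amounts to minimizing w^H D_r w; stacking the constraints as A = [C M] and g = [s; 0], the classical Lagrange-multiplier solution is w = D_r^-1 A (A^H D_r^-1 A)^-1 g. This closed form is a standard result for a quadratic criterion under linear constraints and is an assumption here: the text does not state how the weights are actually computed. The array geometry, the frequency, the midpoint quadrature, W = 1, the normalization of H to the antenna center, the coupling values in M and all function names are hypothetical.

```python
import cmath
import math


def hermitian(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(complex, a[i])) + list(map(complex, b[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def propagation_vector(sensors, r, phi, theta, k):
    """Vector H: free-field point-source transfer functions, normalised to the centre."""
    src = (r * math.sin(theta) * math.cos(phi),
           r * math.sin(theta) * math.sin(phi),
           r * math.cos(theta))
    dists = [math.dist(src, p) for p in sensors]
    return [r * cmath.exp(-1j * k * (d - r)) / d for d in dists]


def directivity_matrix(sensors, r, k, n_theta=24, n_phi=48):
    """Midpoint-rule approximation of equation (2), assuming W = 1 everywhere."""
    n = len(sensors)
    d = [[0j] * n for _ in range(n)]
    dth, dph = math.pi / n_theta, 2 * math.pi / n_phi
    for it in range(n_theta):
        theta = (it + 0.5) * dth
        for ip in range(n_phi):
            h = propagation_vector(sensors, r, (ip + 0.5) * dph, theta, k)
            wgt = math.sin(theta) * dth * dph
            for i in range(n):
                for j in range(n):
                    d[i][j] += wgt * h[i] * h[j].conjugate()
    return d


def constrained_weights(d, c, s, m):
    """Minimise w^H D w subject to C^H w = s (3) and M^H w = 0 (4)."""
    a = [rc + rm for rc, rm in zip(c, m)]          # A = [C  M], one row per sensor
    g = [[v] for v in s] + [[0j]] * len(m[0])      # g = [s; 0]
    d_inv_a = solve(d, a)                          # D^-1 A
    lam = solve(matmul(hermitian(a), d_inv_a), g)  # (A^H D^-1 A)^-1 g
    return [row[0] for row in matmul(d_inv_a, lam)]


k = 2 * math.pi * 500.0 / 343.0                            # wavenumber at 500 Hz
sensors = [(x, 0.0, 0.0) for x in (-0.15, -0.05, 0.05, 0.15)]
d_r = directivity_matrix(sensors, r=3.0, k=k)
# One look direction, broadside to the array, with unit desired gain.
c = [[v] for v in propagation_vector(sensors, 3.0, math.pi / 2, math.pi / 2, k)]
s = [1 + 0j]
# Hypothetical in-situ coupling measurements for a single loudspeaker.
m = [[0.80 + 0.10j], [0.55 - 0.20j], [0.35 + 0.05j], [0.20 + 0.15j]]
w = constrained_weights(d_r, c, s, m)

look_gain = sum(c[i][0].conjugate() * w[i] for i in range(4))      # C^H w
coupling_gain = sum(m[i][0].conjugate() * w[i] for i in range(4))  # M^H w
quad = sum(w[i].conjugate() * d_r[i][j] * w[j] for i in range(4) for j in range(4)).real
directivity_factor = 4 * math.pi / quad
print(abs(look_gain), abs(coupling_gain), directivity_factor)
```

The sketch relies only on the Python standard library so that it stays self-contained; a production calculation would normally use a linear algebra library and repeat the computation for each frequency f.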
Thus, the invention takes account of constraints on direct coupling when calculating channel formation. The channel-forming calculation is not performed solely on the basis of theoretical propagation models or of measurements performed in a quiet room, but also on the basis of estimated transfer functions obtained on site. Thus, the invention makes it possible to reduce direct and/or semi-direct coupling while imposing desired directivity and while controlling the maximum amplification of incoherent noise.
The invention applies in particular to sound pickup devices for group video conferences.
The invention has the advantage of taking account automatically of the near acoustic environment (shape of the TV monitor and of the box containing the antenna, position of the box relative to the monitor, the presence of an obstacle or a wall nearby), the response of the loudspeakers, and the mismatch between the pickups. In addition, since optimization is performed blind, the method of the invention does not require prior assumptions to be made concerning knowledge about the positions of the loudspeakers, and concerning the electro-acoustic characteristics of the loudspeakers and of the sensors.