1. Field of the Invention
The present invention relates to a method for vectorial noise-reduction in speech and to an implementation device.
2. Description of the Prior Art
Known methods for the reduction of noise in speech use linear filtering. FIG. 1 shows a standard device for the implementation of such a method. This device essentially comprises a noise-ridden or noisy signal (speech signal) source 1 that is connected to the + (additive) input of a subtractor 2. A source 3 of noise alone is connected, via an adaptive filter 4, to the subtraction input of the subtractor 2. The output of the subtractor 2, which constitutes the output of the noise-reduction device, is furthermore connected to the control input of the filter 4 to send it a residual error signal ε.
The source 3 constitutes a noise model, in the sense of a certain criterion, for example a least mean squares criterion, this noise being subtracted adaptively from the noisy signal. The operating principle of this device relies on the postulate that the useful signal s, the noise n₀ affecting this signal, the noise model n₁ and the output signal y of the filter 4 are stationary and, furthermore, that s is decorrelated from n₀ and from n₁, while n₀ and n₁ are highly correlated.
The output signal is equal to: ε = s + n₀ − y,
that is: ε² = s² + (n₀ − y)² + 2s(n₀ − y),
giving, for the power values (the cross term vanishes since s is decorrelated from both n₀ and y): E[ε²] = E[s²] + E[(n₀ − y)²]
Since the useful signal is not affected by the adaptive filtering, we have: E_min[ε²] = E[s²] + E_min[(n₀ − y)²]
The weights of the filter 4 are adjusted so that the total output power E[ε²] is minimized. This minimization leads to a reduction of the power of the noise and, consequently, to a maximization of the signal-to-noise ratio.
At best, the following is obtained: E[(n₀ − y)²] = 0,
giving E_min[ε²] = E[s²]
with: y = n₀ and ε = s
In other words, when the signal of the source 3 is not correlated with the signal of the source 1, we have: E[ε²] = E[(s + n₀)²] + E[y²], and the minimization of the output power forces the adaptive weights of the filter 4 towards zero, which forces E[y²] towards zero. This approach is well known to those skilled in the art. The adaptive filter 4 is conventionally of the LMS (least mean squares) type or else of the RLS (recursive least squares) type.
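The scheme of FIG. 1 can be sketched in a few lines of Python. This is only an illustrative LMS noise canceller under assumed parameters (filter length, step size, synthetic signals); it is not the patent's implementation. The residual ε = s + n₀ − y drives the weight update, so minimizing the output power cancels the correlated noise while leaving the useful signal:

```python
import numpy as np

def lms_noise_canceller(noisy, noise_ref, num_taps=16, mu=0.005):
    """Adaptive noise cancellation in the spirit of FIG. 1: the noise
    reference n1 is passed through an adaptive FIR filter (filter 4)
    and subtracted from the noisy signal; the residual error eps
    drives the LMS weight update."""
    w = np.zeros(num_taps)                 # adaptive weights of filter 4
    eps = np.zeros(len(noisy))             # residual error eps = s + n0 - y
    for i in range(num_taps - 1, len(noisy)):
        x = noise_ref[i - num_taps + 1:i + 1][::-1]  # newest sample first
        y = w @ x                          # filter output y
        eps[i] = noisy[i] - y              # output of subtractor 2
        w += 2 * mu * eps[i] * x           # LMS weight update
    return eps

# Synthetic check: a sine (the "useful signal" s) buried in noise n0
# that is a causally filtered copy of the reference n1 (high
# correlation between n0 and n1, as the postulate requires).
rng = np.random.default_rng(0)
n1 = rng.standard_normal(4000)                       # noise model (source 3)
n0 = np.convolve(n1, [0.8, -0.3, 0.1])[:len(n1)]     # correlated noise at source 1
s = np.sin(2 * np.pi * 0.02 * np.arange(4000))       # useful signal s
eps = lms_noise_canceller(s + n0, n1)                # eps should approach s
```

After convergence the residual power E[(ε − s)²] falls well below the original noise power E[n₀²], which is the signal-to-noise improvement described above.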
The chief defect of this known approach is the overriding need to have the noise source 3 physically available. Such a source may include a variable proportion of signals that do not have the characteristics of noise alone. The performance of the noise-reduction method is then considerably degraded by this fact, as is shown by standard theoretical computations which shall not be entered into herein.
A first possible way of overcoming this defect would be to make use of "frequential diversity". This solution consists essentially in processing the noisy signal by DFT (discrete Fourier transform) and, on the basis of its power value, in producing the signal y to be subtracted therefrom by using the inverse discrete Fourier transform of this power value. This processing operation consists in splitting up the noisy useful signal into independent subbands, for example by Fourier analysis, and then in processing each subband independently so as to increase the size of the observation vector space. This kind of splitting cannot be used for speech processing, since it is known that the speech signal is not frequentially stationary and does not statistically occupy the same frequency bands at all times (as happens, for example, with voiced structures).
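The subband-splitting step of this frequential-diversity idea can be sketched as follows. The function names, band count and per-band treatment (a simple gain) are assumptions for illustration, not part of the known method; with unit gains the inverse DFT restores the frame exactly:

```python
import numpy as np

def split_into_subbands(frame, num_bands=8):
    """Split one frame into contiguous DFT subbands; returns the
    one-sided spectrum and the subband edge indices."""
    spectrum = np.fft.rfft(frame)
    edges = np.linspace(0, len(spectrum), num_bands + 1, dtype=int)
    return spectrum, edges

def process_subbands(spectrum, edges, gains):
    """Apply an independent (here, multiplicative) treatment per subband."""
    out = spectrum.copy()
    for b, g in enumerate(gains):
        out[edges[b]:edges[b + 1]] *= g
    return out

frame = np.sin(2 * np.pi * 0.05 * np.arange(256))
spec, edges = split_into_subbands(frame)
# identity gains: the inverse DFT rebuilds the original frame
rebuilt = np.fft.irfft(process_subbands(spec, edges, np.ones(8)), n=256)
```

The limitation noted above remains: because speech is not frequentially stationary, a subband that carries only noise in one frame may carry voiced energy in the next, so the bands cannot be treated as independently stationary observations.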
Another approach would be to use temporal diversity. This approach too is unusable, for the assumption of a stationary vocal transmission is not physically realistic. At most, it is possible to observe a degree of stationarity over some tens of 25.6 ms frames (each corresponding to 256 points of a signal sampled at 10 kHz) for stable vocal cores, but this stationarity is reduced to a period equal to that of 1 to 3 frames for the plosives (sounds such as "t").
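The frame arithmetic above is simply sampling rate times duration; a minimal sketch (the helper function is hypothetical, not from the known method) of the corresponding frame segmentation:

```python
import numpy as np

# 25.6 ms at a 10 kHz sampling rate gives 256 samples per frame.
SAMPLE_RATE_HZ = 10_000
FRAME_S = 0.0256
frame_len = round(SAMPLE_RATE_HZ * FRAME_S)     # 256 samples

def split_frames(signal, frame_len):
    """Cut a 1-D signal into consecutive non-overlapping frames,
    dropping any incomplete tail."""
    n = len(signal) // frame_len
    return signal[:n * frame_len].reshape(n, frame_len)

frames = split_frames(np.arange(1000), frame_len)   # 3 full frames, tail dropped
```

The difficulty is not the segmentation itself but that statistics estimated on one such frame need not hold on the next, especially across plosives.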
A third approach would be that of "spatial diversity" wherein several signal tapping (vector information tapping) points are distributed in space. The filtering would then be done as shown schematically in FIG. 2.
There is placed, before a speaker, a set 5 of (L+1) microphones which may be, for example, equally spaced out, the output signals of these microphones being referenced x.sub.o, x.sub.1 . . . x.sub.L. Each of these microphones is followed by a narrow-band adaptive filter, the entire set of filters being referenced 6, these filters being respectively referenced W.sub.0, W.sub.1 . . . W.sub.L. Their outputs are connected to a summator 7, the output of which constitutes that of the device.
X_k designates any one of the input vectors, W_kᵀ the transpose of the weight vector to be applied to the filter, and g_k the output scalar.
We have: g_k = X_kᵀ W_k = W_kᵀ X_k
with W_k = [W_0k, W_1k, . . . , W_Lk]ᵀ
At a given instant (determined for example by a sample-and-hold operation), there are (L+1) input signals available. The transmission of speech affects all the output signals of the microphones 5, the differences between these signals being due chiefly to the differences in propagation time between the speaker and the different microphones. In a manner that is known per se, the spatial processing operation consists in forming an antenna by the formation of conventional channels (generally by linear combinations of the signals of the microphones) so as to obtain the deflection, by phase-shifting (or by pure delay), of the directional lobe of the antenna thus formed. The limitations mentioned here above for the other known methods remain present.
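This channel formation can be sketched as a weighted sum g_k = W_kᵀ X_k. The sketch below assumes a simple plane-wave model with a constant phase shift between adjacent, equally spaced microphones (all names and the delay model are illustrative assumptions); the steering weights undo the inter-microphone phase shifts so the summator adds the wavefront coherently:

```python
import numpy as np

def channel_output(X, W):
    """Output scalar of summator 7: g = W^T X, one weight per microphone."""
    return W @ X

# Plane wave arriving with a phase shift of `phi` radians between
# adjacent microphones of the equally spaced array (set 5).
L_plus_1 = 4
phi = 0.7
X = np.exp(1j * phi * np.arange(L_plus_1))               # microphone samples
W = np.exp(-1j * phi * np.arange(L_plus_1)) / L_plus_1   # steering weights
g = channel_output(X, W)                                 # coherent (unit) sum
```

Steering the lobe amounts to changing phi in the weights; signals from other directions add incoherently and are attenuated. This does not remove the stationarity limitations already noted.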