1. Field of the Invention
The invention relates to improving the intelligibility of voice communications in the presence of noise. It applies more particularly, but not exclusively, to telephone or radiotelephone communications or communications by other electronic means, to voice recognition, etc., whenever the environment in which the sound is captured is noisy and might impair the perception or recognition of the transmitted voice.
2. Discussion of the Background
An example may be given with regard to voice communications inside an aircraft or another noisy vehicle. In the case of an aircraft, noise comes from the engines, from the air-conditioning, from the ventilation for the on-board equipment, and from aerodynamic sources. All this noise is picked up by the microphone into which the pilot or a crew member is speaking.
The invention proposes a process for searching for a noise model which can serve in particular in noise reduction processing. Noise reduction processing based on the noise model found makes it possible to increase the signal/noise ratio of the signal transmitted, one goal being to impair the intelligibility of the signal as little as possible. In this patent application, the neologisms denoising and denoise will be used to speak of operations aimed at removing or reducing noise components present in the signal.
Denoising may be based, as will be seen, on the continuous search for an environmental noise model, on the digital spectral analysis of this noise, and on the digital reconstruction of a useful signal which eliminates the modelled noise as far as possible.
The noise model is searched for in the noisy signals themselves and, whenever a plausible noise model has been found, this noise model is stored so as to be able to be used. Then, a new search starts in order to find a more suitable or simply a more recent model.
More precisely, the invention proposes a process for automatically searching for noise models in noisy audio input signals, in which the input signals are digitized, and these signals are processed on the basis of a model found (for example with a view to eliminating as far as possible the noise corresponding to the model), characterized in that the input signals are chopped into successive frames of P samples each, and a repetitive search for a noise model is performed continuously in the input signals themselves, by searching for N successive frames having the expected characteristics of a noise, by storing the N×P corresponding samples so as to construct a noise model useful in the denoising processing of the input signals and by iteratively repeating the search so as to find a new noise model and store the new model as replacement for the previous one or retain the previous model according to the respective characteristics of the two models.
Accordingly, the noise model serving in particular for denoising is not a known predetermined model or a model chosen from several predetermined models, but is a model found in the noisy signal itself, this making it possible not only to adapt the denoising to the actual nuisance noise, but also to adapt the denoising to the variations in this noise.
The noise model is obtained by regarding the signals whose energy is stable (and, preferably, as will be seen, whose energy is a minimum) over a certain duration as probably representing noise; the search for a noise model then comprises the search for N successive frames whose energies are close to one another (N lying between a minimum value N1 and a maximum value N2), the calculation of the average energy of the N successive frames found, and the storing of the N×P samples as the new active model if the ratio between this average energy and the average energy of the frames of the active model previously stored is less than a determined replacement threshold.
The search for N successive frames then comprises at least the following iterative steps: calculation of the energy of a current frame of rank n able to be appended to a model undergoing formulation already comprising n−1 successive frames; calculation of the ratio between this energy and the energy of the previous frame of rank n−1 (and preferably that of other previous frames between 1 and n−1); comparison of this ratio with a low threshold less than 1 and a high threshold greater than 1; and decision regarding the possibility of incorporating the frame of rank n into the model undergoing formulation; the frame is not incorporated into the model if the ratio does not lie between the two thresholds; it is incorporated into the model if the ratio does lie between the two thresholds. The procedure is iteratively repeated on the next current frame of the input signals, with incrementation of n, until the halting of the formulation of the model.
The formulation of the model is halted either in the case where n reaches the high value N2, or in the case where the frame of rank n is not incorporated into the model because the calculated energy ratio departs from the prescribed range. In this latter case, the formulated model cannot be taken into account as active model unless n−1 is already greater than or equal to the minimum N1, since the principle is that a noise model is representative if it has an almost stable energy over at least N1 frames.
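By way of illustration only, the frame search and halting rules described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names `frame_energy` and `find_noise_model`, the default threshold values 0.5 and 2.0, the use of the sum of squared samples as the frame energy, and the choice to restart the search at the rejected frame are all assumptions made for the example. Only the comparison with the immediately preceding frame is shown; the preferred comparison with other earlier frames is omitted.

```python
from typing import Optional, Sequence, Tuple

Frame = Sequence[float]


def frame_energy(frame: Frame) -> float:
    """Energy of one frame, taken here (assumption) as the sum of squared samples."""
    return sum(x * x for x in frame)


def find_noise_model(
    frames: Sequence[Frame],
    n1: int,
    n2: int,
    low: float = 0.5,   # hypothetical low threshold, less than 1
    high: float = 2.0,  # hypothetical high threshold, greater than 1
) -> Optional[Tuple[int, int]]:
    """Return (start index, frame count N) of the first stable run, or None.

    A run is accepted when it holds at least n1 frames; it never grows
    beyond n2 frames.
    """
    total = len(frames)
    start = 0
    while start < total:
        energies = [frame_energy(frames[start])]
        idx = start + 1
        while len(energies) < n2 and idx < total:
            current = frame_energy(frames[idx])
            previous = energies[-1]
            # Equivalent to low < current / previous < high, written without
            # a division so that a silent previous frame rejects the frame.
            if not (low * previous < current < high * previous):
                break
            energies.append(current)
            idx += 1
        if len(energies) >= n1:
            return start, len(energies)
        # Formulation interrupted too early: restart at the rejected frame
        # (assumption about where the new search begins).
        start = idx
    return None
```

For example, with one loud frame followed by four quiet frames of equal energy and another loud frame, and with N1 = 3 and N2 = 5, the sketch returns the run of four quiet frames starting at the second frame.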
Preferably, the formulated model becomes active in place of the previous model only if the ratio between its average energy per frame and the average energy per frame of the previous model does not exceed a predetermined replacement threshold.
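The replacement rule may be illustrated by the following minimal sketch. The names `average_frame_energy` and `choose_active_model`, the threshold value 1.5, and the decision to adopt the first model found unconditionally when no active model exists yet are assumptions made for the example; the text only specifies that the ratio of the average energies is compared with a replacement threshold.

```python
from typing import Optional, Sequence

Frame = Sequence[float]


def average_frame_energy(model: Sequence[Frame]) -> float:
    """Average energy per frame, with energy taken (assumption) as the sum of squares."""
    return sum(sum(x * x for x in frame) for frame in model) / len(model)


def choose_active_model(
    active: Optional[Sequence[Frame]],
    candidate: Sequence[Frame],
    replacement_threshold: float = 1.5,  # hypothetical value
) -> Sequence[Frame]:
    """Return the model that should be active after the candidate is formulated."""
    if active is None:
        # No model stored yet: adopt the first one found (assumption).
        return candidate
    e_active = average_frame_energy(active)
    e_candidate = average_frame_energy(candidate)
    # Equivalent to e_candidate / e_active <= threshold, written without a
    # division so that an all-zero active model does not raise an error.
    if e_candidate <= replacement_threshold * e_active:
        return candidate
    return active
```

With a threshold above 1, a candidate slightly more energetic than the active model can still replace it, whereas a threshold of 1 or less would accept only candidates of equal or lower energy.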
In all cases, the search for a new model restarts as soon as the formulation of the previous one is interrupted.
Finally, preferably, provision may be made for the replacement of a previous model by a new model to be disabled as soon as speech is detected in the noisy signals. The presence of speech can in fact be detected by digital signal processing procedures (such as those which can be used in speech recognition).