The present invention relates to digital techniques for processing speech signals. It relates more particularly to techniques which use voice activity detection in order to process the signal differently depending on whether or not it carries voice activity.
The digital techniques in question belong to a variety of fields: speech coding for transmission or storage, speech recognition, noise reduction, echo cancellation, etc.
The main difficulty in detecting voice activity is that of distinguishing between voice activity and the noise which accompanies the speech signal.
The document WO99/14737 describes a method of detecting voice activity in a digital speech signal processed in successive frames. In this method, an a priori denoising of the speech signal of each frame is carried out on the basis of noise estimates obtained during the processing of one or more previous frames, and the variations in the energy of the a priori denoised signal are analyzed so as to detect a degree of voice activity of the frame. Because the detection of voice activity is carried out on the a priori denoised signal, its performance is substantially improved when the surrounding noise is relatively strong.
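By way of illustration only, the principle of a priori denoising followed by an energy analysis may be sketched as follows. This is not the method of WO99/14737, whose details are not reproduced here. It is a generic power spectral subtraction under stated assumptions: each frame is supplied as a list of per-bin powers, the first frame is assumed to contain noise only, and the function name `a_priori_denoise_vad` and the parameters `noise_alpha` and `vad_ratio` are hypothetical.

```python
def a_priori_denoise_vad(power_frames, noise_alpha=0.9, vad_ratio=3.0):
    """Denoise each frame with a noise estimate built from earlier frames only,
    then decide on voice activity from the energy of the denoised frame.

    power_frames: list of frames, each a list of non-negative per-bin powers.
    Returns (denoised_energies, decisions).
    """
    # Assumption of this sketch: the first frame carries noise only.
    noise = list(power_frames[0])
    denoised_energies = []
    decisions = []
    for frame in power_frames:
        # A priori denoising: subtract the noise estimate obtained while
        # processing previous frames, clamping negative results to zero.
        denoised = [max(p - n, 0.0) for p, n in zip(frame, noise)]
        energy = sum(denoised)
        # Illustrative decision rule: denoised energy well above noise energy.
        active = energy > vad_ratio * sum(noise)
        denoised_energies.append(energy)
        decisions.append(active)
        # Update the noise estimate only on frames judged to be inactive.
        if not active:
            noise = [noise_alpha * n + (1.0 - noise_alpha) * p
                     for n, p in zip(noise, frame)]
    return denoised_energies, decisions
```

In this sketch the decision for a frame never depends on a noise estimate derived from that same frame, which is what makes the denoising "a priori".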
In the methods customarily used to detect voice activity, the energy variations of the (direct or denoised) signal are analyzed with respect to a long-term average of the energy of this signal, a relative increase in the instantaneous energy suggesting the onset of voice activity.
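A minimal sketch of such a customary energy-based detection is given below, for illustration only. The function names `frame_energies` and `energy_vad`, the smoothing factor `alpha` and the threshold `ratio_threshold` are assumptions of this sketch. So is the choice to update the long-term average only on frames judged inactive, which keeps the average from rising during speech.

```python
def frame_energies(samples, frame_len):
    """Return the mean-square energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def energy_vad(energies, alpha=0.95, ratio_threshold=4.0, floor=1e-12):
    """Flag frames whose energy exceeds ratio_threshold times the long-term average."""
    decisions = []
    long_term = None
    for e in energies:
        if long_term is None:
            # Initialise the long-term average on the first frame,
            # which this sketch treats as inactive.
            long_term = max(e, floor)
            decisions.append(False)
            continue
        # A relative increase in instantaneous energy suggests voice activity.
        active = e > ratio_threshold * long_term
        decisions.append(active)
        if not active:
            # Exponential averaging stands in for the long-term average.
            long_term = alpha * long_term + (1.0 - alpha) * max(e, floor)
    return decisions
```

For example, `energy_vad(frame_energies(samples, 160))` would yield one boolean decision per 160-sample frame. The thresholds shown are arbitrary and would need tuning for any real signal.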
An object of the present invention is to propose another type of analysis providing voice activity detection which is robust to the noise that may accompany the speech signal.