The present invention relates to digital speech signal processing techniques. It relates more particularly to techniques which detect vocal activity to perform different processing according to whether the signal is supporting vocal activity or not.
The digital techniques in question relate to various domains: coding of speech for transmission or storage, speech recognition, noise reduction, echo cancellation, etc.
The main difficulty with vocal activity detection methods is distinguishing vocal activity from the accompanying noise. A conventional noise suppression technique cannot solve this problem because these techniques themselves use estimates of the noise which depend on the degree of vocal activity of the signal.
A main object of the present invention is to make vocal activity detection methods more robust to noise.
The invention therefore proposes a method of detecting vocal activity in a digital speech signal processed by successive frames, in which method the speech signal is subjected to noise suppression taking account of estimates of the noise included in the signal, updated for each frame in a manner dependent on at least one degree of vocal activity determined for said frame. According to the invention, a priori noise suppression is applied to the speech signal of each frame on the basis of estimates of the noise obtained on processing at least one preceding frame, and the energy variations of the a priori noise-suppressed signal are analyzed to detect the degree of vocal activity of said frame.
Detecting vocal activity (as a general rule by any method known in the art) on the basis of a noise-suppressed signal a priori significantly improves the performance of detection if the level of surrounding noise is relatively high.
In the remainder of the present description, the vocal activity detection method of the invention is illustrated within a system for eliminating noise from a speech signal. Clearly the method can find applications in many other types of digital speech processing requiring information on the degree of vocal activity of the processed signal: coding, recognition, echo cancellation, etc.