1. Field of the Invention
The present invention relates generally to digital signal processing systems, and more particularly, to a system and method for voice activity detection in adverse environments, e.g., noisy environments.
2. Description of the Related Art
The voice (and more generally acoustic source) activity detection (VAD) is a cornerstone problem in signal processing practice, and often, it has a stronger influence on the overall performance of a system than any other component. Speech coding, multimedia communication (voice and data), speech enhancement in noisy conditions and speech recognition are important applications where a good VAD method or system can substantially increase the performance of the respective system. The role of a VAD method is basically to extract features of an acoustic signal that emphasize differences between speech and noise and then classify them to take a final VAD decision. The variety and the varying nature of speech and background noises makes the VAD problem challenging.
Traditionally, VAD methods use energy criteria such as SNR (signal-to-noise ratio) estimation based on long-term noise estimation, such as disclosed in K. Srinivasan and A. Gersho, Voice activity detection for cellular networks, in Proc. Of the IEEE Speech Coding Workshop, October 1993, pp. 85–86. Improvements proposed use a statistical model of the audio signal and derive the likelihood ratio as disclosed in Y. D. Cho, K Al-Naimi, and A. Kondoz, Improved voice activity detection based on a smoothed statistical likelihood ratio, in Proceedings ICASSP 2001, IEEE Press, or compute the kurtosis as disclosed in R. Goubran, E. Nemer and S. Mahmoud, Snr estimation of speech signals using subbands and fourth-order statistics, IEEE Signal Processing Letters, vol. 6, no. 7, pp. 171–174, July 1999. Alternatively, other VAD methods attempt to extract robust features (e.g. the presence of a pitch, the formant shape, or the cepstrum) and compare them to a speech model. Recently, multiple channel (e.g., multiple microphones or sensors) VAD algorithms have been investigated to take advantage of the extra information provided by the additional sensors.