1. Field
The present disclosure pertains generally to speech processing, and more specifically, to voice activity detection.
2. Background
Voice activity detection (VAD) is a speech-processing technique that detects the presence or absence of human speech (voice) in portions of an audio signal, which may also contain music, noise, or other sounds. VAD is used mainly in voice coding and speech recognition. Besides facilitating speech processing, VAD can deactivate certain processes during non-speech segments. For example, it can avoid unnecessary coding and transmission of silence, saving computation and network bandwidth.
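For illustration only, the per-segment speech/non-speech decision can be sketched with a simple frame-energy threshold. This is a generic textbook approach, not the technique of the present disclosure. The function name `frame_energy_vad`, the frame length of 160 samples (20 ms at an assumed 8 kHz sampling rate), and the -40 dB threshold are hypothetical choices made for this sketch.

```python
import math
from typing import List, Sequence


def frame_energy_vad(samples: Sequence[float], frame_len: int = 160,
                     threshold_db: float = -40.0) -> List[bool]:
    """Hypothetical energy-based VAD: one speech/non-speech flag per frame.

    Splits the signal into non-overlapping frames of frame_len samples
    (a trailing partial frame is ignored) and flags a frame as speech when
    its mean-square energy, in dB relative to full scale (1.0), exceeds
    threshold_db.
    """
    flags: List[bool] = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # A small floor avoids log10(0) on digitally silent frames.
        energy_db = 10.0 * math.log10(energy + 1e-12)
        flags.append(energy_db > threshold_db)
    return flags


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    silence = [0.0] * 160
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / fs) for n in range(160)]
    print(frame_energy_vad(silence + tone))  # [False, True]
```

Because a fixed energy threshold only compares loudness, it cannot separate speech from background noise of comparable energy, which is why simple detectors of this kind degrade as the signal-to-noise ratio falls.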
VAD is an important enabling technology for a variety of speech-based applications. Conventionally, VAD information is estimated locally in a single device, such as a communications handset, from an input audio signal.
VAD in a voice communications system should be able to detect voice in the presence of very diverse types of acoustic background noise. One difficulty in the detection of voice in noisy environments is the very low signal-to-noise ratios (SNRs) that are sometimes encountered. In these situations, it is often difficult to distinguish between voice and noise or other sounds using known VAD techniques.