Voice activity detection (VAD), also known as speech activity detection or speech detection, is a technique used in speech processing to detect the presence or absence of human speech. VAD may be used in a variety of applications, including noise suppressors, background noise estimators, adaptive beamformers, dynamic beam steering, always-on voice detection, and conversation-based playback management. Near-field speech detection is a critical element in many voice-based signal processing algorithms used in wearable devices. Due to space restrictions, the microphone spacing in wearable devices is typically small, and conventional near-field detection algorithms may not work well for such microphone arrays. Moreover, due to the low-power constraints of wearable applications, the use of computationally expensive algorithms such as neural network-based classification methods is prohibitive.
Many speech enhancement and noise reduction algorithms must detect desired speech signals in the presence of interfering signals in order to achieve the required performance. The interfering signals can range from stationary noise, such as brown noise or road noise, to dynamic signals such as the babble or competing-talker noise present in pub or restaurant environments. Conventional voice activity detectors are not capable of distinguishing desired speech signals from speech-like interfering signals. Conventional voice-based signal processing algorithms typically rely on spatial statistics derived from microphone arrays to detect desired speech signals in the presence of various interfering noise types. Such traditional spatial processing-based detectors have been used successfully in handset and headset devices with large microphone spacing (35-150 mm). However, the performance of these detectors tends to degrade as the microphone spacing is reduced. In newer wearable devices, space limitations may require the microphones to be closely arranged, and the spatial diversity information provided by such a closely spaced microphone array may degrade accordingly.
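As a rough illustration of why spatial cues weaken at small spacings, the sketch below estimates the largest inter-microphone time delay that a two-microphone array can observe. It is only a simplified model under stated assumptions, not a description of any particular detector: the speed of sound (343 m/s), the 16 kHz sample rate, the plane-wave arrival along the array axis, and the 10 mm spacing are all illustrative values not taken from the text above, and the function name is hypothetical.

```python
# Illustrative sketch: maximum time difference of arrival between two microphones.
# All constants below are assumptions chosen for the example.

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at room temperature


def max_delay_samples(spacing_mm, sample_rate_hz=16000.0):
    """Return the largest possible inter-microphone delay, in samples.

    Assumes a plane wave arriving along the axis joining the two microphones,
    which gives the maximum delay of spacing / speed_of_sound seconds.
    """
    spacing_m = spacing_mm / 1000.0
    delay_s = spacing_m / SPEED_OF_SOUND_M_S
    return delay_s * sample_rate_hz


if __name__ == "__main__":
    # 150 mm and 35 mm bracket the "large spacing" range cited above;
    # 10 mm is an assumed example of a closely spaced wearable array.
    for spacing_mm in (150.0, 35.0, 10.0):
        delay = max_delay_samples(spacing_mm)
        print(f"{spacing_mm:6.1f} mm -> max delay {delay:.2f} samples at 16 kHz")
```

Under these assumptions, the maximum delay falls from several samples at the large end of the range to a fraction of a sample at 10 mm, which suggests why delay-based spatial statistics become harder to estimate reliably as the spacing shrinks.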