Voice activity detection (VAD), also known as speech activity detection or speech detection, is a technique used in speech processing in which the presence or absence of human speech is detected. VAD may be used in a variety of applications, including noise suppressors, background noise estimators, adaptive beamformers, dynamic beam steering, always-on voice detection, and conversation-based playback management. Many voice activity detection applications may employ a dual-microphone-based speech enhancement and/or noise reduction algorithm, that may be used, for example, during a voice communication, such as a call. Most traditional dual microphone algorithms assume that an orientation of the array of microphones with respect to a desired source of sound (e.g., a user's mouth) is fixed and known a priori. Such prior knowledge of this array position with respect to the desired sound source may be exploited to preserve a user's speech while reducing interference signals coming from other directions.
Headsets with a dual microphone array may come in a number of different sizes and shapes. Due to the small size of some headsets, such as in-ear fitness headsets, headsets may have limited space in which to place the dual microphone array on an earbud itself. Moreover, placing microphones close to a receiver in the earbud may introduce echo-related problems. Hence, many in-ear headsets often include a microphone placed on a volume control box for the headset and a single microphone-based noise reduction algorithm is used during voice call processing. In this approach, voice quality may suffer when a medium to high level of background noise is present. The use of dual microphones assembled in the volume control box may improve the noise reduction performance. In a fitness-type headset, the control box may frequently move and the control box position with respect to a user's mouth may be at any point in space depending on user preference, user movement, or other factors. For example, in a noisy environment, the user may manually place the control box close to the mouth for increased input signal-to-noise ratio. In such cases, using a dual microphone approach for voice processing in which the microphones are placed in the control box may be a challenging task. As an example, a desired speech direction may not be constant such that user speech may be suppressed in many solutions, including those in which voice processing with beamformers is used.