The proliferation of smart phones, tablets, and other mobile devices has fundamentally changed the way people access information and communicate. People now make phone calls in diverse places such as crowded bars, busy city streets, and windy outdoors, where adverse acoustic conditions pose severe challenges to the quality of voice communication. Additionally, voice commands have become an important method for interaction with electronic devices in applications where users have to keep their eyes and hands on the primary task, such as, for example, driving. As electronic devices become increasingly compact, voice command may become the preferred method of interaction with electronic devices. However, despite recent advances in speech technology, recognizing voice in noisy conditions remains difficult. Therefore, mitigating the impact of noise is important to both the quality of voice communication and performance of voice recognition.
Headsets have been a natural extension of telephony terminals and music players as they provide hands-free convenience and privacy when used. Compared to other hands-free options, a headset represents an option in which microphones can be placed at locations near the user's mouth, with constrained geometry among user's mouth and microphones. This results in microphone signals that have better signal-to-noise ratios (SNRs) and are simpler to control when applying multi-microphone based noise reduction. However, when compared to traditional handset usage, headset microphones are relatively remote from the user's mouth. As a result, the headset does not provide the noise shielding effect provided by the user's hand and the bulk of the handset. As headsets have become smaller and lighter in recent years due to the demand for headsets to be subtle and out-of-way, this problem becomes even more challenging.
When a user wears a headset, the user's ear canals are naturally shielded from outside acoustic environment. If a headset provides tight acoustic sealing to the ear canal, a microphone placed inside the ear canal (the internal microphone) would be acoustically isolated from the outside environment such that environmental noise would be significantly attenuated. Additionally, a microphone inside a sealed ear canal is free of wind-buffeting effect. A user's voice can be conducted through various tissues in a user's head to reach the ear canal, because the sound is trapped inside of the ear canal. A signal picked up by the internal microphone should thus have much higher SNR compared to the microphone outside of the user's ear canal (the external microphone).
Internal microphone signals are not free of issues, however. First of all, the body-conducted voice tends to have its high-frequency content severely attenuated and thus has much narrower effective bandwidth compared to voice conducted through air. Furthermore, when the body-conducted voice is sealed inside an ear canal, it forms standing waves inside the ear canal. As a result, the voice picked up by the internal microphone often sounds muffled and reverberant while lacking the natural timbre of the voice picked up by the external microphones. Moreover, effective bandwidth and standing-wave patterns vary significantly across different users and headset fitting conditions. Finally, if a loudspeaker is also located in the same ear canal, sounds made by the loudspeaker would also be picked by the internal microphone. Even with acoustic echo cancellation (AEC), the close coupling between the loudspeaker and internal microphone often leads to severe voice distortion even after AEC.
Other efforts have been attempted in the past to take advantage of the unique characteristics of the internal microphone signal for superior noise reduction performance. However, attaining consistent performance across different users and different usage conditions has remained challenging. It can be particularly challenging to provide robustness and consistency for noise reduction both when the user is speaking and in gaps when the user is not speaking (speech gaps). Some known methods attempt to address this problem; however, those methods may be more effective when the user's speech is present but less so when the user's speech is absent. What is needed is a method that overcomes the drawbacks of the known methods. More specifically, what is needed is a method that improves noise reduction performance during speech gaps such that it is not inconsistent with the noise reduction performance during speech periods.