Mobile devices are frequently used in acoustically harsh environments (i.e. environments where there is a lot of background noise). Aside from the difficulty a user of the mobile device has in hearing the far-end party during two-way communication, it is difficult to obtain a ‘clean’ (i.e. noise free or substantially noise-reduced) audio signal representing the speech of the user. In environments where the captured signal-to-noise ratio (SNR) is low, traditional speech processing algorithms can only perform a limited amount of noise suppression before the near-end speech signal (i.e. that obtained by the microphone in the mobile device) becomes distorted with ‘musical tone’ artifacts.
It is known that audio signals obtained using a contact sensor, such as a bone-conducted (BC) or contact microphone (i.e. a microphone in physical contact with the object producing the sound), are relatively immune to background noise compared to audio signals obtained using an air-conducted (AC) sensor, such as a microphone (i.e. a microphone that is separated from the object producing the sound by air). This is because the sound vibrations measured by the BC microphone have propagated through the body of the user rather than through the air. A normal AC microphone, in addition to capturing the desired audio signal, also picks up the background noise. Furthermore, the intensity of the audio signals obtained using a BC microphone is generally much higher than that obtained using an AC microphone. Therefore, BC microphones have been considered for use in devices that might be used in noisy environments. FIG. 1 illustrates the high SNR properties of an audio signal obtained using a BC microphone relative to an audio signal obtained using an AC microphone in the same noisy environment.
However, the problem with speech obtained using a BC microphone is that its quality and intelligibility are usually much lower than speech obtained using an AC microphone. This reduction in intelligibility generally results from the filtering properties of bone and tissue, which can severely attenuate the high frequency components of the audio signal.
The quality and intelligibility of the speech obtained using a BC microphone depend on its specific location on the user. The closer the microphone is placed to the larynx and vocal cords, around the throat or neck regions, the better the resulting quality and intensity of the BC audio signal. Furthermore, since the BC microphone is in physical contact with the object producing the sound, the resulting signal has a higher SNR compared to an AC audio signal, which also picks up background noise.
However, although speech obtained using a BC microphone placed in or around the neck region will have a much higher intensity, the intelligibility of the signal will still be quite low, which is attributed to the filtering of the glottal signal through the bones and soft tissue in and around the neck region and the lack of the vocal tract transfer function.
The characteristics of the audio signal obtained using a BC microphone also depend on the housing of the BC microphone, i.e. whether it is shielded from background noise in the environment, as well as on the pressure applied to the BC microphone to establish contact with the user's body.
Filtering or speech enhancement methods exist that aim to improve the intelligibility of speech obtained from a BC microphone, but these methods require either the presence of a clean speech reference signal in order to construct an equalization filter for application to the audio signal from the BC microphone, or the training of user-specific models using a clean audio signal from an AC microphone. As a result, these methods are not suited to real-world applications where a clean speech reference signal is not always available (for example in noisy environments), or where any of a number of different users can use a particular device.
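By way of illustration only, the reference-based equalization approach described above might be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce any particular known method: the function names, the frame length and the use of long-term average magnitude spectra are all hypothetical choices made for the example. It shows why such an approach depends on a clean AC reference signal, since the per-frequency gains cannot be computed without one.

```python
import numpy as np


def average_magnitude_spectrum(signal, frame_len=256):
    """Average magnitude spectrum over non-overlapping, Hann-windowed frames.

    Hypothetical helper for this sketch; frame_len is an assumed value.
    """
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[: n_frames * frame_len], (n_frames, frame_len))
    window = np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames * window, axis=1)).mean(axis=0)


def equalization_gains(bc_signal, clean_ac_reference, frame_len=256, floor=1e-8):
    """Per-frequency-bin gains mapping the BC spectrum towards the AC spectrum.

    Assumes a clean (noise-free) AC recording of the same speech is
    available, which is exactly the requirement that limits this approach.
    """
    bc_avg = average_magnitude_spectrum(bc_signal, frame_len)
    ac_avg = average_magnitude_spectrum(clean_ac_reference, frame_len)
    # The floor avoids division by zero in bins where the BC signal is empty.
    return ac_avg / np.maximum(bc_avg, floor)
```

In such a sketch, the gains would be multiplied onto the short-time spectrum of the BC signal to restore the attenuated high-frequency components. If the AC reference is itself noisy, the gains are computed from speech plus noise and the equalization is no longer reliable.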
Therefore, there is a need for an alternative system and method for producing an audio signal representing the speech of a user from an audio signal obtained using a BC microphone that can be used in noisy environments and that does not require the user to train the algorithm before use.