Mobile phones can be used in acoustically different ambient environments, where the user's voice (speech) that is picked up during a phone call or during a recording session is usually mixed with various types and levels of undesirable sounds (including ambient sounds and the voice of another talker). These undesirable sounds (also referred to as noise) are often picked up by the microphone(s) and thus often degrade the acquisition of the desired speech. For example, pickup of such undesirable sounds can reduce the intelligibility of the user's speech as heard at the far end of a phone call. It can also lead to significant voice distortion, particularly after the signal has been processed by voice coders in a cellular communication network. For at least these reasons, it is typically desirable to apply a high-quality digital noise suppression process to the mixture of speech and noise in the acquired audio signal before passing the signal to the next steps in its transmission to the far end, e.g., to a cellular voice coder in a baseband communications chip of the mobile phone.
In the handset mode of operation (against the ear) in some current mobile phones, audio signals from more than one microphone can be used together in a multi-microphone (e.g., two-microphone) noise suppression (NS) process. The general approach relies on the fact that some microphones, or combinations of microphones, can be used more effectively than others to estimate either the desired speech or the unwanted noise components. Such estimates help in the noise suppression process. In some cell phones, the choice of microphone or combination of microphones is clear, e.g., microphones closer to the user's mouth would have a higher signal-to-noise ratio (SNR) than those further away, the signal being the desired speech. SNRs can also be tested or computed a priori, during the design process. This could be done by obtaining a stationary noise spectrum for the microphone signal, either by measuring it with known noise or by estimating it with unknown noise, and then further estimating spectra of the desired speech when such speech is active. The ratio of the speech spectrum to the noise spectrum is used to estimate the SNR. The microphone signal having the largest SNR is then selected to be the voice-dominant input of the two-microphone NS process. Conversely, the microphone signal having the lower SNR can be used to better estimate or predict the noise spectrum, both stationary and dynamic.
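The SNR-based selection described above can be illustrated with a minimal Python sketch. It is not taken from any particular product. It assumes that each microphone signal has already been converted to per-frame power spectra and that a per-frame speech-activity flag is available from elsewhere. The function names (`mean_spectrum`, `estimate_snr_db`, `select_voice_dominant`) and the microphone labels are hypothetical, and the speech spectrum is obtained here by simple spectral subtraction, which is only one possible way of estimating it.

```python
import math

def mean_spectrum(frames):
    """Average a list of per-frame power spectra bin by bin."""
    n_bins = len(frames[0])
    return [sum(f[k] for f in frames) / len(frames) for k in range(n_bins)]

def estimate_snr_db(power_frames, speech_active, floor=1e-12):
    """Estimate a broadband SNR in dB for one microphone signal.

    power_frames:  list of per-frame power spectra (lists of floats).
    speech_active: list of booleans, True where desired speech is present
                   (assumed to be supplied by some external detector).
    """
    noise_frames = [f for f, a in zip(power_frames, speech_active) if not a]
    speech_frames = [f for f, a in zip(power_frames, speech_active) if a]
    if not noise_frames or not speech_frames:
        raise ValueError("need at least one noise-only and one speech-active frame")

    # Stationary noise spectrum: average over frames without speech.
    noise = mean_spectrum(noise_frames)
    # Spectrum observed while speech is active (speech plus noise).
    noisy_speech = mean_spectrum(speech_frames)
    # Assumption: recover the speech spectrum by spectral subtraction,
    # clamping negative bins to zero.
    speech = [max(s - n, 0.0) for s, n in zip(noisy_speech, noise)]

    # Ratio of total speech power to total noise power, floored to avoid
    # division by zero and log of zero.
    ratio = max(sum(speech), floor) / max(sum(noise), floor)
    return 10.0 * math.log10(ratio)

def select_voice_dominant(mics, speech_active):
    """Return (voice_dominant, noise_reference, snrs_db).

    mics maps a microphone label to its list of per-frame power spectra.
    The label with the largest SNR is the voice-dominant input; the label
    with the smallest SNR is used as the noise reference.
    """
    snrs = {name: estimate_snr_db(frames, speech_active)
            for name, frames in mics.items()}
    voice = max(snrs, key=snrs.get)
    noise_ref = min(snrs, key=snrs.get)
    return voice, noise_ref, snrs
```

For instance, calling `select_voice_dominant({"bottom": [[1.0, 1.0], [11.0, 11.0]], "top": [[1.0, 1.0], [2.0, 2.0]]}, [False, True])` would pick the hypothetical `"bottom"` microphone as the voice-dominant input and `"top"` as the noise reference, since the former shows the larger rise in power when speech is active. A broadband ratio is used for simplicity; a per-bin or per-band SNR would be an equally reasonable choice.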