Communication devices such as cellular mobile phones and desktop or laptop computers that run telephony applications allow their users to conduct a conversation through a two-way, real-time voice or video telephony session between near-end and far-end devices that are coupled to each other through a communication network. An audio signal that contains the speech of a near-end user, as picked up by a microphone, is transmitted to the far-end user's device while, at the same time, an audio signal that contains the speech of the far-end user is received at the near-end user's device. The quality and intelligibility of the speech reproduced from the audio signal can, however, be degraded by several factors. For instance, as one participant speaks, the microphone also picks up other environmental sounds (e.g., ambient noise). These sounds are sent along with the participant's voice, and when heard by the other participant they may make the voice sound muffled or unintelligible. The voices of other people (e.g., in the background) may also be transmitted and heard by the other participant. Hearing several people talking at the same time may confuse and frustrate the other participant, who is trying to engage in one conversation at a time.
Audio processing algorithms for speech enhancement, such as spectral shaping, acoustic echo cancellation, noise reduction, blind source separation and pickup beamforming, are commonly used to improve speech quality and intelligibility in telephony devices such as mobile phones. An enhancement system typically operates, for example in a far-end device, by estimating the unwanted background signal (e.g., diffuse noise, interfering speech, etc.) in a noisy microphone signal captured by that device. The unwanted signal is then electronically cancelled or suppressed, ideally leaving only the desired voice signal to be transmitted to the near-end device.
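The estimate-then-suppress approach described above can be illustrated with magnitude spectral subtraction, a textbook noise reduction technique. The following is a minimal sketch under stated assumptions, not the method of any particular device: it assumes the microphone signal has already been converted to per-frame magnitude spectra, that some frames are known to contain no speech, and that the names `estimate_noise`, `suppress` and the `floor` parameter are hypothetical choices made for this example.

```python
from typing import List

Spectrum = List[float]  # one magnitude value per frequency bin (assumed input format)


def estimate_noise(noise_only_frames: List[Spectrum]) -> Spectrum:
    """Estimate the unwanted signal as the per-bin average magnitude over
    frames that are assumed to contain background noise only."""
    n_frames = len(noise_only_frames)
    n_bins = len(noise_only_frames[0])
    return [
        sum(frame[k] for frame in noise_only_frames) / n_frames
        for k in range(n_bins)
    ]


def suppress(noisy: Spectrum, noise: Spectrum, floor: float = 0.05) -> Spectrum:
    """Subtract the noise estimate from each bin of a noisy frame.

    The result is never allowed to drop below `floor` times the noisy
    magnitude, which avoids negative magnitudes when the estimate
    exceeds the observed value."""
    return [max(y - n, floor * y) for y, n in zip(noisy, noise)]


# Hypothetical data: two noise-only frames, then one frame with speech in bin 1.
noise_estimate = estimate_noise([[1.0, 1.0, 1.0], [3.0, 1.0, 1.0]])
enhanced = suppress([2.0, 9.0, 1.0], noise_estimate)
print(noise_estimate, enhanced)
```

In this sketch the bin that carries speech keeps most of its energy, while bins dominated by the estimated noise are reduced to the floor. A practical system would also have to decide which frames are noise-only and would resynthesize a time-domain signal, both of which are omitted here.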
In an ideal system, speech enhancement algorithms would perform well in all scenarios and provide increased speech quality and speech intelligibility. In practice, however, the success of enhancement systems varies depending on several factors, including the physical hardware of the device (e.g., number of microphones), the acoustic environment during the communication session, and how a mobile device is carried or held by its user. Enhancement algorithms typically require design tradeoffs between noise reduction, speech distortion, and hardware cost (e.g., more noise reduction can be achieved at the expense of greater speech distortion).
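The tradeoff between noise reduction and speech distortion can be made concrete with a parametric Wiener gain, in which a single tuning parameter controls how aggressively each frequency bin is attenuated. This is an illustrative sketch only: the function name `parametric_wiener_gain`, the parameter `mu`, and the example signal-to-noise ratios are assumptions made for this example and are not taken from the text above.

```python
def parametric_wiener_gain(snr: float, mu: float = 1.0) -> float:
    """Per-bin gain snr / (snr + mu), where snr is a linear (not dB)
    signal-to-noise power ratio and mu >= 0 sets the aggressiveness.

    mu = 1 gives the classic Wiener gain; larger mu attenuates more."""
    return snr / (snr + mu)


# Hypothetical bins: one dominated by speech, one dominated by noise.
speech_bin_snr = 10.0
noise_bin_snr = 0.1

for mu in (1.0, 4.0):
    speech_gain = parametric_wiener_gain(speech_bin_snr, mu)
    noise_gain = parametric_wiener_gain(noise_bin_snr, mu)
    print(f"mu={mu}: speech-bin gain={speech_gain:.3f}, noise-bin gain={noise_gain:.3f}")
```

Raising `mu` lowers the gain in the noise-dominated bin, which means more noise reduction, but it also lowers the gain in the speech-dominated bin, which means more of the desired speech is attenuated. No single setting is best for every device or acoustic environment.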