Currently, a number of consumer electronic devices are adapted to receive speech via microphone ports or headsets. While the typical example is a portable telecommunications device (mobile telephone), with the advent of Voice over IP (VoIP), desktop computers, laptop computers and tablet computers may also be used to perform voice communications.
When using these electronic devices, the user also has the option of using headphones, earbuds, or headset to receive his or her speech. However, a common complaint with these hands-free modes of operation is that the speech captured by the microphone port or the headset includes environmental noise such as wind noise, secondary speakers in the background or other background noises. This environmental noise often renders the user's speech unintelligible and thus, degrades the quality of the voice communication.
Noise suppression algorithms are commonly used to enhance speech quality in modern mobile phones, telecommunications, and multimedia systems. Such techniques remove unwanted background noises caused by acoustic environments, electronic system noises, or similar. Noise suppression may greatly enhance the quality of desired speech signals and the overall perceptual performance of communication systems. However, mobile device handset noise reduction performance can vary significantly depending on, for example: 1) the signal-to-noise ratio of the noise compared to the desired speech, 2) directional robustness or the geometry of the microphone placement in the mobile device relative to the unwanted noisy sounds, and 3) handset positional robustness or the geometry of the microphone placement relative to the desired speaker.
Related to multi-channel noise suppression processing is the field blind source separation (BSS). Blind source separation is the task of separating a set of two or more distinct sound sources from a set of mixed signals with little-to-no prior information. Blind source separation algorithms include independent component analysis (ICA), independent vector analysis (IVA), and non-negative matrix factorization (NMF). These methods are designed to be completely general and make no assumptions on microphone position or sound source.
However, blind source separation algorithms have several limitations that limit their real-world applicability. For instance, some algorithms do not operate in real-time, suffer from slow convergence time, exhibit unstable adaptation, and have limited performance for certain sound sources (e.g. diffuse noise) and microphone array geometries. Typical BSS algorithms may also be unaware of what sound sources they are separating, resulting in what is called the external “permutation problem” or the problem of not knowing which output signal corresponds to which sound source. As a result, BSS algorithms can mistakenly output the unwanted noise signal rather than the desired speech.