Wind buffeting noise is created by the action of wind across the surface of a microphone or other receiver device. Such turbulent air flow causes local pressure fluctuations and can even saturate the microphone. This can make it difficult for the microphone to detect a desired signal. The time-varying wind noise created under such situations is commonly referred to as “buffeting”. Wind buffeting noise in embedded microphones, such as those found in cell phones, Bluetooth headsets, and hearing aids, is known to produce major acoustic interference and can severely degrade the quality of an acoustic signal.
Wind buffeting mitigation has been a very difficult problem to tackle effectively. Commonly, mechanical-based solutions have been implemented. For example, in WO 2007/132176 the plurality of transducer elements in the communication device are covered by a thin acoustic resistive material. However, mechanical-based solutions are not always practical or feasible in every situation.
Voice communications systems have traditionally used single-microphone noise reduction (NR) algorithms to suppress noise and improve the audio quality. Such algorithms, which depend on statistical differences between speech and noise, provide effective suppression of stationary (i.e. non time varying) noise, particularly where the signal to noise ratio (SNR) is moderate to high. However, the algorithms are less effective where the SNR is very low and the noise is dynamic (or non-stationary), e.g. wind buffeting noise. Special single microphone wind noise reduction algorithms have been proposed in “Coherent Modulation Comb Filtering for Enhancing Speech in Wind Noise,” by Brian King and Les Atlas, “Wind Noise Reduction Using Non-negative Sparse Coding,” by Mikkel N Schmidt, Jan Larsen and Fu-Tien Hsaio, and US 2007/0030989. When the wind noise is severe, single channel systems generally either resort to total attenuation of the incoming signal or completely cease to process the incoming signal.
The limitation imposed on the single channel solutions can be mitigated when multiple microphones are available. As wind buffeting noise is caused by local turbulence surrounding microphones, the wind noise observed by one microphone generally occupies a different time-frequency space to wind noise observed by another microphone. Therefore, the correlation between the wind buffeting noise components received at the two microphones is generally low. In contrast, when there is no wind buffeting, two microphones that are closely spaced are subject to the same acoustic field and thus the acoustic signals (speech, music, or background noise) observed by the microphones are typically highly correlated. Many algorithms such as those disclosed in U.S. Pat. No. 7,464,029 and US 2004/0165736 have taken advantage of this by switching to the one of the two microphones that has the lower power at any given time to mitigate the impact of wind buffeting noise.
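The microphone-switching idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited documents: the frame length, the synthetic signals, and the function names frame_power and select_lower_power are all hypothetical.

```python
import random


def frame_power(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def select_lower_power(frames_mic1, frames_mic2):
    """Per frame, keep the signal from whichever microphone has lower power."""
    selected, choices = [], []
    for f1, f2 in zip(frames_mic1, frames_mic2):
        if frame_power(f1) <= frame_power(f2):
            selected.append(f1)
            choices.append(1)
        else:
            selected.append(f2)
            choices.append(2)
    return selected, choices


# Synthetic demo (assumed data): both microphones see the same low-level
# signal, and uncorrelated "wind" bursts hit mic 1 in frame 1 and mic 2
# in frame 2.
rng = random.Random(0)
clean = [[0.1 * rng.uniform(-1, 1) for _ in range(64)] for _ in range(4)]
mic1 = [list(f) for f in clean]
mic2 = [list(f) for f in clean]
mic1[1] = [s + 2.0 * rng.uniform(-1, 1) for s in mic1[1]]
mic2[2] = [s + 2.0 * rng.uniform(-1, 1) for s in mic2[2]]

output, choices = select_lower_power(mic1, mic2)
print(choices)
```

Because the wind bursts are uncorrelated between the microphones, the frame with the lower power is the one less affected by buffeting. A practical system would likely smooth the switching decision over time to avoid audible artifacts.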
In addition to handling wind buffeting noise, there are many approaches directed to how to use multiple microphones to mitigate the negative impacts of acoustic noise in an environment on a received signal. These algorithms can be categorized into blind source separation (BSS) and independent component analysis (ICA), beamforming, coherence based filtering, direction of arrival filtering techniques and various combinations thereof. The following is a brief overview of each type of technique.
BSS/ICA
Blind source separation (BSS) refers to techniques that estimate original source signals using only the information of the received mixed signals. Some examples of how BSS techniques can be used to mitigate wind noise are illustrated in U.S. Pat. No. 7,464,029, in “Blind Source Separation combining Frequency-Domain ICA and Beamforming”, by H. Saruwatari, S. Kurita, and K. Takeda and in US 2009/0271187. BSS is a statistical technique that is used to estimate a set of linear filter coefficients for applying to a received signal. When using BSS, it is assumed that the original noise sources are statistically independent and so there is no correlation between them. Independent component analysis (ICA) is another statistical technique used to separate sound signals from noise sources. ICA can therefore be used in combination with BSS to solve the BSS statistical problem. BSS/ICA based techniques can achieve a substantial amount of noise reduction when the original sources are independent.
However, in real-life scenarios, there will often be reverberations and echoes of particular signals in the environment that are detected by the microphones. Therefore some noise signals may have some correlation. Also, BSS/ICA techniques commonly require that there are as many microphones as signal sources in order that the statistical problem can be solved accurately. In practice, however, there are often more signal sources than microphones. This yields an underdetermined set of equations and can negatively impact the separation performance of the BSS/ICA algorithms. Problems such as source permutation and temporarily active sources also pose challenges to the robustness of BSS/ICA algorithms. Furthermore, since BSS/ICA algorithms rely on statistical assumptions to estimate the required de-mixing transformation for separating the signals, the presence of incoherent noise such as local wind turbulence often makes the required de-mixing transformation time-varying and thus hard to estimate. When the incoherent noise is strong, the calculated filter coefficients can diverge. Therefore, the algorithms' ability to separate other coherent signals is hampered.
Beamforming
Beamforming is another widely used multi-microphone noise suppression technique. The basics of the technique are described in “Beamforming: A versatile Approach to Spatial Filtering” by B. D. Van Veen and Kevin Buckley. Like BSS/ICA, beamforming is a statistical technique. Beamforming techniques rely on the assumption that the unwanted noise components are unlikely to be originating from the same direction as the desired signal. Therefore, by imposing several spatial constraints, the desired signal source can be targeted and the signal to noise ratio (SNR) can be improved. The spatial constraints may be implemented in several different ways. Typically, however, an array of microphones is configured to receive a signal. Each microphone is sampled and a desired spatial selectivity is achieved by combining the sampled microphone signals together. The sampled microphone signals can be combined together either with an equal weighting or with an unequal weighting. The simplest type of beamformer is a delay-and-sum beamformer. In a delay-and-sum beamformer, the signal received at each microphone is delayed for a time t before being summed together in a signal processor. The delay shifts the phase of the signal received at that microphone so that when each contribution is summed, the summed signal has a strong directional component. In this example, each received signal is given an equal weight. In the simplest case, the model assumes a scenario in which each microphone receives the same signal and there is no correlation between the noise signals. More complex beamformers can be developed by assigning different weights to each received signal. For delay-and-sum beamformers, the microphone array gain, which is a performance measurement that represents the ratio of the SNR at the output of the array to the average SNR of the microphone signals, depends on the number of microphones.
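The delay-and-sum principle described above can be illustrated with a short sketch. It assumes two microphones, integer sample delays, equal weights, and a synthetic sinusoidal source; the function names delay_and_sum and power are hypothetical.

```python
import math


def delay_and_sum(channels, delays):
    """Equal-weight delay-and-sum with integer sample delays.

    Channel m is delayed by delays[m] samples (zero-padded), then all
    channels are averaged.
    """
    n = len(channels[0])
    out = [0.0] * n
    for channel, delay in zip(channels, delays):
        for i in range(n):
            j = i - delay
            if 0 <= j < n:
                out[i] += channel[j]
    return [v / len(channels) for v in out]


def power(x):
    """Mean squared amplitude of a sequence."""
    return sum(v * v for v in x) / len(x)


# Demo (assumed geometry): a sinusoid with a 12-sample period reaches
# mic 2 three samples after mic 1, i.e. with a quarter-period lag.
n, lag = 240, 3
source = [math.sin(2 * math.pi * i / 12) for i in range(n)]
mic1 = source
mic2 = [0.0] * lag + source[: n - lag]

steered = delay_and_sum([mic1, mic2], [lag, 0])   # delays compensate the lag
unsteered = delay_and_sum([mic1, mic2], [0, 0])   # no compensation

print(power(steered[lag:]), power(unsteered[lag:]))
```

When the delays match the propagation lag, the two contributions add in phase and the source is preserved. Without compensation they partially cancel, which is the directional selectivity the text refers to.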
The performance of beamforming algorithms is limited when the number of microphones in the array is small or when the distance between microphones is short relative to the wavelength of the signal in the intended frequency range. This latter condition is frequently true for applications such as Bluetooth headsets. Therefore, beamforming algorithms are not commonly used in Bluetooth headsets.
Coherence-Based Approach
Coherence-based techniques are another subclass of microphone array signal processing using multiple microphones.
If the signals captured by the two microphones are denoted as x1(n) and x2(n) in the time domain, the coherence function between the two signals at frequency bin k is defined as:
$$\mathrm{Coh}(k) = \frac{\left| E\{ X_1(k)\, X_2^{*}(k) \} \right|^{2}}{E\{ |X_1(k)|^{2} \}\, E\{ |X_2(k)|^{2} \}} \qquad (1)$$

where E{ } denotes the expectation value and * denotes the complex conjugate.
Xi(k) is the frequency-domain representation of xi(n) at frequency bin k and is assumed to be zero-mean. The value of coherence function ranges between 0 and 1, with 1 indicating full coherence and 0 indicating no correlation between the two signals.
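Equation (1) can be estimated from data by replacing each expectation with an average over frames. The following is a minimal sketch under that assumption; the frame length, the number of frames, the synthetic Gaussian signals, and the function names dft_bin, msc and noise_frames are hypothetical.

```python
import cmath
import math
import random


def dft_bin(frame, k):
    """Single DFT coefficient X(k) of one frame."""
    n = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))


def msc(frames1, frames2, k):
    """Estimate Coh(k), replacing each expectation with an average over frames."""
    cross, p1, p2 = 0j, 0.0, 0.0
    for f1, f2 in zip(frames1, frames2):
        x1, x2 = dft_bin(f1, k), dft_bin(f2, k)
        cross += x1 * x2.conjugate()
        p1 += abs(x1) ** 2
        p2 += abs(x2) ** 2
    # The 1/(number of frames) factors cancel between numerator and denominator.
    return abs(cross) ** 2 / (p1 * p2)


def noise_frames(rng, num_frames, frame_len):
    """Frames of zero-mean Gaussian noise (assumed test signal)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(frame_len)] for _ in range(num_frames)]


rng = random.Random(1)
a = noise_frames(rng, 200, 32)
scaled_copy = [[0.5 * x for x in frame] for frame in a]  # fully coherent with a
independent = noise_frames(rng, 200, 32)                 # uncorrelated with a

coherent_value = msc(a, scaled_copy, k=4)
incoherent_value = msc(a, independent, k=4)
print(coherent_value, incoherent_value)
```

A scaled copy of a signal gives a coherence of 1 regardless of the scale factor, while independent signals give a value close to 0 that shrinks as more frames are averaged.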
The coherence function is often referred to as the magnitude squared coherence (MSC) function. The MSC function has been used both on its own and in combination with a beamformer (see “A Two-Sensor Noise Reduction System: Applications for Hands-Free Car Kit”, by A. Guérin, R. L. Bouquin-Jeannés and G. Faucon and “Digital Speech Transmission: Enhancement, Coding and Error Concealment,” by P. Vary and D. R. Martin). The MSC function has been used in two-microphone applications. The MSC function works on two main assumptions: firstly, that the target speech signals are directional and thus there is a high coherence between the target speech signals received at different microphones; secondly, that the noise signals are diffuse and thus have lower coherence between microphones than the target speech signals. However, these assumptions have many limitations. For example, in modelling ambient noise, with the assumption of an ideal diffuse noise field, the coherence function, i.e. MSC, can be expressed using a sinc function:
$$\mathrm{Coh}(\Omega) = \frac{\sin^{2}(\Omega f_s d / c)}{(\Omega f_s d / c)^{2}}, \quad \text{where} \quad \Omega = \frac{2 \pi f}{f_s}, \qquad (2)$$

and d, c, and fs denote the distance between the omni-directional microphones, the speed of sound, and the sampling rate, respectively.
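Equation (2) can be evaluated numerically to see how the diffuse-field coherence varies with frequency. The sketch below is illustrative only: the speed of sound of 343 m/s, the 16 kHz sampling rate, the 2.5 cm spacing, and the function name diffuse_coherence are assumptions.

```python
import math


def diffuse_coherence(f_hz, fs_hz, d_m, c_mps=343.0):
    """Equation (2): sin^2(x) / x^2 with x = Omega * fs * d / c."""
    omega = 2.0 * math.pi * f_hz / fs_hz
    x = omega * fs_hz * d_m / c_mps
    if x == 0.0:
        return 1.0  # limit of sin^2(x) / x^2 as x -> 0
    return (math.sin(x) / x) ** 2


# Illustrative values only: 16 kHz sampling, 2.5 cm microphone spacing.
fs, d = 16000.0, 0.025
table = {f: diffuse_coherence(f, fs, d) for f in (0.0, 500.0, 1000.0, 2000.0, 4000.0)}
for f, value in table.items():
    print(f"{f:6.0f} Hz  Coh = {value:.3f}")
```

Note that the sampling rate cancels in the argument of the sine, so the result depends only on the frequency f, the spacing d, and the speed of sound c.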
The coherence function of the ideal diffuse sound field attains its first zero at

$$f_c = \frac{c}{2d}.$$

Above this frequency fc, the function value, i.e. the coherence, is low. For a typical Bluetooth headset, the microphones are separated by a distance of 2.5 cm. In such a case, fc can be calculated to be 6860 Hz. Therefore, for this typical Bluetooth headset, even perfectly diffuse noise exhibits a high coherence below fc, and thus the coherence function is ineffective for distinguishing speech from far-field acoustic noise.
Filtering Based on Direction-of-Arrival
Direction-of-arrival (DOA) based filtering relies on the ability of the receiver to estimate the origin of a target signal. DOA estimation of a sound source by using microphone arrays has previously been applied to tackle speech enhancement problems. Examples of particular applications are illustrated in “Microphone Array for Headset with Spatial Noise Suppressor,” by A. A. Ivan Tashev and Michael L. Seltzer, and “Noise Cross PSD Estimation Using Phase Information in Diffuse Noise Field,” by M. Rahmani, A. Akbari, B. Ayad and B. Lithogow. The fundamental principle behind DOA estimation is to capture the phase information present in signals picked up by the array of microphones. The phase difference is zero when the incoming signal impinges from the broadside direction, and largest when the microphones are in end-fire orientation. The phase difference is often estimated through the so-called phase transform (PHAT). PHAT normalises the cross-spectrum by its magnitude, so that only the phase information is retained.
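The PHAT normalisation described above can be sketched as a small delay estimator. This is a simplified illustration, not the method of the cited papers: it assumes a circular (wrap-around) integer delay between the two channels and a synthetic noise source, and the names dft, gcc_phat_delay and circular_shift are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    """Direct O(n^2) discrete Fourier transform, adequate for short frames."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in range(n)]


def gcc_phat_delay(x1, x2):
    """Estimate the circular delay of x2 relative to x1, in samples."""
    n = len(x1)
    spec1, spec2 = dft(x1), dft(x2)
    phat = []
    for a, b in zip(spec1, spec2):
        cross = b * a.conjugate()
        mag = abs(cross)
        # PHAT: divide the cross-spectrum by its magnitude, keeping only phase.
        phat.append(cross / mag if mag > 1e-12 else 0j)
    # Inverse DFT of the phase-only cross-spectrum gives a sharp correlation peak.
    corr = [sum(phat[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]
    peak = max(range(n), key=lambda t: corr[t])
    return peak if peak <= n // 2 else peak - n  # map wrap-around to negative delays


def circular_shift(x, delay):
    """Delay x by an integer number of samples with wrap-around (assumed model)."""
    n = len(x)
    return [x[(i - delay) % n] for i in range(n)]


rng = random.Random(2)
source = [rng.gauss(0.0, 1.0) for _ in range(64)]
delay_broadside = gcc_phat_delay(source, circular_shift(source, 0))
delay_positive = gcc_phat_delay(source, circular_shift(source, 3))
delay_negative = gcc_phat_delay(source, circular_shift(source, -2))
print(delay_broadside, delay_positive, delay_negative)
```

A zero estimated delay corresponds to the broadside case of zero phase difference. Converting a non-zero delay to an angle would additionally require the microphone spacing and the speed of sound.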
In practice, it is difficult to accurately estimate the phase of a received signal due to reverberation, quantisation and hardware limitations of the receiver. Also, systems that filter based on the DOA estimate can be ineffective in cancelling noise signals that originate from the same direction as the target signal. Therefore, when the target signal is from the broadside direction, i.e., zero phase difference, the array is also limited in reducing diffuse noise.
Hybrid Approach
Realizing the limitations of various multi-microphone noise suppression approaches, hybrid systems have also been proposed. In “Blind Source Separation combining Frequency-Domain ICA and Beamforming”, by H. Saruwatari, S. Kurita, and K. Takeda, a subband BSS/ICA system is combined with a null beamformer. The de-mixing matrices used in BSS/ICA are selected based on the estimated DOA of the undesired sound source. Such an approach may have problems in practice when the input signals have a random phase distribution, such as wind noise. The ICA would fail to converge due to the sporadic and highly incoherent nature of wind noise. In “Microphone Array for Headset with Spatial Noise Suppressor,” by A. A. Ivan Tashev and Michael L. Seltzer, a second hybrid algorithm is described. This second hybrid algorithm consists of a three-stage processing chain: a fixed beamformer, a spatial noise suppressor for removing directional noise sources and a single-channel adaptive noise reduction module designed to remove any residual ambient or instrumental stationary noise. Both the beamformer and the spatial noise suppressor are designed to remove from the signal noise components that arrive from directions other than the main signal direction. Therefore, this system may experience difficulties in suppressing noise when the noise signal is in the target signal direction. This might be true for non-stationary noise sources, such as wind, music and interfering speech signals.
As the discussion above shows, most of these approaches have limited capability for handling wind buffeting noise, and their ability to reduce acoustic noise is greatly hampered when wind buffeting is present. Conversely, for those techniques that can reduce wind buffeting noise, doing so seriously compromises their ability to reduce acoustic noise.
There is therefore a need for a system for mitigating the effect of wind buffeting noise.