A device may have input means that can be used to receive transmitted signals from the surrounding environment. For example, a device may have audio input means, such as a microphone, that can be used to receive audio signals from the surrounding environment. A microphone of a user device may receive a primary audio signal (such as speech from a user) as well as other audio signals. The other audio signals may be interfering audio signals received at the microphone of the device, and may be received from an interfering source or may be ambient background noise or microphone self-noise. The interfering audio signals may disturb the primary audio signals received at the device. The device may use the received audio signals for many different purposes. For example, where the received audio signals are speech signals received from a user, the speech signals may be processed by the device for use in a communication event, e.g. by transmitting the speech signals over a network to another device which may be associated with another user of the communication event. Alternatively, or additionally, the received audio signals could be used for other purposes, as is known in the art.
In other examples, a device may have receiving means for receiving other types of transmitted signals, such as radar signals, sonar signals, antenna signals, radio waves, microwaves and general broadband signals or narrowband signals. The same situations can occur for these other types of transmitted signals whereby a primary signal is received as well as interfering signals at the receiving means. The description below is provided mainly in relation to the receipt of audio signals at a device, but the same principles will apply for the receipt of other types of transmitted signals at a device, such as general broadband signals, general narrowband signals, radar signals, sonar signals, antenna signals, radio waves and microwaves as described above.
In order to improve the quality of the received audio signals (e.g. the speech signals received from a user for use in a call), it is desirable to suppress interfering audio signals (e.g. background noise and interfering audio signals received from interfering audio sources) that are received at the microphone of the user device.
The use of stereo microphones and other microphone arrays in which a plurality of microphones operate as a single audio input means is becoming more common. The use of a plurality of microphones at a device enables the use of extracted spatial information from the received audio signals in addition to information that can be extracted from an audio signal received by a single microphone. When using such devices, one approach for suppressing interfering audio signals is to apply a beamformer to the audio signals received by the plurality of microphones. Beamforming is a process of focussing the audio signals received by a microphone array by applying signal processing to enhance particular audio signals received at the microphone array from one or more desired locations (i.e. directions and distances) compared to the rest of the audio signals received at the microphone array. For simplicity we will describe the case with only a single desired direction herein, but the same method will apply when there are more directions of interest. The angle (and/or the distance) from which the desired audio signal is received at the microphone array, so-called Direction of Arrival (“DOA”) information, can be determined or set prior to the beamforming process. It can be advantageous to set the desired direction of arrival to be fixed since the estimation of the direction of arrival may be complex. However, in alternative situations it can be advantageous to adapt the desired direction of arrival to changing conditions, and so it may be advantageous to perform the estimation of the desired direction of arrival in real-time as the beamformer is used. Adaptive beamformers apply a number of weights (or “beamformer coefficients”) to the received audio signals. These weights can be adapted to take into account the DOA information to process the audio signals received by the plurality of microphones to form a “beam” whereby a high gain is applied to desired audio signals received by the microphones from a desired location (i.e. a desired direction and distance) and a low gain is applied in the directions of any other (e.g. interfering) signal sources.
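As an illustration of how a set of weights can form a “beam”, the following is a minimal sketch of a fixed (non-adaptive) delay-and-sum beamformer at a single frequency. It is not taken from the description above: the uniform linear array geometry, the far-field assumption, the microphone spacing, the frequency, the angles and the function names are all assumptions chosen for the example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air


def steering_vector(num_mics, spacing_m, angle_rad, freq_hz):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    positions = np.arange(num_mics) * spacing_m
    delays = positions * np.sin(angle_rad) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)


def beam_gain(weights, steering):
    """Magnitude of the beamformer response |w^H d| toward one direction."""
    return abs(np.vdot(weights, steering))  # vdot conjugates its first argument


# Hypothetical set-up: 4 microphones, 5 cm apart, analysed at 2 kHz.
num_mics, spacing_m, freq_hz = 4, 0.05, 2000.0
d_desired = steering_vector(num_mics, spacing_m, np.deg2rad(0.0), freq_hz)
d_interferer = steering_vector(num_mics, spacing_m, np.deg2rad(60.0), freq_hz)

# Delay-and-sum weights: align the desired direction, then average.
weights = d_desired / num_mics

gain_desired = beam_gain(weights, d_desired)
gain_interferer = beam_gain(weights, d_interferer)
print(f"gain toward desired direction:    {gain_desired:.3f}")
print(f"gain toward interferer direction: {gain_interferer:.3f}")
```

In this sketch the weights give unit gain toward the desired direction and a much lower gain toward the other direction. An adaptive beamformer would update the weights from the received signals rather than fixing them in advance.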
The output of the beamformer can be further processed in the device in the same way as a received audio signal from a single microphone may be processed, e.g. for transmission to another device as part of a communication event. For example, the output of the beamformer may be supplied as an input signal to at least one of an Automatic Gain Control (AGC) processing stage and a single channel noise reduction stage in the device.
The Minimum Variance Distortionless Response (MVDR) beamformer, also known as the Capon beamformer, is part of a class of beamformers that adapt the beamformer coefficients applied to the audio signals to thereby minimize the energy of the output signal based on the input signals under a constraint of not distorting the primary audio signals received with a principal direction of arrival at the device (i.e. audio signals received from the direction of focus of the beamformer). The beamformer weights can be calculated using the inverse of a covariance matrix of the audio signals received at the microphone array. The covariance matrix provides an indication of the correlation of the audio signals received at the different microphones of the microphone array when different delays are applied to the received audio signals (corresponding to different directions of arrival of audio signals at the microphone array). In particular, the cross-covariance indicates the direction of arrival for which the audio signals received at the plurality of microphones are most highly correlated, and this may be taken as the principal direction of arrival of the primary audio signals. Although the beamformer adaptation based on the input signals is useful in minimizing the energy of the output signal, adapting the beamformer in this way tends to distort the output signals as the beamformer varies over time. In particular, any irregularities or sudden changes to the beamformer can result in audible distortion of the output from the beamformer (as the beampattern of the beamformer is suddenly changed).
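The calculation of MVDR weights from the inverse of a covariance matrix can be sketched as follows, using the standard narrowband closed form w = R⁻¹d / (dᴴR⁻¹d), where R is the covariance matrix and d is the steering vector for the direction of focus. This is a minimal sketch under stated assumptions and not the specific implementation of any device: the half-wavelength array, the simulated interferer and noise levels, and the names `mvdr_weights`, `d_target` and `d_interf` are hypothetical, and the covariance is estimated here from interference and noise only.

```python
import numpy as np


def mvdr_weights(cov, steering):
    """Narrowband MVDR weights w = R^-1 d / (d^H R^-1 d)."""
    r_inv_d = np.linalg.solve(cov, steering)  # avoids forming R^-1 explicitly
    return r_inv_d / np.vdot(steering, r_inv_d)


def half_wavelength_steering(num_mics, angle_deg):
    """Steering vector for a linear array with half-wavelength spacing (assumed)."""
    phase = np.pi * np.arange(num_mics) * np.sin(np.deg2rad(angle_deg))
    return np.exp(-1j * phase)


rng = np.random.default_rng(0)
num_mics, num_frames = 4, 2000
d_target = half_wavelength_steering(num_mics, 0.0)
d_interf = half_wavelength_steering(num_mics, 40.0)

# Simulated snapshots: one strong interferer plus weak sensor noise.
interferer = 3.0 * (rng.standard_normal(num_frames)
                    + 1j * rng.standard_normal(num_frames))
noise = 0.1 * (rng.standard_normal((num_mics, num_frames))
               + 1j * rng.standard_normal((num_mics, num_frames)))
snapshots = np.outer(d_interf, interferer) + noise

# Sample covariance matrix of the received signals.
cov = snapshots @ snapshots.conj().T / num_frames

w = mvdr_weights(cov, d_target)
response_target = abs(np.vdot(w, d_target))   # distortionless constraint
response_interf = abs(np.vdot(w, d_interf))   # should be strongly attenuated
print(f"response toward target:     {response_target:.3f}")
print(f"response toward interferer: {response_interf:.4f}")
```

The response toward the target stays at unity by construction, which corresponds to the distortionless constraint, while the output energy from the interfering direction is minimized.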
Spectral regularization may be applied to the received audio signals prior to calculating the covariance matrix. For example, white noise may be added to the audio signals by adding a scaled identity matrix to the covariance matrix in order to lower the condition number of the matrix before inversion, thereby avoiding the problem of inverting a potentially ill-conditioned matrix.
Furthermore, injection of white noise in the covariance matrix can be used to ensure that energy is never amplified by the beamformer, and to ensure that ambient noise suppression is achieved, at the cost of attenuation of the stronger interfering sources. This method provides one way to regularize the adaptation of the beamformer to achieve slightly better beamformer behaviour, but it does not provide sufficient means to prevent distortion in the beamformer output.
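The effect of adding a scaled identity matrix on the condition number can be illustrated with a short sketch. The example below is an assumption-laden illustration only: the number of microphones, the number of frames, the loading level (1% of the average diagonal power) and the helper name `diagonally_load` are hypothetical choices, and the covariance matrix is deliberately estimated from too few frames so that it is ill-conditioned.

```python
import numpy as np


def diagonally_load(cov, loading):
    """Add a scaled identity matrix (white noise injection) to a covariance matrix."""
    return cov + loading * np.eye(cov.shape[0])


rng = np.random.default_rng(1)
num_mics, num_frames = 4, 3  # fewer frames than microphones: rank-deficient estimate

snapshots = rng.standard_normal((num_mics, num_frames))
cov = snapshots @ snapshots.T / num_frames

# Assumed loading level: 1% of the average power on the diagonal.
loading = 1e-2 * np.trace(cov) / num_mics
loaded = diagonally_load(cov, loading)

cond_before = np.linalg.cond(cov)
cond_after = np.linalg.cond(loaded)
print(f"condition number before loading: {cond_before:.3e}")
print(f"condition number after loading:  {cond_after:.3e}")
```

Because the loading raises every eigenvalue by the same amount, the ratio of the largest to the smallest eigenvalue falls, and the loaded matrix can be inverted without the numerical problems of the original estimate.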