A device may have input means that can be used to receive transmitted signals from the surrounding environment. For example, a device may have audio input means, such as a microphone, that can be used to receive audio signals from the surrounding environment. A microphone of a user device may, for instance, receive a primary audio signal (such as speech from a user) as well as other audio signals. The other audio signals may be interfering audio signals received at the microphone of the device, and may be received from an interfering source or may be ambient background noise or microphone self-noise. The interfering audio signals may disturb the primary audio signals received at the device. The device may use the received audio signals for many different purposes. For example, where the received audio signals are speech signals received from a user, the speech signals may be processed by the device for use in a communication event, e.g. by transmitting the speech signals over a network to another device which may be associated with another user of the communication event. Alternatively, or additionally, the received audio signals could be used for other purposes, as is known in the art.
In other examples, a device may have receiving means for receiving other types of transmitted signals, such as general broadband signals, general narrowband signals, radar signals, sonar signals, antenna signals, radio waves or microwaves. The same situation can occur for these other types of transmitted signals, whereby a primary signal is received together with interfering signals at the receiving means. The description below is provided mainly in relation to the receipt of audio signals at a device, but the same principles apply to the receipt of the other types of transmitted signals listed above.
In order to improve the quality of the received audio signals (e.g. the speech signals received from a user for use in a call), it is desirable to suppress interfering audio signals (e.g. background noise and interfering audio signals received from interfering audio sources) that are received at the microphone of the user device.
The use of stereo microphones and other microphone arrays, in which a plurality of microphones operate as a single audio input means, is becoming more common. The use of a plurality of microphones at a device enables spatial information to be extracted from the received audio signals, in addition to the information that can be extracted from an audio signal received by a single microphone. When using such devices, one approach for suppressing interfering audio signals is to apply a beamformer to the audio signals received by the plurality of microphones. Beamforming is a process of focussing the audio signals received by a microphone array by applying signal processing to enhance particular audio signals received at the microphone array from one or more desired locations (i.e. directions and distances) compared to the rest of the audio signals received at the microphone array. For simplicity, only the case of a single desired direction is described herein, but the same method applies when there are more directions of interest. As is known in the art, the problem of solving for multiple desired directions of arrival at the device may become non-trivial as the number of desired directions increases, and for large numbers of desired directions it may not be possible to determine all of the desired directions of arrival. However, the embodiments of the present invention described herein are not limited only to situations in which the directions of arrival of particular audio signals can be determined, but can also be applied even if the locations of the interfering sources cannot be uniquely determined. The angle (and/or distance) from which the desired audio signal is received at the microphone array, so-called Direction of Arrival (“DOA”) information, can be determined or set prior to the beamforming process. It can be advantageous to set the desired direction of arrival to be fixed, since the estimation of the direction of arrival may be complex.
However, in alternative situations it can be advantageous to adapt the desired direction of arrival to changing conditions, and so it may be advantageous to perform the estimation of the desired direction of arrival in real time as the beamformer is used. It is also possible to estimate only the signal delays corresponding to particular directions (and possibly also distances) of arrival, which are also referred to below as DOA information. Adaptive beamformers update their time-varying filter coefficients in a way that incorporates the DOA information. This is done such that, when processing the audio signals received by the plurality of microphones, a “beam” is formed whereby a high gain is applied to desired audio signals received by the microphones from a desired location (i.e. a desired direction and distance) and a low gain is applied in the directions of any other (e.g. interfering) signal sources.
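By way of illustration only, the formation of a “beam” from DOA information can be sketched with a simple fixed delay-and-sum beamformer. The sketch below is not the adaptive beamformer described herein; it assumes a narrowband signal, a far-field source and a uniform linear array, and the function names, microphone spacing, frequency and angles are hypothetical values chosen for the example.

```python
import numpy as np

def steering_vector(theta_rad, num_mics, spacing_m, freq_hz, c=343.0):
    """Narrowband far-field steering vector for a uniform linear array.

    theta_rad is measured from broadside; c is the assumed speed of sound.
    """
    # Relative arrival delay at each microphone for a plane wave from theta.
    delays = np.arange(num_mics) * spacing_m * np.sin(theta_rad) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def delay_and_sum_weights(look_theta_rad, num_mics, spacing_m, freq_hz):
    # Phase-align the look direction across microphones, then average.
    return steering_vector(look_theta_rad, num_mics, spacing_m, freq_hz) / num_mics

def beam_gain(weights, theta_rad, spacing_m, freq_hz):
    # Magnitude of the array response w^H d(theta); np.vdot conjugates
    # its first argument.
    d = steering_vector(theta_rad, len(weights), spacing_m, freq_hz)
    return abs(np.vdot(weights, d))

# Hypothetical set-up: 4 microphones, 5 cm apart, 2 kHz tone,
# desired direction of arrival at 20 degrees.
w = delay_and_sum_weights(np.deg2rad(20.0), num_mics=4, spacing_m=0.05, freq_hz=2000.0)
gain_look = beam_gain(w, np.deg2rad(20.0), 0.05, 2000.0)   # desired direction
gain_off = beam_gain(w, np.deg2rad(-60.0), 0.05, 2000.0)   # some other direction
print(gain_look, gain_off)
```

In this sketch the gain in the desired direction is unity, while the gain in the other direction is lower. An adaptive beamformer would additionally update the weights over time based on the received signals.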
The output of the beamformer can be further processed in the device in the same way as a received audio signal from a single microphone may be processed, e.g. for transmission to another device as part of a communication event. For example, the output of the beamformer may be supplied as an input signal to at least one of an Acoustic Echo Cancellation (AEC) stage, an Automatic Gain Control (AGC) processing stage and a single channel noise reduction stage in the device.
Data-adaptive beamformers usually compute the coefficients based on averaged statistics of the received audio signals. The averaged statistics of the received audio signals enable the beamformer coefficients to be adapted to the received audio signals such that the beamformer has particular characteristics. For example, the averaged statistics may comprise an averaged covariance matrix of the received audio signals at the microphones. The covariance matrix can be used in order to compute the beamformer coefficients such that the beamformer has particular characteristics. For example, the Minimum Variance Distortionless Response (MVDR) beamformer, also known as the Capon beamformer, is a beamformer that adapts the beamformer coefficients applied to the audio signals to minimize the energy of the output signal based on the input signals under a constraint of not distorting the primary audio signals received with a principal direction of arrival at the device (i.e. audio signals received from the direction of focus of the beamformer). However, the MVDR beamformer tends to distort sound that is arriving from directions other than the principal direction of arrival at the device.
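As a minimal sketch of how MVDR coefficients may be computed from an averaged covariance matrix, the following uses the standard narrowband closed form w = R^-1 d / (d^H R^-1 d), where R is the averaged covariance matrix and d is the steering vector for the principal direction of arrival. The simulated signals, the steering vectors, the signal levels and the function names are assumptions made for the example and are not taken from the description above.

```python
import numpy as np

def mvdr_weights(R, d):
    """MVDR weights w = R^-1 d / (d^H R^-1 d) for covariance R and steering vector d."""
    Rinv_d = np.linalg.solve(R, d)          # R^-1 d without forming the inverse
    return Rinv_d / np.vdot(d, Rinv_d)      # normalise so that w^H d = 1

def complex_noise(rng, shape):
    # Unit-variance circular complex Gaussian samples.
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)

rng = np.random.default_rng(0)
num_mics, num_frames = 4, 2000
m = np.arange(num_mics)

# Hypothetical steering vectors for the primary source and one interferer.
d_look = np.exp(-1j * np.pi * 0.3 * m)
d_intf = np.exp(1j * np.pi * 0.4 * m)

# Simulated narrowband microphone signals: primary + strong interferer + sensor noise.
primary = complex_noise(rng, num_frames)
interferer = 3.0 * complex_noise(rng, num_frames)
sensor_noise = 0.1 * complex_noise(rng, (num_mics, num_frames))
X = np.outer(d_look, primary) + np.outer(d_intf, interferer) + sensor_noise

# Averaged covariance matrix of the received signals.
R = X @ X.conj().T / num_frames

w = mvdr_weights(R, d_look)
look_response = np.vdot(w, d_look)        # w^H d_look, constrained to 1
intf_gain = abs(np.vdot(w, d_intf))       # gain applied towards the interferer
print(look_response, intf_gain)
```

The response in the principal direction is held at unity (the distortionless constraint), while the output energy is minimized, which in this simulated case places a low gain in the direction of the interferer.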
Where the beamformer coefficients are computed based on an averaged covariance matrix of the received audio signals, a scaled identity matrix may be added to the covariance matrix in order to control the condition number of the covariance matrix before inverting it and using it to compute the coefficients of the beamformer. The identity matrix can be interpreted as corresponding to a covariance matrix that is obtained on average when injecting spatially and temporally white noise as artificial source data into the sensor data for the received audio signals.
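The effect of adding a scaled identity matrix can be illustrated with the following sketch. The nearly rank-one covariance matrix, the loading value of 0.1 and the function name are hypothetical and chosen only to make the change in condition number visible; the second part checks numerically that spatially and temporally white noise has, on average, a covariance matrix close to the identity matrix.

```python
import numpy as np

def load_diagonal(R, loading):
    """Add a scaled identity matrix to the covariance matrix R."""
    return R + loading * np.eye(R.shape[0])

num_mics = 4
d = np.exp(-1j * np.pi * 0.3 * np.arange(num_mics))

# Hypothetical ill-conditioned covariance: one strong source, almost no noise.
R = 10.0 * np.outer(d, d.conj()) + 1e-6 * np.eye(num_mics)

cond_before = np.linalg.cond(R)
R_loaded = load_diagonal(R, loading=0.1)
cond_after = np.linalg.cond(R_loaded)
print(cond_before, cond_after)

# Interpretation: unit-variance white noise, uncorrelated across microphones
# and across time, has an averaged covariance close to the identity matrix.
rng = np.random.default_rng(0)
num_frames = 50000
white = (rng.standard_normal((num_mics, num_frames))
         + 1j * rng.standard_normal((num_mics, num_frames))) / np.sqrt(2.0)
white_cov = white @ white.conj().T / num_frames
max_dev = np.max(np.abs(white_cov - np.eye(num_mics)))
print(max_dev)
```

A larger loading value lowers the condition number further, but it also makes the resulting beamformer behave more as if the injected white noise were actually present in the received signals.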