A device may have audio input means, such as a microphone, that can be used to receive audio signals from the surrounding environment. For example, a microphone of a user device may receive a primary audio signal (such as speech from a user) as well as other audio signals. The other audio signals may be interfering audio signals, which may originate from an interfering source or may be ambient background noise or microphone self-noise. The interfering audio signals may disturb the primary audio signals received at the device. The device may use the received audio signals for many different purposes. For example, where the received audio signals are speech signals received from a user, the speech signals may be processed by the device for use in a communication event, e.g. by transmitting the speech signals over a network to another device, which may be associated with another user of the communication event. Alternatively, or additionally, the received audio signals could be used for other purposes, as is known in the art.
In order to improve the quality of the received audio signals (e.g. the speech signals received from a user for use in a call), it is desirable to suppress interfering audio signals (e.g. background noise and interfering audio signals received from interfering audio sources) that are received at the microphone of the user device.
The use of stereo microphones and other microphone arrays, in which a plurality of microphones operate as a single audio input means, is becoming more common. The use of a plurality of microphones at a device enables spatial information to be extracted from the received audio signals, in addition to the information that can be extracted from an audio signal received by a single microphone. When using such devices, one approach for suppressing interfering audio signals is to apply a beamformer to the audio signals received by the plurality of microphones. Beamforming is a process of focussing the audio signals received by a microphone array by applying signal processing to enhance particular audio signals received at the microphone array from one or more desired locations (i.e. directions and distances) compared to the rest of the audio signals received at the microphone array. For simplicity, only the case of a single desired direction is described herein, but the same method applies when there are more directions of interest. The angle (and/or the distance) from which the desired audio signal is received at the microphone array, so-called Direction of Arrival (“DOA”) information, can be determined or set prior to the beamforming process. It can be advantageous to fix the desired direction of arrival, since estimating the direction of arrival may be complex. However, in other situations it can be beneficial to adapt the desired direction of arrival to changing conditions, in which case the desired direction of arrival may be estimated in real time while the beamformer is in use. Adaptive beamformers apply a number of weights (or “beamformer coefficients”) to the received audio signals.
These weights can be adapted, taking into account the DOA information, so that the audio signals received by the plurality of microphones are processed to form a “beam” whereby a high gain is applied to desired audio signals received by the microphones from a desired location (i.e. a desired direction and distance) and a low gain is applied to signals arriving from the directions of any other (e.g. interfering) signal sources. The beamformer may be “adaptive” in the sense that the suppression of interfering sources can be adapted, but the selection of the desired source/look direction may not necessarily be adaptable.
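To make the idea of a “beam” concrete, the following is a minimal sketch of a fixed delay-and-sum beamformer, which is the simplest way to choose weights from DOA information. It assumes a uniform linear array, far-field plane waves, a single narrowband frequency and a speed of sound of 343 m/s; the function names are hypothetical and are not taken from any particular device. The weights phase-align the signals arriving from the look direction and average them, so the gain towards the look direction is exactly one, while signals from other directions add incoherently and receive a lower gain.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def steering_vector(theta_deg, num_mics, spacing, freq):
    """Far-field narrowband steering vector of a uniform linear array (assumed geometry)."""
    m = np.arange(num_mics)
    # Relative arrival delay at each microphone for a plane wave from theta_deg.
    delays = m * spacing * np.sin(np.deg2rad(theta_deg)) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq * delays)


def delay_and_sum_weights(look_deg, num_mics, spacing, freq):
    """Hypothetical helper: weights giving unit gain towards look_deg."""
    return steering_vector(look_deg, num_mics, spacing, freq) / num_mics


def beam_gain(weights, theta_deg, spacing, freq):
    """Magnitude of the array response w^H a(theta) for a plane wave from theta_deg."""
    a = steering_vector(theta_deg, len(weights), spacing, freq)
    return abs(np.vdot(weights, a))  # np.vdot conjugates its first argument


# Example: 4 microphones, 5 cm apart, evaluated at 2 kHz, steered to broadside (0 degrees).
w = delay_and_sum_weights(0.0, num_mics=4, spacing=0.05, freq=2000.0)
for angle in (0.0, 30.0, 60.0):
    print(f"{angle:5.1f} deg: gain {beam_gain(w, angle, 0.05, 2000.0):.3f}")
```

An adaptive beamformer starts from the same response model but updates the weights from the received data, so that the low-gain regions follow the interfering sources rather than being fixed by the geometry alone.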
As well as having a plurality of microphones for receiving audio signals, a device may also have audio output means (e.g. comprising a loudspeaker) for outputting audio signals. Such a device is useful, for example, where audio signals are to be output to, and received from, a user of the device, such as during a communication event. For example, the device may be a user device such as a telephone, computer or television and may include the equipment necessary to allow the user to engage in teleconferencing.
Where a device includes both audio output means (e.g. including a loudspeaker) and audio input means (e.g. microphones), there is often a problem with echo being present in the received audio signals, where the echo results from audio signals being output from the loudspeaker and received at the microphones. The audio signals output from the loudspeaker include echo and also other sounds played by the loudspeaker, such as music or audio, e.g. from a video clip.
When echo is present in audio signals received at a device which implements a beamformer as described above, the echo can be treated as interference in the received audio signals and the beamformer coefficients can be adapted such that the beamformer applies a low gain to the audio signals arriving from the direction (and/or distance) of the echo signals. When a communication event begins, the beamformer has no knowledge of the angle (and/or distance) from which the loudspeaker signal (which includes echo) will arrive until the first instance of far end activity (e.g. speech from the far end user) in the communication event. Once the first instance of far end activity has occurred in the communication event, the device can analyze the audio signals received at the microphones of the device in order to determine the echo direction and can then adapt the beamformer coefficients such that echo suppression is applied by the beamformer to audio signals which are received from the echo direction.
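One way to picture “applying a low gain in the echo direction” is a null-steering weight design: once an echo direction is known, choose the smallest weights that keep unit gain towards the look direction and zero gain towards the echo direction. The sketch below is a linearly constrained, minimum-norm design (equivalent to an LCMV beamformer with an identity noise covariance), offered as an illustration under stated assumptions rather than as the adaptation rule of any particular device. It assumes a uniform linear array, far-field narrowband signals, 343 m/s for the speed of sound, and an echo direction that has already been estimated; the function names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def steering_vector(theta_deg, num_mics, spacing, freq):
    """Far-field narrowband steering vector of a uniform linear array (assumed geometry)."""
    m = np.arange(num_mics)
    delays = m * spacing * np.sin(np.deg2rad(theta_deg)) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq * delays)


def null_steering_weights(look_deg, echo_deg, num_mics, spacing, freq):
    """Hypothetical helper: unit gain towards look_deg and a null towards echo_deg."""
    C = np.column_stack([
        steering_vector(look_deg, num_mics, spacing, freq),
        steering_vector(echo_deg, num_mics, spacing, freq),
    ])
    g = np.array([1.0, 0.0], dtype=complex)  # desired responses: [look, echo]
    # Minimum-norm solution of C^H w = g, i.e. w = C (C^H C)^{-1} g.
    return C @ np.linalg.solve(C.conj().T @ C, g)


def beam_gain(weights, theta_deg, spacing, freq):
    """Magnitude of the array response w^H a(theta)."""
    a = steering_vector(theta_deg, len(weights), spacing, freq)
    return abs(np.vdot(weights, a))


# Example: talker at broadside (0 degrees), loudspeaker echo assumed to arrive from 60 degrees.
w = null_steering_weights(0.0, 60.0, num_mics=4, spacing=0.05, freq=1000.0)
print("look gain:", beam_gain(w, 0.0, 0.05, 1000.0))
print("echo gain:", beam_gain(w, 60.0, 0.05, 1000.0))
```

The constraint matrix can only be built once the echo direction is known, which mirrors the point above: before the first instance of far end activity, there is no echo direction to place a null on.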
In adaptive beamformers, a highly desirable property is a slowly evolving beampattern. Fast changes to the beampattern tend to cause audible changes in the background noise characteristics, which are perceived as unnatural. Therefore, when adapting the beamformer coefficients in response to the first instance of far end activity in a communication event as described above, there is a trade-off to be made between suppressing the echo quickly and not changing the beampattern too quickly.
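The trade-off can be illustrated with simple exponential smoothing of the coefficients towards a target that places a null on the echo, w ← (1 − α)·w + α·w_target, applied once per frame. This smoothing rule, the step size α, the 0.8 rad per-microphone echo phase step and the projection used to build the target are all assumptions chosen for illustration; the function names are hypothetical. Because the array response is linear in the weights, the gain towards the echo decays as (1 − α) to the power of the number of frames, while each frame moves the coefficients by α times their remaining distance from the target.

```python
import numpy as np


def smooth_update(w_current, w_target, alpha):
    """One frame of exponential smoothing of beamformer coefficients towards a target."""
    return (1.0 - alpha) * w_current + alpha * w_target


def frames_to_suppress(w_initial, w_target, a_echo, alpha, threshold, max_frames=10_000):
    """Count frames until the gain towards the echo direction falls to threshold or below."""
    w = w_initial.copy()
    for frame in range(max_frames):
        if abs(np.vdot(w, a_echo)) <= threshold:
            return frame
        w = smooth_update(w, w_target, alpha)
    return None  # threshold not reached within max_frames


# Hypothetical 4-microphone example: echo arrives with a 0.8 rad phase step per microphone.
num_mics = 4
a_echo = np.exp(-0.8j * np.arange(num_mics))
w_initial = np.ones(num_mics, dtype=complex) / num_mics  # broadside delay-and-sum weights
# Target: initial weights with their component along a_echo projected out (a null on the echo).
w_target = w_initial - (np.vdot(a_echo, w_initial) / np.vdot(a_echo, a_echo)) * a_echo
threshold = 0.01 * abs(np.vdot(w_initial, a_echo))  # 40 dB below the initial echo gain

for alpha in (0.5, 0.05):
    frames = frames_to_suppress(w_initial, w_target, a_echo, alpha, threshold)
    first_step = np.linalg.norm(smooth_update(w_initial, w_target, alpha) - w_initial)
    print(f"alpha={alpha}: {frames} frames to -40 dB, first-step change {first_step:.4f}")
```

With α = 0.5 the echo gain drops by 40 dB within a handful of frames, but the first coefficient jump is ten times larger than with α = 0.05, which needs many more frames to reach the same suppression. Choosing α is exactly the trade-off between fast echo suppression and a slowly evolving beampattern.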