A device may have audio input means such as a microphone that can be used to receive audio signals from the surrounding environment. For example, a microphone of a user device may receive a primary audio signal (such as speech from a user) as well as other audio signals. The other audio signals may be interfering audio signals received at the microphone of the device, and may be received from an interfering source or may be ambient background noise or microphone self-noise. The interfering audio signals may disturb the primary audio signals received at the device. The device may use the received audio signals for many different purposes. For example, where the received audio signals are speech signals received from a user, the speech signals may be processed by the device for use in a communication event, e.g. by transmitting the speech signals over a network to another device which may be associated with another user of the communication event. Alternatively, or additionally, the received audio signals could be used for other purposes, as is known in the art.
In order to improve the quality of the received audio signals (e.g. the speech signals received from a user for use in a call), it is desirable to suppress interfering audio signals (e.g. background noise and interfering audio signals received from interfering audio sources) that are received at the microphone of the user device.
The use of stereo microphones and other microphone arrays in which a plurality of microphones operate as a single audio input means is becoming more common. The use of a plurality of microphones at a device enables spatial information to be extracted from the received audio signals and used in addition to the information that can be extracted from an audio signal received by a single microphone. When using such devices, one approach for suppressing interfering audio signals is to apply a beamformer to the audio signals received by the plurality of microphones. Beamforming is a process of focusing the audio signals received by a microphone array by applying signal processing to enhance particular audio signals received at the microphone array from one or more desired locations (i.e. directions and distances) compared to the rest of the audio signals received at the microphone array. For simplicity, only the case of a single desired direction is described herein, but the same method applies when there are more directions of interest. The angle (and/or the distance) from which the desired audio signal is received at the microphone array, so-called Direction of Arrival (“DOA”) information, can be determined or set prior to the beamforming process. It can be advantageous to set the desired direction of arrival to be fixed, since the estimation of the direction of arrival may be complex. However, in alternative situations it can be advantageous to adapt the desired direction of arrival to changing conditions, and so it may be advantageous to perform the estimation of the desired direction of arrival in real time as the beamformer is used. Adaptive beamformers apply a number of weights (or “beamformer coefficients”) to the received audio signals.
These weights can be adapted to take into account the DOA information when processing the audio signals received by the plurality of microphones, so as to form a “beam” whereby a high gain is applied to desired audio signals received by the microphones from a desired location (i.e. a desired direction and distance) and a low gain is applied in the directions of any other (e.g. interfering) signal sources. The beamformer may also be “adaptive” in the sense that the suppression of interfering sources can be adapted, while the selection of the desired source/look direction may not necessarily be adaptable.
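By way of illustration only, the effect of beamformer weights can be sketched with a fixed delay-and-sum beamformer, which is a simpler technique than the adaptive beamformers discussed above. The sketch assumes a uniform linear array, a far-field plane wave (so that only direction, not distance, is modelled), a single frequency, and a speed of sound of 343 m/s. The function names, array geometry, and numerical values are hypothetical and are not taken from any particular system.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air


def steering_vector(num_mics, spacing_m, freq_hz, angle_rad):
    """Relative phase of a far-field plane wave at each microphone of a
    uniform linear array (angle measured from broadside)."""
    delay_per_mic_s = spacing_m * math.sin(angle_rad) / SPEED_OF_SOUND_M_S
    return [
        cmath.exp(-2j * math.pi * freq_hz * m * delay_per_mic_s)
        for m in range(num_mics)
    ]


def delay_and_sum_weights(num_mics, spacing_m, freq_hz, look_angle_rad):
    """Weights that phase-align the look direction and average the microphones."""
    steering = steering_vector(num_mics, spacing_m, freq_hz, look_angle_rad)
    return [s / num_mics for s in steering]


def beam_gain(weights, spacing_m, freq_hz, angle_rad):
    """Magnitude response of the weighted sum to a plane wave from angle_rad."""
    steering = steering_vector(len(weights), spacing_m, freq_hz, angle_rad)
    return abs(sum(w.conjugate() * s for w, s in zip(weights, steering)))


# Hypothetical setup: 4 microphones, 5 cm apart, evaluated at 2 kHz.
weights = delay_and_sum_weights(num_mics=4, spacing_m=0.05,
                                freq_hz=2000.0, look_angle_rad=0.0)
gain_desired = beam_gain(weights, 0.05, 2000.0, 0.0)
gain_interferer = beam_gain(weights, 0.05, 2000.0, math.radians(60.0))
print(f"gain towards desired direction: {gain_desired:.3f}")
print(f"gain towards interferer at 60 degrees: {gain_interferer:.3f}")
```

In this sketch the gain in the look direction is unity, while the gain towards the assumed interferer is considerably lower. An adaptive beamformer would additionally update the weights from the received data, which is not shown here.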
As well as having a plurality of microphones for receiving audio signals, a device may also have audio output means (e.g. comprising a loudspeaker) for outputting audio signals. Such a device is useful where audio signals are to be output to, and received from, a user of the device, for example during a communication event. For example, the device may be a user device such as a telephone, computer or television and may include equipment necessary to allow the user to engage in teleconferencing.
Where a device includes both audio output means (e.g. including a loudspeaker) and audio input means (e.g. microphones), there is often a problem when an echo is present in the received audio signals, wherein the echo results from audio signals being output from the loudspeaker and received at the microphones. An echo canceller may be used to cancel the echo in the audio signals received at the microphones. Echo suppression and echo subtraction are two methods of implementing an echo canceller. For example, an echo canceller may implement an echo suppressor which is used to suppress the echo in the audio signals received at the microphones. The path of propagation of an audio signal from the loudspeaker to the microphone is known as the echo path, and an echo suppressor may estimate the echo path gain as a function of time and frequency and use this to estimate the echo power in the received audio signals. The estimate of the echo power in the received audio signals can be used to suppress the echo in the received audio signals to a level at which it is not noticeable. The estimation of the echo power in the received audio signals is based on a model of the loudspeaker-enclosure-microphone system in which the echo canceller is operating. The model is often, at least partly, linear, but in some cases the model may be non-linear. A hybrid echo canceller consists of an echo subtractor and an echo suppressor applied in a cascaded manner. By using a hybrid echo canceller, increased doubletalk transparency is achieved by the echo subtractor and, if needed, an additional echo suppression gain is achieved by the echo suppressor.
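The echo suppression described above can be sketched, purely as an illustration, for a single time frame divided into frequency bands. The sketch assumes a linear echo model in which the echo power in each band is the echo path gain multiplied by the loudspeaker signal power, and it assumes a power-subtraction style suppression gain with a lower floor. The gain rule, the floor value, the function names, and the numbers are assumptions made for the example and are not prescribed by the description above.

```python
def estimate_echo_power(loudspeaker_power, echo_path_gain):
    """Per-band echo power under an assumed linear model:
    echo power = echo path (power) gain * loudspeaker power."""
    return [g * p for g, p in zip(echo_path_gain, loudspeaker_power)]


def suppression_gains(mic_power, echo_power, floor=0.01):
    """Per-band power gain that removes the estimated echo share of the
    microphone power, limited from below by an assumed floor."""
    gains = []
    for y, e in zip(mic_power, echo_power):
        if y <= 0.0:
            gains.append(floor)  # nothing received in this band
        else:
            gains.append(max(floor, 1.0 - e / y))
    return gains


# Hypothetical values for two frequency bands in one time frame.
loudspeaker_power = [1.0, 4.0]
echo_path_gain = [0.5, 0.25]   # estimated echo path gain per band
mic_power = [0.5, 2.0]         # band 0: echo only; band 1: echo plus near-end speech

echo_power = estimate_echo_power(loudspeaker_power, echo_path_gain)
gains = suppression_gains(mic_power, echo_power)
print(echo_power)  # [0.5, 1.0]
print(gains)       # [0.01, 0.5]
```

In band 0 the microphone power is explained entirely by the estimated echo, so the gain falls to the floor. In band 1 half of the power is attributed to the echo, so only half of the power is removed and the near-end signal is largely preserved.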
Common requirements for optimum operation of the echo cancellation are that:
    The echo path is relatively slowly varying, since otherwise the echo path gain estimate would rapidly become inaccurate;
    The system is sufficiently linear to be modelled by a linear echo model; and
    The echo path gain should not be underestimated, since underestimation would in turn cause the echo power to be underestimated. This would cause the echo canceller to apply too little suppression and thereby pass through residual echoes that are non-negligible.
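The consequence of underestimating the echo path gain can be illustrated with a small worked example. The sketch assumes a single frequency band, a far-end-only situation in which the microphone receives nothing but echo, and the same power-subtraction style gain rule as a typical echo suppressor might use. The function name and the numerical values are hypothetical.

```python
def residual_echo_power(true_gain, estimated_gain, loudspeaker_power, floor=0.0):
    """Echo power left after suppression when the microphone picks up echo only.

    The suppression gain is computed from the ESTIMATED echo power but is
    applied to a signal that contains the TRUE echo power.
    """
    true_echo = true_gain * loudspeaker_power
    estimated_echo = estimated_gain * loudspeaker_power
    mic_power = true_echo  # assumption: no near-end speech or noise
    if mic_power <= 0.0:
        return 0.0
    gain = max(floor, 1.0 - estimated_echo / mic_power)
    return gain * true_echo


# Hypothetical values: true echo path gain 0.5, loudspeaker power 2.0.
exact = residual_echo_power(0.5, 0.5, 2.0)    # accurate estimate
under = residual_echo_power(0.5, 0.25, 2.0)   # gain underestimated by half
over = residual_echo_power(0.5, 1.0, 2.0)     # gain overestimated
print(exact, under, over)  # 0.0 0.5 0.0
```

With an accurate or overestimated gain the echo is fully suppressed in this idealised case, whereas halving the estimate leaves half of the echo power in the output. Overestimation is not free of cost in practice, since it would also attenuate near-end speech when it is present, which this far-end-only sketch does not show.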
It is not a trivial task to implement both a beamformer and an echo canceller on received audio signals. Indeed, when incorporating an adaptive microphone beamformer (e.g. in a teleconferencing application) care needs to be taken so that the echo canceller performance is not reduced by the adaptive behavior of the beamformer.
In a first system implementing beamforming and echo cancellation together, a separate echo canceller is applied for each microphone signal before the beamforming is performed. However, this first system is very computationally complex due to the operation of multiple echo cancellers for the multiple microphone signals. Furthermore, the use of echo cancellers on the microphone signals may disturb the beamforming process of the beamformer.
In a second system implementing beamforming and echo cancellation together, an echo canceller is applied to the output of the beamformer. In this second system, the behavior of the data-adaptive beamformer is preferably constrained to change very slowly over time, since otherwise the estimates of the echo path used in the echo canceller will be detrimentally affected as the echo canceller attempts to adjust the echo path estimates in response to the changes in the beamformer behavior. Furthermore, in this second system the beamformer is preferably constrained to be linear and slowly varying in order to prevent a detrimental reduction in the achievable echo cancellation performance. Some beamformers are linear and some are not, so the choice of beamformer in the second system is restricted to linear beamformers.
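The sensitivity of the second system to changes in the beamformer can be sketched as follows. The sketch assumes a linear beamformer and a single frequency, so that the echo path seen by an echo canceller at the beamformer output is the weighted sum of the echo paths to the individual microphones. The function names, the per-microphone echo path coefficients, and the weight values are hypothetical.

```python
def effective_echo_path(weights, mic_echo_paths):
    """Echo path from loudspeaker to beamformer output for a linear
    beamformer: the weighted sum of the per-microphone echo paths."""
    return sum(w.conjugate() * h for w, h in zip(weights, mic_echo_paths))


def effective_echo_power_gain(weights, mic_echo_paths):
    """Power gain of the effective echo path."""
    return abs(effective_echo_path(weights, mic_echo_paths)) ** 2


# Hypothetical single-frequency echo path coefficients for two microphones.
mic_echo_paths = [complex(1.0, 0.0), complex(0.5, 0.0)]

weights_before = [complex(0.5, 0.0), complex(0.5, 0.0)]
weights_after = [complex(0.5, 0.0), complex(-0.5, 0.0)]  # beamformer has adapted

gain_before = effective_echo_power_gain(weights_before, mic_echo_paths)
gain_after = effective_echo_power_gain(weights_after, mic_echo_paths)
print(gain_before, gain_after)  # 0.5625 0.0625
```

Although the acoustic echo paths are unchanged in this example, the adaptation of the weights changes the effective echo path gain by a factor of nine, so an echo path gain estimate obtained before the adaptation would no longer be valid afterwards.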
Therefore, there are problems with both the first system and the second system described above.
Furthermore, when a beamformer is applied in combination with an acoustic echo canceller (AEC), whichever of the two is applied last needs to take the other into account in order to achieve the best performance. When internal information from the beamformer is available, a deep integration is possible in which one module essentially performs both AEC and beamforming.
When no internal information is available, on the other hand, it becomes harder to compensate accurately in the AEC for the echo attenuation applied by the beamformer.