Communication systems allow users to communicate with each other over a network. The network may be, for example, the internet or the Public Switched Telephone Network (PSTN). Audio signals can be transmitted between nodes of the network, thereby allowing users to transmit audio data (such as speech data) to, and receive it from, each other in a communication session over the communication system.
A user device may have audio input means, such as a microphone, that can be used to receive audio signals such as speech from a user. The user may enter into a communication session with another user, such as a private call (with just two users in the call) or a conference call (with more than two users in the call). The user's speech is received at the microphone, processed, and then transmitted over the network to the other user(s) in the call.
As well as the audio signals from the user, the microphone may also receive other audio signals, such as background noise, which may disturb the audio signals received from the user.
The user device may also have audio output means, such as speakers, for outputting to the user audio signals that are received over the network from the other user(s) during the call. However, the speakers may also be used to output audio signals from other applications executed at the user device. For example, the user device may be a TV, which executes an application such as a communication client for communicating over the network. When the user device is engaged in a call, a microphone connected to the user device is intended to receive speech or other audio signals provided by the user for transmission to the other user(s) in the call. However, the microphone may also pick up unwanted audio signals output from the speakers of the user device, and these unwanted signals can disturb the audio signal received at the microphone from the user for transmission in the call.
To improve the quality of the signal, for example for use in the call, it is desirable to suppress the unwanted audio signals (the background noise and the unwanted audio signals output from the user device) that are received at the audio input means of the user device.
The use of stereo microphones and microphone arrays, in which a plurality of microphones operate as a single device, is becoming more common. These enable the use of spatial information extracted from the multiple microphone signals, beyond what can be achieved with a single microphone. When using such devices, one approach to suppressing unwanted audio signals is to apply a beamformer. Beamforming is the process of trying to focus the signals received by the microphone array by applying signal processing to enhance sounds coming from one or more desired directions. For simplicity, we describe the case with only a single desired direction in the following, but the same method applies when there are more directions of interest. Beamforming is achieved by first estimating the angle from which wanted signals are received at the microphone array, so-called Direction of Arrival (“DOA”) information. Adaptive beamformers use the DOA information to filter the signals from the microphones in the array to form a beam that has a high gain in the direction from which wanted signals are received and a low gain in any other direction.
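DOA estimation can be illustrated for the simplest case of two microphones. The sketch below is an illustration under stated assumptions, not the method of this document: it assumes a far-field source (plane wavefront), a known microphone spacing, a speed of sound of 343 m/s, and integer-sample delays, and it estimates the time difference of arrival (TDOA) from the peak of the cross-correlation between the two microphone signals. The names `estimate_doa` and `SPEED_OF_SOUND` are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def estimate_doa(x1, x2, fs, mic_spacing):
    """Hypothetical helper: estimate the DOA (radians from broadside) of a
    far-field source for a two-microphone array from the TDOA.

    A positive angle means the wavefront reaches x1 before x2.
    """
    # The physical delay can never exceed spacing / c, so only search there.
    max_lag = int(np.ceil(mic_spacing / SPEED_OF_SOUND * fs))
    # corr[zero + k] = sum_n x2[n + k] * x1[n]; it peaks where x2 lags x1 by k.
    corr = np.correlate(x2, x1, mode="full")
    zero = len(x1) - 1
    lags = np.arange(-max_lag, max_lag + 1)
    lag = lags[np.argmax(corr[zero - max_lag: zero + max_lag + 1])]
    tdoa = lag / fs  # seconds
    # Plane-wave geometry: tdoa = mic_spacing * sin(theta) / c.
    sin_theta = np.clip(tdoa * SPEED_OF_SOUND / mic_spacing, -1.0, 1.0)
    return float(np.arcsin(sin_theta))


# Usage: simulate a white-noise source that reaches microphone 2
# three samples after microphone 1, plus a little independent sensor noise.
rng = np.random.default_rng(0)
fs, spacing, delay = 16000, 0.1, 3
source = rng.standard_normal(4000)
mic1 = source + 0.05 * rng.standard_normal(source.size)
mic2 = np.concatenate([np.zeros(delay), source[:-delay]])
mic2 += 0.05 * rng.standard_normal(source.size)
theta = estimate_doa(mic1, mic2, fs, spacing)
print(f"estimated DOA: {np.degrees(theta):.1f} degrees")
```

Practical systems typically use more robust estimators (for example, generalized cross-correlation with phase transform weighting, or subspace methods) together with sub-sample interpolation; plain cross-correlation is used here only for clarity.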
While the beamformer will attempt to suppress the unwanted audio signals coming from unwanted directions, the number of microphones and the shape and size of the microphone array limit the effect of the beamformer; as a result, the unwanted audio signals are suppressed but remain audible.
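The limitation can be made concrete with a fixed delay-and-sum beamformer, used here as a simpler stand-in for the adaptive beamformers described above. The sketch assumes a uniform linear array with 5 cm spacing, a narrowband plane wave, a speed of sound of 343 m/s, a beam steered to broadside (0 degrees), and an interferer arriving from 45 degrees; the function name `delay_and_sum_response` is hypothetical. It prints the attenuation of the interferer for 2 and 8 microphones at 200 Hz and 1000 Hz.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed


def delay_and_sum_response(num_mics, spacing, freq, steer_deg, look_deg):
    """Hypothetical helper: magnitude response in dB of a narrowband
    delay-and-sum beamformer on a uniform linear array, steered to
    steer_deg, for a plane wave arriving from look_deg."""
    m = np.arange(num_mics)
    steer = np.sin(np.radians(steer_deg))
    look = np.sin(np.radians(look_deg))
    # Residual phase of each element after the steering delays are applied.
    phase = 2 * np.pi * freq * m * spacing * (look - steer) / SPEED_OF_SOUND
    # Averaging the aligned elements gives unit gain in the steered direction.
    gain = np.abs(np.mean(np.exp(1j * phase)))
    return float(20 * np.log10(max(gain, 1e-12)))


for num_mics in (2, 8):
    for freq in (200.0, 1000.0):
        att = delay_and_sum_response(num_mics, 0.05, freq, 0.0, 45.0)
        print(f"{num_mics} mics, {freq:.0f} Hz: {att:.1f} dB at 45 degrees")
```

Under these assumptions, two microphones attenuate the 45-degree interferer by less than 1 dB at both frequencies, and even eight microphones (a 35 cm array) give well under 1 dB at 200 Hz, while at 1000 Hz they give roughly 14 dB. The interferer is reduced but not removed, and the small array is least effective at low frequencies.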
For subsequent single-channel processing, the output of the beamformer is commonly supplied to a single-channel noise reduction stage as an input signal. Various methods of implementing single-channel noise reduction have previously been proposed, and a large majority of the methods in use are variants of spectral subtraction.
The spectral subtraction method attempts to separate noise from a speech-plus-noise signal. Spectral subtraction involves computing the power spectrum of a speech-plus-noise signal and obtaining an estimate of the noise spectrum. The power spectrum of the speech-plus-noise signal is then compared with the estimated noise spectrum. The noise reduction can, for example, be implemented by subtracting the magnitude of the noise spectrum from the magnitude of the speech-plus-noise spectrum. If the speech-plus-noise signal has a high Signal-plus-Noise to Noise Ratio (SNNR), only very little noise reduction is applied. However, if the speech-plus-noise signal has a low SNNR, the noise reduction will significantly reduce the noise energy.
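The magnitude-subtraction variant can be sketched per frame as below. This is an illustrative sketch under stated assumptions, not the noise reduction of this document: the noise magnitude spectrum is assumed to be estimated by averaging the magnitude spectra of frames known to contain no speech, the noisy phase is reused, the gain is clamped at a spectral floor (the `floor` parameter, an assumption) to avoid negative magnitudes, and windowing and overlap-add across frames are left to the caller. All function names are hypothetical.

```python
import numpy as np


def estimate_noise_mag(noise_frames):
    """Hypothetical helper: average magnitude spectrum of speech-free frames
    (array of shape [num_frames, frame_len])."""
    return np.mean(np.abs(np.fft.rfft(noise_frames, axis=-1)), axis=0)


def spectral_subtraction_gain(noisy_mag, noise_mag, floor=0.1):
    """Per-bin gain equivalent to |Y| - |N|, expressed as G = 1 - |N| / |Y|.

    Bins where the noise estimate exceeds the noisy magnitude would get a
    negative gain, so the gain is clamped to [floor, 1].
    """
    gain = 1.0 - noise_mag / np.maximum(noisy_mag, 1e-12)
    return np.clip(gain, floor, 1.0)


def denoise_frame(frame, noise_mag, floor=0.1):
    """Apply magnitude spectral subtraction to one (already windowed) frame,
    keeping the phase of the noisy input."""
    spectrum = np.fft.rfft(frame)
    gain = spectral_subtraction_gain(np.abs(spectrum), noise_mag, floor)
    return np.fft.irfft(gain * spectrum, n=len(frame))


# A high-SNNR bin (|Y| = 10|N|) is barely touched; low-SNNR bins are cut hard.
gains = spectral_subtraction_gain(np.array([10.0, 1.2, 0.8]), np.ones(3))
print(gains)  # approximately [0.9, 0.167, 0.1]

# End-to-end on one frame: a 1 kHz tone in white noise.
rng = np.random.default_rng(1)
fs, n = 16000, 512
window = np.hanning(n)
noise_mag = estimate_noise_mag(0.3 * rng.standard_normal((20, n)) * window)
t = np.arange(n) / fs
noisy = (np.sin(2 * np.pi * 1000 * t) + 0.3 * rng.standard_normal(n)) * window
cleaned = denoise_frame(noisy, noise_mag)
```

Because the gain in each bin is 1 - |N|/|Y|, it approaches 1 when the SNNR is high and falls to the floor when the SNNR is low, which is the behaviour described above. Clamped subtraction of this kind is known to leave isolated residual spectral peaks (often called musical noise), one reason so many variants of spectral subtraction exist.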