Microphone arrays have long been used as a means of obtaining high-quality sound capture. In general, the source signal is captured by multiple microphones, and the captured signals are jointly processed to generate an enhanced output signal. For example, the signals from one or more microphones may be amplified while the signals from others are attenuated, resulting in a highly directional output signal.
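As a rough illustration of the per-microphone weighting described above, the following Python sketch forms one output signal as a weighted sum of the microphone channels. The function names, the equal weights of 0.25, the noise level, and the assumption that the source reaches all four microphones at the same instant are illustrative choices only and are not taken from any particular system. Practical beamformers typically also apply per-channel delays or frequency-dependent filters.

```python
import math
import random


def weighted_sum_beamformer(channels, weights):
    """Combine per-microphone sample lists into one output using per-channel gains."""
    if len(channels) != len(weights):
        raise ValueError("need exactly one weight per channel")
    n = len(channels[0])
    if any(len(ch) != n for ch in channels):
        raise ValueError("all channels must have the same length")
    # Each output sample is the weighted sum of the time-aligned input samples.
    return [sum(w * ch[i] for w, ch in zip(weights, channels)) for i in range(n)]


def mean_square(samples):
    """Average power of a sample list."""
    return sum(x * x for x in samples) / len(samples)


def demo(num_mics=4, n=2000, noise_std=0.5, seed=0):
    """Return (error power of one microphone, error power of the combined output)."""
    rng = random.Random(seed)
    # Hypothetical source: a sine wave arriving at every microphone simultaneously.
    clean = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
    # Each microphone adds its own independent Gaussian noise.
    mics = [[s + rng.gauss(0.0, noise_std) for s in clean] for _ in range(num_mics)]
    out = weighted_sum_beamformer(mics, [1.0 / num_mics] * num_mics)
    err_single = mean_square([a - b for a, b in zip(mics[0], clean)])
    err_out = mean_square([a - b for a, b in zip(out, clean)])
    return err_single, err_out


if __name__ == "__main__":
    single, combined = demo()
    print(f"single-mic error power: {single:.4f}")
    print(f"combined error power:   {combined:.4f}")
```

Under these assumptions (equal weights and independent noise at each microphone), the error power of the combined output should be noticeably lower than that of any single microphone. Unequal weights would instead emphasize some microphones over others.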
Current microphone array processing pipelines comprise two main stages: a linear beamformer that spatially filters the sound field, suppressing noise that arrives from unwanted directions, and a post-filter that performs additional noise reduction on the beamformer output signal. The linear beamformer stage provides some degree of noise reduction and generally improves perceptual quality. The post-filter stage typically provides much greater noise reduction but introduces artifacts into the output signal, which degrade perceptual quality. As a result, in scenarios such as videoconferencing and VoIP, users and system designers are forced to choose between minimal distortion with limited noise reduction and greater noise reduction with significant distortion and artifacts.