Extracting sound sources in noisy and reverberant conditions is a common task in modern communication systems. In the last four decades, a large variety of spatial filtering techniques have been proposed to accomplish this task. Existing spatial filters are optimal when the observed signals conform to the signal model and when the information required to compute the filters is accurate. In practice, however, the signal model is often violated, and estimating the required information is a major challenge.
Existing spatial filters can be broadly classified into linear spatial filters (see, e.g., [1, 2, 3, 4]) and parametric spatial filters (see, e.g., [5, 6, 7, 8]). In general, linear spatial filters use an estimate of one or more propagation vectors, or of the second-order statistics (SOS) of the one or more desired sources plus the SOS of the interference. Some spatial filters are designed to extract a single source signal, either reverberant or dereverberated (see, e.g., [9, 10, 11, 12, 13, 14, 15, 16]), while others have been designed to extract the sum of two or more reverberant source signals (see, e.g., [17, 18]). The aforementioned methods rely on prior knowledge of the direction of the one or more desired sources, or of a period in which only the desired sources are active, either separately or simultaneously.
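As an illustration of a linear spatial filter that combines a propagation vector with the SOS of the interference, the following is a minimal sketch of the well-known minimum variance distortionless response (MVDR) beamformer for a single frequency bin. It assumes a far-field source, a uniform linear array, and a known interference-plus-noise covariance matrix; the helper names (`ula_steering`, `mvdr_weights`, `apply_weights`) and the array geometry are hypothetical, and the sketch is not taken from any of the cited works.

```python
import numpy as np

def ula_steering(theta_deg, freq_hz, num_mics, spacing_m, c=343.0):
    """Hypothetical helper: far-field propagation vector of a uniform linear array.

    Phases are relative to microphone 0; theta_deg = 0 is broadside.
    """
    m = np.arange(num_mics)
    delays = m * spacing_m * np.sin(np.deg2rad(theta_deg)) / c  # seconds
    return np.exp(-1j * 2 * np.pi * freq_hz * delays)

def mvdr_weights(d, R):
    """MVDR weights for one frequency bin.

    d: propagation vector of the desired source, shape (M,)
    R: interference-plus-noise covariance matrix (SOS), shape (M, M),
       assumed Hermitian positive definite.
    Minimizes w^H R w subject to the distortionless constraint w^H d = 1.
    """
    Rinv_d = np.linalg.solve(R, d)          # R^{-1} d without forming the inverse
    return Rinv_d / (np.conj(d) @ Rinv_d)   # normalize so that w^H d = 1

def apply_weights(w, x):
    """Filter output y = w^H x for one STFT bin, with x of shape (M,)."""
    return np.vdot(w, x)                    # np.vdot conjugates its first argument
```

The filter passes the desired direction undistorted while minimizing the residual interference power, which is why it needs both the propagation vector and an accurate estimate of the interference SOS.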
A drawback of these methods is their inability to adapt sufficiently quickly to new situations, for example, source movements or competing speakers that become active while the desired source is active. Parametric spatial filters are often based on a relatively simple signal model, e.g., the received signal in the time-frequency domain consists of a single plane wave plus diffuse sound, and are computed from instantaneous estimates of the model parameters. Advantages of parametric spatial filters are a highly flexible directional response, a comparatively strong suppression of diffuse sound and interferers, and the ability to quickly adapt to new situations. However, as shown in [19], the underlying single-plane-wave signal model can easily be violated in practice, which strongly degrades the performance of parametric spatial filters. It should be noted that state-of-the-art parametric spatial filters use all available microphone signals to estimate the model parameters, while only a single microphone signal and a real-valued gain are used to compute the final output signal. Extending these filters to combine the multiple available microphone signals into an enhanced output signal is not straightforward.
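To illustrate the single-microphone, real-valued-gain structure of parametric spatial filters, here is a minimal sketch that operates on per-bin estimates of the direction of arrival (DOA) and the diffuseness. The Gaussian directional window, the (1 - diffuseness) weighting, and the gain floor `g_min` are hypothetical choices for illustration, not the gain rule of any particular cited method; the DOA and diffuseness estimators, which would use all microphone signals, are assumed to exist and are not shown.

```python
import numpy as np

def parametric_gain(doa_deg, diffuseness, look_deg=0.0, width_deg=30.0, g_min=0.1):
    """Hypothetical real-valued gain per time-frequency bin.

    Assumes a single-plane-wave-plus-diffuse model: doa_deg and diffuseness
    (in [0, 1]) are instantaneous per-bin estimates of the model parameters.
    """
    # Angular distance to the look direction, wrapped to [-180, 180)
    diff = (np.asarray(doa_deg) - look_deg + 180.0) % 360.0 - 180.0
    # Hypothetical desired directional response: Gaussian window
    directional = np.exp(-0.5 * (diff / width_deg) ** 2)
    # Keep the plane-wave (direct) part, suppress the diffuse part
    g = (1.0 - np.asarray(diffuseness)) * directional
    # Gain floor limits the maximum attenuation
    return np.maximum(g, g_min)

def parametric_filter(X_ref, doa_deg, diffuseness, **kwargs):
    """Output = real-valued gain times ONE reference microphone STFT."""
    return parametric_gain(doa_deg, diffuseness, **kwargs) * X_ref
```

Because the output is a scaled copy of `X_ref`, the other microphone signals influence the result only through the parameter estimates, which is the limitation noted above.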
It would therefore be desirable to provide improved concepts for obtaining a desired spatial response to the sound sources.