Speech enhancement solutions are desirable in audio systems to enable robust automatic speech command recognition and improved communication in noisy environments. Conventional enhancement methods can be divided into two categories depending on whether they employ a single-channel or a multiple-channel recording. The first category is based on a continuous estimation of the signal-to-noise ratio, generally in the discrete time-spectral domain, and can be quite effective if the noise does not exhibit a high amount of energy variation (i.e., non-stationarity). The second category, known as beamforming, estimates a set of spatial filters aimed at enhancing a signal arriving from a predefined spatial direction. The effectiveness of beamforming methods depends on the amount of energy propagating along the steering direction and generally scales with the number of available channels.
However, when the number of channels is limited and the amount of reverberation is not negligible, the conventional solutions described above typically do not provide satisfactory performance. This is particularly true in far-field applications, i.e., when the speaker is at a large distance from the microphones (e.g., more than 1 meter), where the amount of energy propagating over the direct path may be small compared to the reverberation.
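As an illustration of the first (single-channel) category described above, the following is a minimal sketch of a per-bin spectral gain derived from a signal-to-noise ratio estimate in the time-spectral domain. It is not taken from any particular system: the function name `wiener_gain`, its parameters, and the choice of a Wiener-style gain with a clipped power-subtraction SNR estimate are assumptions made for illustration only.

```python
import numpy as np

def wiener_gain(noisy_power, noise_power, gain_floor=0.0):
    """Hypothetical per-bin spectral gain from a simple SNR estimate.

    noisy_power: array (frames, bins), squared magnitude of the noisy
                 time-frequency representation.
    noise_power: array (bins,) or (frames, bins), estimated noise power.
    gain_floor:  lower bound applied to the gain (assumed parameter).
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    # Guard against division by zero in the noise estimate.
    noise_power = np.maximum(np.asarray(noise_power, dtype=float), 1e-12)
    # A posteriori SNR: observed power relative to the noise estimate.
    snr_post = noisy_power / noise_power
    # Crude a priori SNR estimate (power subtraction, clipped at zero).
    snr_prio = np.maximum(snr_post - 1.0, 0.0)
    # Wiener-style gain: bins with high SNR pass, bins with low SNR are attenuated.
    gain = snr_prio / (1.0 + snr_prio)
    return np.maximum(gain, gain_floor)

# Example: one frame, three frequency bins, flat noise estimate.
noisy = np.array([[4.0, 1.0, 0.5]])
noise = np.array([1.0, 1.0, 1.0])
gain = wiener_gain(noisy, noise)
```

The gain would be multiplied with the noisy spectrum before resynthesis. Because the gain depends entirely on `noise_power`, an estimate that lags behind rapidly varying noise yields a wrong gain, which is consistent with the limitation noted above for non-stationary noise.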