The availability and usage of speech-enabled devices are becoming increasingly widespread. Accurate speech recognition and language understanding are important for a satisfactory user experience. Speech signals that are captured in the far-field of a microphone, however, are often not of sufficiently high quality, due to noise and reverberation, to meet the requirements of automatic speech recognition systems and other speech processing applications, which must provide a relatively low word error rate for acceptable performance. Existing far-field speech pre-processing techniques attempt to boost the quality of the received signals but suffer from a number of non-trivial issues, including latency, complexity, and the need for a microphone array that includes a relatively large number of microphones. Additionally, many existing techniques rely on voice activity detection, which generally does not perform well at low signal-to-noise ratios.
Although the following Detailed Description will proceed with reference being made to illustrative embodiments, many alternatives, modifications, and variations thereof will be apparent in light of this disclosure.