As is known in the art, the demand for speech interfaces in the home and other environments is increasing. In these applications, the speaker cannot be assumed to be in the direct vicinity of the microphone(s). Therefore, the captured speech signal may be smeared by reverberation and other kinds of interference, which can degrade automated speech recognition (ASR) accuracy.
Conventional beamformer-postfilter systems rely on the assumption that the speaker position is known, which may not be the case. For example, a sector with a twenty-five degree width can be created inside which the ASR performance is enhanced. Outside this “sweet spot,” signals are suppressed, so that speech from a speaker who moves outside of the twenty-five degree sector may itself be suppressed.
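By way of illustration only, the directional behavior described above can be sketched with a plain delay-and-sum beamformer, which is a simplification and not the beamformer-postfilter system itself. The function name `delay_and_sum_response`, the four-microphone uniform linear array, the 5 cm spacing, the 2 kHz test frequency, and the far-field assumption are all hypothetical choices made for this sketch and do not come from the text.

```python
import cmath
import math


def delay_and_sum_response(arrival_deg, steer_deg=0.0, num_mics=4,
                           spacing_m=0.05, freq_hz=2000.0, speed_mps=343.0):
    """Normalized magnitude response of a uniform linear array.

    Assumes a far-field source and a single frequency (illustrative values only).
    """
    # Mismatch between the actual arrival direction and the steered direction.
    delta = math.sin(math.radians(arrival_deg)) - math.sin(math.radians(steer_deg))
    # Phase offset between adjacent microphones caused by that mismatch.
    phase_step = 2.0 * math.pi * freq_hz * spacing_m * delta / speed_mps
    # Sum the per-microphone phasors and normalize by the microphone count.
    total = sum(cmath.exp(1j * phase_step * m) for m in range(num_mics))
    return abs(total) / num_mics


if __name__ == "__main__":
    # Steered direction, edge of a 25-degree sector, and a direction well outside it.
    for angle in (0.0, 12.5, 40.0):
        print(f"{angle:5.1f} deg -> {delay_and_sum_response(angle):.3f}")
```

Under these assumptions, the response is largest in the steered direction and falls off as the arrival direction moves away from it, which is why speech from a speaker who leaves the sector may be attenuated along with the interference.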
In known systems, acoustic speaker localization can be used to steer the beam to the actual speaker position. This may not work robustly for scenarios in which reverberation and interference are present. Another known approach is to enable the beamformer to adapt, to some extent, to the true speaker position. However, this approach may be suboptimal. Speaker localization using a camera may not be a realistic option because a camera may not be available.