Speech recognition systems have become widespread with the proliferation of mobile devices that have advanced audio and video recording capabilities, and speech recognition techniques have improved significantly in recent years as a result. Advanced speech recognition systems can now achieve high accuracy in clean environments. Even these systems, however, suffer serious performance degradation in noisy environments, which often include multiple speakers and a variety of background noises. Mobile devices and other consumer devices are frequently used in exactly these environments. Separating target audio signals, such as speech from a particular speaker, from noise therefore remains a challenge for speech recognition systems, which are often used in acoustically difficult environments.
Many algorithms have been developed to address these problems, and they can successfully reduce the impact of stationary noise. Improving performance in non-stationary noise, however, remains elusive. In recent years, researchers have explored separating target audio signals from noise in multi-microphone systems by analyzing the differences in a signal's arrival time at different microphones. This research has attempted to mimic the human binaural system, which is remarkably good at separating speech from interfering sources. Models and algorithms have been developed using interaural time differences (ITDs), interaural intensity differences (IIDs), interaural phase differences (IPDs), and other cues. Existing source-separation algorithms and models, however, still perform poorly at reducing non-stationary noise.