The present invention relates to speech processing. In particular, the present invention relates to a method for speech enhancement and speech recognition in noisy environments.
Automatic speech recognition is an important technology that is used in mobile devices and many other devices. In general, automatic speech recognition attempts to provide an accurate transcription of what a person has said.
In speech recognition, it is common to condition the speech signal to remove noise and portions of the speech signal that are not helpful in decoding the speech into text. For example, it is common to apply a frequency-based transform to the speech signal to reduce certain frequencies in the signal that do not aid in decoding the speech signal.
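As a minimal sketch of such frequency-based conditioning, the following transforms a frame to the frequency domain, zeroes the components outside a chosen band, and transforms back. The function names, the plain discrete Fourier transform, and the 300-3400 Hz band are assumptions made for illustration only; they are not taken from any of the systems described here, which may attenuate rather than remove frequencies.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT, returning the real part of each output sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def keep_band(frame, sample_rate, low_hz, high_hz):
    """Zero every DFT bin whose frequency lies outside [low_hz, high_hz].

    Hypothetical helper: the band edges are illustrative assumptions.
    """
    n = len(frame)
    spectrum = dft(frame)
    for k in range(n):
        # Bins k and n - k describe the same physical frequency for a real signal.
        freq_hz = min(k, n - k) * sample_rate / n
        if not (low_hz <= freq_hz <= high_hz):
            spectrum[k] = 0
    return idft(spectrum)


# Example: a 100 Hz hum added to a 1000 Hz tone, sampled at 8000 Hz.
SAMPLE_RATE = 8000
N = 80
hum = [math.sin(2 * math.pi * 100 * t / SAMPLE_RATE) for t in range(N)]
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(N)]
mixed = [h + s for h, s in zip(hum, tone)]
filtered = keep_band(mixed, SAMPLE_RATE, 300, 3400)
```

In this example the frame length is chosen so that both components fall exactly on DFT bins, so the hum is removed and the tone is left essentially unchanged. A practical front end would typically use a fast Fourier transform with windowed, overlapping frames.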
Speech systems also attempt to enhance the speech signal by removing noise before performing speech recognition. Under some systems, this is done in the time domain by applying a noise filter to the speech signal. In other systems, this enhancement is performed using a two-stage process in which the pitch of the speech is first tracked using a pitch tracker and then the pitch is used to separate the speech signal from the noise. For various reasons, such two-stage processing is undesirable.
An alternative system for removing noise from a speech signal attempts to identify the clean speech signal within a noisy signal using a probabilistic framework that provides a Minimum Mean Square Error (MMSE) estimate of the clean signal given the noisy signal.
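To illustrate what an MMSE estimate is, the following sketch computes it for the simplest textbook case: a scalar clean value with a Gaussian prior, observed through additive Gaussian noise that is independent of it. In that case the MMSE estimate is the posterior mean, which pulls the observation toward the prior mean in proportion to the noise level. The scalar Gaussian model and all names below are assumptions made for illustration; the probabilistic framework of the system referred to above is not specified here and is likely more elaborate.

```python
def mmse_estimate(noisy, prior_mean, prior_var, noise_var):
    """Posterior mean E[clean | noisy] for noisy = clean + noise.

    Assumes clean ~ N(prior_mean, prior_var) and noise ~ N(0, noise_var),
    independent of each other. Illustrative model only.
    """
    if prior_var < 0 or noise_var < 0 or prior_var + noise_var == 0:
        raise ValueError("variances must be non-negative and not both zero")
    # Gain is near 1 when the noise is weak and near 0 when it dominates.
    gain = prior_var / (prior_var + noise_var)
    return prior_mean + gain * (noisy - prior_mean)


# Example: equal prior and noise variance, so the estimate lies halfway
# between the prior mean and the observation.
estimate = mmse_estimate(noisy=2.0, prior_mean=0.0, prior_var=1.0, noise_var=1.0)
```

With no noise the estimate equals the observation, and as the noise variance grows the estimate approaches the prior mean.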
There is clearly a need to improve the accuracy of speech recognition in high-noise environments.
Additionally, there is clearly a need to improve the ability of speech enhancement systems to separate a target speaker from background noise and background speakers.
U.S. Pat. No. 955,483 (Senior et al.) discloses a system for performing speech recognition using neural networks.
U.S. Pat. No. 7,664,643 (Gopinath et al.) discloses a system for performing speech separation and speech enhancement in a probabilistic framework.
U.S. Pat. No. 7,596,494 (Kristjansson et al.) discloses a system for speech enhancement using a high resolution signal representation.
U.S. Pat. No. 6,985,858 (Frey et al.) discloses a method for performing robust speech recognition.
“Spectral intersections for non-stationary signal separation” (Trausti Kristjansson and Thad Hughes) discloses an inference method for speech enhancement that uses spectral intersections.