Speech recognition systems have been used to convert spoken words into text. In medium and high noise environments, however, the accuracy of automatic speech recognition systems tends to degrade significantly. As a result, most speech recognition systems are used with audio captured in a noise-free environment.
Noise reduction systems have a different objective from speech recognition systems. A standard noise reduction strategy consists of strongly attenuating the portions of the acoustic spectrum that are dominated by noise, while preserving the portions that are dominated by speech.
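As an illustration only, the strategy described above can be sketched as a per-band gain decision. This is a minimal sketch, not a specific system's method: the function names `band_gains` and `apply_gains`, the `floor_gain` value, the rule that a band counts as speech-dominated when its estimated speech power exceeds the noise power, and the sample values are all assumptions introduced for the example.

```python
def band_gains(noisy_power, noise_power, floor_gain=0.1):
    """Return one gain per frequency band.

    A band is treated as speech-dominated when the estimated speech power
    (noisy power minus noise power, clamped at zero) exceeds the noise power.
    Speech-dominated bands keep a gain of 1.0; all others are strongly
    attenuated to floor_gain. Both the rule and floor_gain are hypothetical.
    """
    gains = []
    for total, noise in zip(noisy_power, noise_power):
        speech_estimate = max(total - noise, 0.0)
        gains.append(1.0 if speech_estimate > noise else floor_gain)
    return gains


def apply_gains(magnitudes, gains):
    """Scale each band magnitude by its gain."""
    return [m * g for m, g in zip(magnitudes, gains)]


# Hypothetical per-band power values for a single frame (four bands).
noisy_power = [9.0, 1.2, 4.0, 1.0]
noise_power = [1.0, 1.0, 1.0, 1.0]

gains = band_gains(noisy_power, noise_power)
print(gains)  # prints [1.0, 0.1, 1.0, 0.1]
```

In this sketch the second and fourth bands are judged noise-dominated and are scaled down by a factor of ten, while the first and third bands pass through unchanged.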
Although strong attenuation of undesired spectrum portions is a valid strategy from the point of view of noise reduction and perceived output signal quality, it is not necessarily a good strategy for an automatic speech recognition system. In particular, the spectral regions strongly attenuated by noise suppression may be needed to extract features for speech recognition. As a result, the attenuation introduced by noise suppression can corrupt the features of the speech signal more than the original noise does. When the corruption caused by noise suppression exceeds the corruption caused by the added noise, the noise reduction algorithm can render the automatic speech recognition results unusable.