Automatic speech recognition is a task in which a user speaks an utterance into a computerized speech recognition system, and the system recognizes the speech contained in that utterance. The utterance, which is typically captured by a microphone, can be corrupted by a variety of types of noise. Noise in the signal representing the utterance can reduce the accuracy with which the system recognizes the speech. Therefore, some current systems attempt to reduce noise in the speech signal in order to improve recognition accuracy.
Noise reduction techniques have also been employed in speech enhancement environments. In other words, where a human listener is listening to speech that was input by another user in the presence of noise, both noise reduction and speech enhancement can be employed to make it easier for the human listener to listen to the speech.
Many currently believe that the preferred signal domain in which to apply noise reduction or speech enhancement depends on whether the speech signal is to be used for human listening or for automatic speech recognition. It is also widely believed that performance improves as the distortion between the enhanced speech and the clean speech decreases, where that distortion is measured in the domain closest to the back end of the system. In a human listening environment, the back end is the portion that allows human perception of the generated speech; in a speech recognition system, the back end is the portion of the system that performs the machine recognition function.
Therefore, for subjective human listening, noise reduction is often applied in the spectral domain. For example, in that scenario, noise reduction can be provided using known techniques such as spectral subtraction, Wiener filtering, and Ephraim/Malah spectral amplitude minimum mean square error (MMSE) suppression. Subjective human listening experiments show that speech enhancement becomes more effective when it is applied in the logarithmic spectral amplitude domain. This confirms the observation that the peripheral auditory system of a human being performs a kind of compression that is similar to logarithmic scaling.
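By way of illustration only, two of the spectral-domain techniques named above can be sketched as per-bin gain functions, together with a distortion measure taken in the logarithmic spectral domain. The sketch below is a minimal textbook-style example and is not drawn from any particular system. The function names, the `floor` parameter, the maximum-likelihood a priori SNR estimate used in the Wiener gain, and the assumption that a noise power estimate is already available for each frequency bin are all hypothetical choices made for this example.

```python
import math

def spectral_subtraction_gain(noisy_power, noise_power, floor=0.01):
    """Per-bin amplitude gain for power spectral subtraction.

    noisy_power and noise_power are sequences of per-bin power values
    (assumed inputs). The floor keeps a small fraction of the noisy
    power so that the estimated clean power never goes negative.
    """
    gains = []
    for y2, n2 in zip(noisy_power, noise_power):
        if y2 <= 0.0:
            gains.append(0.0)
            continue
        clean_est = max(y2 - n2, floor * y2)
        gains.append(math.sqrt(clean_est / y2))
    return gains

def wiener_gain(noisy_power, noise_power):
    """Per-bin Wiener gain xi / (1 + xi).

    xi is the a priori SNR, estimated here (an assumption of this
    sketch) as max(noisy / noise - 1, 0).
    """
    gains = []
    for y2, n2 in zip(noisy_power, noise_power):
        if n2 <= 0.0:
            gains.append(1.0)  # no noise estimated in this bin
            continue
        xi = max(y2 / n2 - 1.0, 0.0)
        gains.append(xi / (1.0 + xi))
    return gains

def log_spectral_distance_db(power_a, power_b, eps=1e-12):
    """Root-mean-square difference, in dB, between two power spectra."""
    sq = [
        (10.0 * math.log10((a + eps) / (b + eps))) ** 2
        for a, b in zip(power_a, power_b)
    ]
    return math.sqrt(sum(sq) / len(sq))

if __name__ == "__main__":
    noisy = [4.0, 1.0]   # illustrative per-bin noisy power
    noise = [1.0, 1.0]   # illustrative per-bin noise power estimate
    print(spectral_subtraction_gain(noisy, noise))
    print(wiener_gain(noisy, noise))
    print(log_spectral_distance_db([10.0, 10.0], [1.0, 1.0]))
```

In such a sketch, each gain would be multiplied by the noisy spectral amplitude in the corresponding bin to obtain the enhanced amplitude, while the log-spectral distance gives one way to quantify distortion in a domain that resembles logarithmic compression.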
The discussion above is merely provided for general background information and is not intended to be used as an aid in determining the scope of the claimed subject matter.