ASR technologies enable microphone-equipped computing devices to interpret speech and thereby provide an alternative to conventional human-to-computer input devices such as keyboards or keypads. A typical ASR system includes several basic elements. A microphone and an acoustic interface receive an utterance of a word from a user and digitize the utterance into acoustic data. An acoustic pre-processor parses the acoustic data into information-bearing acoustic features. A decoder uses acoustic models to decode the acoustic features into utterance hypotheses. The decoder generates a confidence value for each hypothesis, reflecting the degree to which the hypothesis phonetically matches a subword of the utterance, and uses these values to select a best hypothesis for each subword. Using language models, the decoder concatenates the subwords into an output word corresponding to the user-uttered word.
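The hypothesis selection step described above can be illustrated with a minimal sketch. The names `Hypothesis`, `select_best`, and `decode_word`, the 0-to-1 confidence scale, and the sample subwords are all assumptions made for illustration. The language-model step is reduced here to plain concatenation, which is far simpler than what a real decoder does.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    subword: str       # candidate subword label (hypothetical representation)
    confidence: float  # assumed scale: 0.0 (poor match) to 1.0 (close match)


def select_best(hypotheses: List[Hypothesis]) -> Hypothesis:
    """Return the hypothesis with the highest confidence value."""
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    return max(hypotheses, key=lambda h: h.confidence)


def decode_word(per_subword: List[List[Hypothesis]]) -> str:
    """Pick the best hypothesis for each subword and join the results.

    Assumption: simple concatenation stands in for the language-model step.
    """
    return "".join(select_best(candidates).subword for candidates in per_subword)


# Two subword positions, each with competing hypotheses.
candidates = [
    [Hypothesis("di", 0.8), Hypothesis("ti", 0.3)],
    [Hypothesis("al", 0.9), Hypothesis("el", 0.4)],
]
word = decode_word(candidates)  # "dial"
```

Selecting the maximum independently for each subword is the simplest possible policy. A practical decoder would typically weigh acoustic confidence against language-model likelihood across the whole word.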
One problem encountered with ASR is that audio signals contain not only the speech utterances of a user but also undesirable distortions. Audio signal distortions are many and varied, including ambient noise like road noise, transient noise like windshield wiper operation, and electronics noise like clips, mutes, bit errors, codec errors, data packet errors, and channel distortion. Receipt of such distortions by an ASR system may lead to rejection errors, where speech cannot be recognized, or to insertion or substitution errors in the acoustic data that lead to misrecognition of speech.
Attempts to minimize such distortions include use of subjective listening techniques during development of an ASR system. With such techniques, several expert individuals listen to typical audio signals that will be received by the ASR system in actual use in the field, and then score the signals based on the absence, presence, or amount of the various distortions they hear in the signals. The scores are averaged, and if the average score for any given distortion is too low, some type of corrective action is taken in the design of the ASR system. But such techniques may be too costly or complex, and they are conducted off-line, before actual use of the ASR system in the field.
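The score-averaging step in such a listening test can be sketched as follows. The 1-to-5 scale (higher meaning less audible distortion), the threshold of 3.0, the function name `flag_distortions`, and the sample scores are all assumptions chosen for illustration. The text above does not specify a scale or a cutoff.

```python
from statistics import mean
from typing import Dict, List, Tuple


def flag_distortions(
    scores_by_distortion: Dict[str, List[float]],
    threshold: float = 3.0,  # assumed cutoff; an average below it is "too low"
) -> Tuple[Dict[str, float], List[str]]:
    """Average the listeners' scores per distortion and flag low averages."""
    averages = {
        name: mean(scores) for name, scores in scores_by_distortion.items()
    }
    flagged = sorted(name for name, avg in averages.items() if avg < threshold)
    return averages, flagged


# Hypothetical scores from three expert listeners for two distortion types.
scores = {
    "road_noise": [4, 5, 4],
    "clipping": [2, 3, 2],
}
averages, flagged = flag_distortions(scores)
# flagged == ["clipping"], so clipping would call for corrective design action
```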
Other attempts to deal with distortion include real-time noise removal or noise reduction techniques. Such techniques involve evaluating an initial portion of a signal for noise and attempting to electronically subtract that noise just before or during reception of speech present in the signal. But such techniques may be ineffective or unreliable.
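One well-known technique of this kind is magnitude spectral subtraction, sketched below as an illustration only; the text above does not name a specific method. The function names, the frame length, the number of noise-only frames, and the use of non-overlapping frames with a naive DFT are all simplifying assumptions. A practical system would use overlapping windowed frames and an FFT.

```python
import cmath
import math
from typing import List


def dft(frame: List[float]) -> List[complex]:
    """Naive discrete Fourier transform (O(n^2)), adequate for short frames."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum: List[complex]) -> List[float]:
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def spectral_subtract(
    signal: List[float], frame_len: int = 8, noise_frames: int = 2
) -> List[float]:
    """Estimate noise from the initial frames and subtract it from every frame.

    Assumptions: the first `noise_frames` frames contain noise only, frames do
    not overlap, and trailing samples that do not fill a frame are dropped.
    """
    frames = [
        signal[i:i + frame_len]
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]
    if noise_frames < 1 or len(frames) < noise_frames:
        raise ValueError("signal is too short for the requested noise estimate")
    spectra = [dft(frame) for frame in frames]

    # Per-bin noise magnitude, averaged over the initial noise-only frames.
    noise_mag = [
        sum(abs(s[k]) for s in spectra[:noise_frames]) / noise_frames
        for k in range(frame_len)
    ]

    cleaned: List[float] = []
    for spectrum in spectra:
        # Subtract the noise magnitude, floor at zero, keep the original phase.
        reduced = [
            cmath.rect(max(abs(c) - noise_mag[k], 0.0), cmath.phase(c))
            for k, c in enumerate(spectrum)
        ]
        cleaned.extend(idft(reduced))
    return cleaned
```

The sketch also suggests why such methods can be unreliable: the noise estimate is fixed after the initial frames, so transient or changing noise that appears later in the signal is not captured by it.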