ASR technologies enable microphone-equipped computing devices to interpret speech and thereby provide an alternative to conventional human-to-computer input devices such as keyboards or keypads. A typical ASR system includes several basic elements. A microphone and an acoustic interface receive an utterance of a word from a user and digitize the utterance into acoustic data. An acoustic pre-processor parses the acoustic data into information-bearing acoustic features. A decoder uses acoustic models to decode the acoustic features into utterance hypotheses. The decoder generates a confidence value for each hypothesis, reflecting the degree to which that hypothesis phonetically matches a subword of the utterance, and uses these values to select a best hypothesis for each subword. Using language models, the decoder concatenates the subwords into an output word corresponding to the user-uttered word.
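The final decoding step described above, selecting a best hypothesis for each subword by confidence value and joining the subwords into an output word, can be illustrated with a minimal sketch. It is not the decoder of any particular ASR system: the names `Hypothesis`, `best_hypothesis`, and `decode_word` are hypothetical, the confidence values are invented for illustration, and the language-model step is reduced to plain string concatenation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hypothesis:
    """One candidate for a subword (hypothetical structure, for illustration)."""
    subword: str       # candidate subword text
    confidence: float  # assumed score in [0, 1]; higher means a closer phonetic match


def best_hypothesis(candidates: List[Hypothesis]) -> Hypothesis:
    """Return the candidate with the highest confidence value."""
    if not candidates:
        raise ValueError("no hypotheses available for this subword")
    return max(candidates, key=lambda h: h.confidence)


def decode_word(candidates_per_subword: List[List[Hypothesis]]) -> str:
    """Pick the best hypothesis for each subword and join the results.

    Assumption: simple concatenation stands in for the language-model
    step, which a real decoder would use to constrain the combination.
    """
    return "".join(best_hypothesis(c).subword for c in candidates_per_subword)


# Illustrative candidate lists for the two subwords of the utterance "dial".
example = [
    [Hypothesis("di", 0.82), Hypothesis("ti", 0.31)],
    [Hypothesis("al", 0.74), Hypothesis("ol", 0.40)],
]
word = decode_word(example)
```

In this sketch, `word` is built from the highest-scoring candidate at each position. A real decoder would typically weigh acoustic confidence against language-model scores rather than choosing each subword independently.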
One problem encountered with ASR-enabled vehicles is that an ASR system may mistake vehicle road noise for speech. Road noise received by the ASR system may insert undesirable acoustic data, which in turn leads to misrecognition of speech. Some ASR systems attempt to prevent such road noise insertions using front-end signal processing or recognition engine algorithms. But such techniques involve complex and resource-intensive processes, and in many cases may not actually improve speech recognition accuracy.