Automatic speech recognition (ASR) systems are configured to recognize spoken utterances set forth by users. More particularly, a microphone generates an electrical signal responsive to capturing audio, wherein the audio includes the spoken utterance. The electrical signal is processed to filter noise from the audio and to extract features that can be used to recognize the spoken utterance. While the performance (e.g., speed and accuracy) of ASR systems has greatly improved over the last several years, conventional ASR systems continue to have difficulty when large vocabularies are considered, when the ASR systems have not been trained with training data that is representative of particular accents or dialects, or when other suboptimal conditions exist. Moreover, ASR systems often have difficulty recognizing spoken utterances set forth in noisy environments, such as a crowded airport or a moving automobile.