A goal of automatic speech recognition (ASR) technology is to map a particular utterance to an accurate textual representation of that utterance. For instance, ASR performed on the utterance “my dog has fleas” would ideally produce the text string “my dog has fleas” rather than the nonsensical string “my dog has freeze” or the reasonably sensible but inaccurate string “my bog has trees.” However, ASR can be challenging because different individuals have different speech patterns (e.g., different accents, phrasings, and word choices). Additionally, background noise recorded along with an utterance can make it more difficult to distinguish the utterance from the noise.