A goal of automatic speech recognition (ASR) technology may be to map a particular audio utterance to an accurate textual representation of that utterance. For instance, ASR performed on the utterance “cat and dog” would ideally produce the text string “cat and dog,” rather than the nonsensical text string “skate and hog,” or the sensible but inaccurate text string “Kate and Doug.” ASR systems can be trained on a large corpus of utterance-to-text-string mappings. However, ASR system performance may vary based on the characteristics of this corpus.