A speech recognition service generally receives spoken input from a user and transcribes the spoken words into text. To accomplish this, the speech recognition service may attempt to match the sounds of the spoken input with phonetic representations of textual words included in a particular vocabulary of words. The transcribed text may be used for many purposes, such as for input into a search system, for taking notes in an electronic document, or for drafting an electronic message. The accuracy of speech-to-text conversion is important to ensure a positive user experience. Generally, the more accurate the converted text is, the better the user experience.
Accuracy of speech recognition may be measured by an out-of-vocabulary (OoV) rate. The OoV rate indicates the rate at which a speech recognition service fails to correctly transcribe a spoken word because that word is not present in the vocabulary of the speech recognition service. Larger vocabularies tend to produce lower OoV rates than smaller vocabularies. However, larger vocabularies may also take up more resources, may result in a slower speech recognition service, and may increase the likelihood that one spoken word will be confused with a different word in the vocabulary, because the pool of words to be searched when matching spoken input to text is larger.
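As a rough illustration of the OoV rate described above, the following Python sketch counts the fraction of spoken word tokens that are absent from a vocabulary. The text does not give an exact formula, so the token-level ratio, the case-insensitive matching, and the names `oov_rate`, `small_vocab`, and `large_vocab` are all assumptions made for this example.

```python
def oov_rate(spoken_words, vocabulary):
    """Return the fraction of spoken word tokens not found in the vocabulary.

    Assumption: the OoV rate is computed per token, with case-insensitive
    matching. A real service may define and measure it differently.
    """
    vocab = {word.lower() for word in vocabulary}
    tokens = [word.lower() for word in spoken_words]
    if not tokens:
        return 0.0  # no spoken words, so nothing can be out of vocabulary
    missing = sum(1 for word in tokens if word not in vocab)
    return missing / len(tokens)


# Hypothetical vocabularies: the larger one is a superset of the smaller one.
small_vocab = ["play", "some", "music"]
large_vocab = small_vocab + ["jazz", "by", "coltrane"]

utterance = "play some jazz by coltrane".split()

print(oov_rate(utterance, small_vocab))  # 3 of the 5 words are missing
print(oov_rate(utterance, large_vocab))  # every word is covered
```

In this toy case the larger vocabulary lowers the OoV rate, but the sketch does not model the costs noted above: a larger vocabulary still means more candidates to store, search, and potentially confuse with one another.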