Speech recognition systems require accurate pronunciations of words. The language models used by general speech recognition systems to convert spoken audio into text are trained using standard or common pronunciations. However, the general language models may not have appropriate pronunciations for some words. This may be because the existing pronunciations in the language model for a particular word are flawed, or because the pronunciation for a new or trending word does not exist in the language model. In addition, speaker-specific factors (e.g., a speaker's accent or use of non-standard pronunciations) directly impact the ability of speech recognition systems to properly recognize spoken audio for any given speaker. Further, some words may have unusual pronunciations that have particular relevance, particularly in the case of personal names.
In order to improve recognition accuracy, these problematic words must be identified and appropriate pronunciations must be provided. For pronunciation generation, the traditional supervised approach of hiring phoneticians to create the pronunciations is still standard in the industry.
It is with respect to these and other considerations that the present invention has been made. Although relatively specific problems have been discussed, it should be understood that the embodiments disclosed herein should not be limited to solving the specific problems identified in the background.