A spoken message can be conveyed to a recipient as either audio or text. For example, some mobile devices can either play the audio of a voicemail or display a transcription of its spoken words. Automatic speech recognition (ASR) engines are used to generate text from spoken words. An ASR engine can evaluate portions of audio against candidate words and select the sequences of words that are most likely to represent the words spoken in the audio.
Several factors affect the accuracy with which an ASR engine recognizes spoken words. Prominent among these factors is whether a word exists in the ASR engine's vocabulary. If it does not, the ASR engine cannot recognize the word when it is spoken in an audio recording. Additionally, if a word is used infrequently, an ASR engine might misrecognize it, favoring a word that is statistically more likely to be spoken. Together, these factors can reduce the accuracy with which an ASR engine recognizes many words. Among the words that are commonly misrecognized are proper names, such as the names of people, streets, and restaurants, and other words that have special relevance in personal messages such as voicemails.
A need therefore exists for a system that overcomes the above problems and that provides additional benefits. Overall, the examples herein of some prior or related systems and their associated limitations are intended to be illustrative and not exclusive. Other limitations of existing or prior systems will become apparent to those of skill in the art upon reading the following Detailed Description.