Automated Speech Recognition (“ASR”) engines convert speech to text. In doing so, ASR engines typically rely on acoustic models that map the sounds of each utterance to candidate words or phrases, and language models that specify which of these candidate words or phrases are more likely to be correct based on historical uses of the words or phrases.
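As an illustration of how the two kinds of models may work together, the following minimal sketch combines an acoustic score with a language model probability for each candidate transcription and keeps the best-scoring candidate. The function name pick_transcription, the lm_weight parameter, the floor probability for unseen phrases, and all numeric values are assumptions made for this example; they are not taken from any particular ASR engine.

```python
import math

def pick_transcription(acoustic_scores, language_model, lm_weight=1.0):
    """Return the candidate with the highest combined log score.

    acoustic_scores: {candidate: probability from the acoustic model}
    language_model:  {candidate: probability based on historical usage}
    """
    def combined(candidate):
        # Assumption: phrases the language model has never seen receive
        # a small floor probability instead of zero.
        lm_prob = language_model.get(candidate, 1e-9)
        return math.log(acoustic_scores[candidate]) + lm_weight * math.log(lm_prob)

    return max(acoustic_scores, key=combined)

# Illustrative, made-up scores: the acoustic model slightly prefers the
# wrong phrase, and the language model corrects the choice.
acoustic_scores = {"recognize speech": 0.40, "wreck a nice beach": 0.45}
language_model = {"recognize speech": 0.020, "wreck a nice beach": 0.0001}

best = pick_transcription(acoustic_scores, language_model)
print(best)
```

In this sketch the scores are added in the log domain, which is a common way to combine probabilities without numeric underflow; a real engine may weight or normalize the two scores differently.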
To improve recognition accuracy, ASR engines may use different acoustic models and language models to recognize utterances that are associated with different contexts. For example, one language model may be used to recognize utterances that are spoken when a user is entering a text message, and a different language model may be used when the user is entering search terms.
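The effect of switching language models by context can be sketched as a lookup keyed on the context in which the utterance was spoken. The context labels, the LANGUAGE_MODELS table, the rank_candidates helper, and the probabilities below are hypothetical and serve only to show that the same candidates may rank differently under different models.

```python
# Hypothetical per-context phrase probabilities; all values are invented.
LANGUAGE_MODELS = {
    "text_message": {"see you at the mall": 0.0100, "sea view mall": 0.0005},
    "search":       {"see you at the mall": 0.0002, "sea view mall": 0.0080},
}

def rank_candidates(candidates, context):
    """Order candidates by the language model associated with the context."""
    model = LANGUAGE_MODELS[context]
    # Candidates missing from the model are treated as probability 0.0.
    return sorted(candidates, key=lambda c: model.get(c, 0.0), reverse=True)

candidates = ["see you at the mall", "sea view mall"]
top_for_message = rank_candidates(candidates, "text_message")[0]
top_for_search = rank_candidates(candidates, "search")[0]
print(top_for_message)
print(top_for_search)
```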
Each language model is typically built using a corpus of words or phrases that the ASR engine or another system has collected over time. For instance, context-specific language models may be estimated from logs of previous speech recognition results or logs of previous text input from multiple users in similar contexts. The words or phrases in a particular corpus may include words or phrases that have been explicitly provided by the user, or candidate transcriptions that have been recognized by an ASR engine.
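One simple way to estimate context-specific language models from such logs is to count word occurrences per context and normalize the counts into relative frequencies. The sketch below assumes a hypothetical log format of (context, text) pairs and builds unsmoothed unigram models; the function name estimate_models and the sample log entries are invented for illustration, and production systems generally use higher-order n-grams with smoothing.

```python
from collections import Counter, defaultdict

def estimate_models(logs):
    """Build one unigram model per context from (context, text) log entries.

    Returns {context: {word: relative frequency within that context}}.
    """
    counts = defaultdict(Counter)
    for context, text in logs:
        counts[context].update(text.lower().split())

    models = {}
    for context, counter in counts.items():
        total = sum(counter.values())
        models[context] = {word: n / total for word, n in counter.items()}
    return models

# Invented log entries standing in for previous text input or
# previous recognition results from multiple users.
logs = [
    ("text_message", "running late see you soon"),
    ("text_message", "see you at eight"),
    ("search", "weather in boston"),
    ("search", "boston restaurants"),
]

models = estimate_models(logs)
print(models["search"]["boston"])
```

A context that never appears in the logs has no entry in the returned dictionary, and a word seen only in one context is absent from the models of the others.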
If a language model that is developed for a given language and a particular context is used to recognize utterances that are spoken in a different context, an ASR engine may generate inaccurate recognition results. Accordingly, to increase recognition accuracy, an ASR engine should use a language model that is appropriate to both the language of the utterances and the context in which the utterances were spoken. For certain infrequently used languages, or for infrequently occurring contexts, however, an ASR engine may not have access to an appropriate language model.