Automatic speech recognition (ASR) uses language models to determine plausible word sequences for a given language or application domain. In some instances, these language models may be created or customized for a target domain by using language model (LM) interpolation. In LM interpolation, a number of component LMs, each of which may be designed to reflect a particular source or corpus, are combined using weights optimized on a sample drawn from the target domain. Accordingly, determining these optimized interpolation weights is a primary goal in ASR techniques that utilize LM interpolation.
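As a minimal sketch of the interpolation described above, the following Python example combines two hypothetical component LMs (represented here as simple unigram probability tables, an assumption for illustration) and re-estimates the interpolation weights on a target-domain sample using expectation-maximization, a standard approach for this optimization:

```python
# Hypothetical component LMs: each maps a word to a unigram probability.
# Real systems would use full n-gram or class-based models.
lm_news = {"stock": 0.4, "market": 0.3, "game": 0.1, "score": 0.2}
lm_sports = {"stock": 0.05, "market": 0.05, "game": 0.5, "score": 0.4}

def interpolated_prob(word, weights, models):
    # P(w) = sum_i lambda_i * P_i(w), with a small floor for unseen words
    return sum(lam * m.get(word, 1e-9) for lam, m in zip(weights, models))

def optimize_weights(sample, models, iterations=50):
    """EM re-estimation of interpolation weights to maximize the
    likelihood of a sample drawn from the target domain."""
    k = len(models)
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: expected responsibility of each component per word
        counts = [0.0] * k
        for word in sample:
            denom = interpolated_prob(word, weights, models)
            for i, m in enumerate(models):
                counts[i] += weights[i] * m.get(word, 1e-9) / denom
        # M-step: normalize responsibilities into new weights
        total = sum(counts)
        weights = [c / total for c in counts]
    return weights

models = [lm_news, lm_sports]
target_sample = ["game", "score", "game", "market"]  # sports-like domain
weights = optimize_weights(target_sample, models)
```

Because the sample skews toward sports vocabulary, the optimized weight on the sports LM ends up larger than the weight on the news LM, illustrating how the weights adapt the combined model to the target domain.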
Determining optimized interpolation weights, however, poses particular challenges. For instance, where the component LMs are class-based, there is no common denominator, that is, no single word-level representation of the training corpus used for optimizing the interpolation weights, which causes the component models to compete with one another. Additional challenges arise in scenarios employing context-specific interpolation, which produces results superior to interpolation that does not account for context. Existing attempts to mitigate these challenges suffer from low resolution, inefficiency, or a lack of contextual awareness. Furthermore, existing approaches cannot combine class-based LMs with context-specific interpolation.
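To make the contrast with global weights concrete, the sketch below illustrates context-specific interpolation: rather than one global weight vector, the weights are selected based on the history (here, only the preceding word, a simplifying assumption; the model names, tables, and weight values are hypothetical):

```python
# Two hypothetical component LMs as unigram probability tables.
lm_a = {"bank": 0.3, "river": 0.1, "loan": 0.4, "shore": 0.2}
lm_b = {"bank": 0.2, "river": 0.4, "loan": 0.1, "shore": 0.3}

def context_prob(word, history, context_weights, default_weights, models):
    # Look up the weight vector for this context; fall back to the
    # global (context-independent) weights when the context is unseen.
    weights = context_weights.get(history, default_weights)
    return sum(lam * m.get(word, 1e-9) for lam, m in zip(weights, models))

# Assumed context-dependent weights: trust lm_a after "the",
# trust lm_b after "muddy".
context_weights = {"the": [0.8, 0.2], "muddy": [0.2, 0.8]}

p1 = context_prob("bank", "the", context_weights, [0.5, 0.5], [lm_a, lm_b])
p2 = context_prob("bank", "muddy", context_weights, [0.5, 0.5], [lm_a, lm_b])
```

The same word receives a different interpolated probability depending on its context, which is the behavior that a single global weight vector cannot express.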