Automatic speech recognition (ASR) uses language models to determine plausible word sequences for a given language or application domain. The prevailing approaches to ASR typically use discrete words (n-grams) as the basic units of language modeling. However, these approaches suffer from relatively short memory spans and an inability to generalize efficiently from limited training data.
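The short memory span and poor generalization of word n-gram models can be illustrated with a minimal sketch. The toy corpus, the bigram order, and the unsmoothed maximum-likelihood estimate below are all illustrative assumptions, not the method of any particular system:

```python
from collections import Counter, defaultdict

# Toy corpus (assumption, for illustration only).
corpus = [
    "play some jazz music",
    "play some rock music",
    "call mom",
]

# Count bigrams and their history unigrams.
bigram_counts = defaultdict(Counter)
history_counts = Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    for prev, cur in zip(tokens, tokens[1:]):
        bigram_counts[prev][cur] += 1
        history_counts[prev] += 1

def bigram_prob(prev, cur):
    # Unsmoothed maximum-likelihood estimate P(cur | prev).
    # Any bigram unseen in training gets probability zero, which is the
    # generalization problem noted above; the one-word history is the
    # short memory span.
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[prev][cur] / history_counts[prev]

print(bigram_prob("play", "some"))  # seen in training: 1.0
print(bigram_prob("play", "mom"))   # unseen bigram: 0.0
```

In practice such models are smoothed (e.g. with backoff or interpolation), but the underlying sparsity issue remains: probability mass is tied to exact word sequences observed in training.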
Attempts to mitigate these weaknesses include class-based language models (LMs), in which named entities are cast as classes that may be automatically inferred or manually defined, and, in some cases, the use of common cohesive word strings to improve post-recognition tasks such as machine translation, information extraction, and language understanding. However, such methods are inefficient, contextually unaware, or otherwise fail to preserve the simplicity and scalability of n-gram language models. Approaches that use larger-span or multi-word units alongside discrete words therefore promise improvement, especially for domain-constrained applications.
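The two ideas above can be sketched as a preprocessing step applied before n-gram counting: named entities are replaced by a class token so statistics are shared across all class members, and cohesive word strings are merged into single multi-word units. The class inventory and multi-word list below are hypothetical examples, not drawn from any described system:

```python
# Hypothetical, manually defined entity class and multi-word unit list
# (assumptions for illustration).
CITY = {"paris", "london", "tokyo"}
MWUS = {("new", "york"): "new_york", ("san", "francisco"): "san_francisco"}

def normalize(tokens):
    """Merge known multi-word units, then map class members to a class token."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if pair in MWUS:
            # Cohesive word string becomes a single modeling unit.
            out.append(MWUS[pair])
            i += 2
        elif tokens[i].lower() in CITY:
            # Class-based substitution: all cities share one token's statistics.
            out.append("<CITY>")
            i += 1
        else:
            out.append(tokens[i])
            i += 1
    return out

print(normalize("fly from San Francisco to Tokyo".split()))
# → ['fly', 'from', 'san_francisco', 'to', '<CITY>']
```

After this mapping, a standard n-gram model is trained over the normalized token stream, which is one simple way larger-span units can be combined with discrete words.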