Speech recognition systems are specialized computer systems configured to process and recognize human speech; they may also take actions or carry out further processing based on what is recognized. Developments in speech recognition technologies support “natural language” interactions between automated systems and users, allowing a user to speak naturally, e.g., when providing commands to a computer application.
An important component of a speech recognition system is the language model. The language model specifies the set of speech inputs that the system can recognize, and it provides data used to guide the mapping from speech input to words. For example, one particular language model might indicate that the phrase “Send a text to John and Fred” is a valid speech input, but the phrase “Text send and Fred” is not.
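The following is a minimal sketch of the "valid input" aspect of a language model, assuming a toy rule-based grammar rather than the statistical models most recognizers use. The contact list `CONTACTS` and the helpers `build_grammar` and `is_valid` are hypothetical names chosen for illustration. The sketch only decides whether a word sequence is allowed; it does not model the data that guides the mapping from audio to words.

```python
import re

# Hypothetical contact names; a real system would supply its own context data.
CONTACTS = ["John", "Fred", "Mary"]

def build_grammar(contacts):
    """Compile a toy grammar: 'send a text to' NAME ('and' NAME)*."""
    # Alternation over the known names, escaped so names are matched literally.
    name = "(?:" + "|".join(re.escape(c) for c in contacts) + ")"
    pattern = rf"send a text to {name}(?: and {name})*"
    return re.compile(pattern, re.IGNORECASE)

def is_valid(grammar, utterance):
    """Return True only if the entire utterance is covered by the grammar."""
    return grammar.fullmatch(utterance.strip()) is not None

grammar = build_grammar(CONTACTS)
print(is_valid(grammar, "Send a text to John and Fred"))  # True
print(is_valid(grammar, "Text send and Fred"))            # False
```

Using `fullmatch` rather than `search` means a phrase is accepted only when every word is accounted for by the grammar, which mirrors the idea of a closed set of valid speech inputs.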
It is desirable for the language model to have a number of properties. For example, the language model should support the recognition of data specific to a context, such as specific names. At the same time, the language model should be efficient: it should require little data to build, support rapid runtime incorporation of context-specific information, and limit the resources it consumes, such as CPU time and memory. Unfortunately, achieving some of these properties tends to make achieving others more difficult. For example, supporting data specific to a context tends to require acquiring more data and retraining to build accurate models for that context.