Natural language processing systems typically include one or more models that they use to process input. For example, automatic speech recognition systems typically include an acoustic model and a language model. The acoustic model is used to generate hypotheses regarding which words or subword units (e.g., phonemes) correspond to an utterance based on the acoustic features of the utterance. The language model is used to determine which of the hypotheses generated using the acoustic model is the most likely transcription of the utterance based on lexical features of the language in which the utterance is spoken. As another example, natural language understanding systems typically include models for named entity recognition, intent classification, and the like. The natural language understanding models can be used to determine an actionable intent from the words that a user speaks or writes.
Acoustic models, language models, natural language understanding models, and other models used in spoken language understanding (together referred to as spoken language understanding models), may be specialized or customized to varying degrees. For example, an automatic speech recognition system may have a general or base model that is not customized in any particular manner, and any number of additional models for particular genders, age ranges, regional accents, speakers, or any combination thereof. Some systems may have models for specific subject matter (e.g., medical terminology) or even specific users.