Spoken dialog systems interact with users via spoken language to help them achieve a goal. For input, spoken dialog systems rely on automatic speech recognition (ASR) to convert speech to words, and on spoken language understanding (SLU) to translate the words into the local meaning, which is the meaning contained in the user's speech in a turn. However, ASR and SLU are prone to errors. If unchecked, errors substantially erode the user experience and can ultimately render a dialog system useless. Even seemingly low error rates are problematic in dialog systems. For example, if recognition errors occur in only 5% of turns and the average dialog is 20 turns, the majority of dialogs (about 64%, assuming errors occur independently across turns) contain at least one error. Acknowledging the possibility of errors, the ASR and SLU output alternatives on a list called the N-best list, in addition to their best guess.
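The error-rate example above can be checked with a short calculation. The following is a minimal sketch, assuming that errors occur independently from turn to turn (an assumption not stated in the text), which yields roughly 64%, in line with the approximate figure given above. The sketch also shows one possible representation of an N-best list; the names `Hypothesis`, `n_best`, and `prob_dialog_has_error`, as well as the example hypotheses and confidence scores, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


def prob_dialog_has_error(per_turn_error_rate: float, turns: int) -> float:
    """Probability that at least one turn in a dialog contains an error.

    Assumes errors are independent across turns (illustrative assumption).
    """
    return 1.0 - (1.0 - per_turn_error_rate) ** turns


@dataclass
class Hypothesis:
    text: str          # recognized words or an SLU interpretation
    confidence: float  # hypothetical recognizer score in [0, 1]


# Hypothetical N-best list: the best guess plus lower-scored alternatives.
n_best = [
    Hypothesis("to boston", 0.61),
    Hypothesis("to austin", 0.27),
    Hypothesis("two boston", 0.08),
]

# The best guess is simply the highest-scoring entry on the list.
best_guess = max(n_best, key=lambda h: h.confidence)

if __name__ == "__main__":
    p = prob_dialog_has_error(0.05, 20)
    print(f"{p:.1%}")  # 64.2%
    print(best_guess.text)
```

Keeping the alternatives alongside the best guess means a later component can still recover the correct hypothesis when the top entry is wrong.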
Dialog state tracking overcomes a large fraction of recognition errors in spoken dialog systems; however, aspects of dialog state tracking are still problematic. The two main approaches used in dialog state tracking are generative models and discriminative models. Generative models, such as n-gram models, Naïve Bayes classifiers, and hidden Markov models, rely on the joint probability of concepts and semantic constituents of an utterance to determine meaning. In contrast, discriminative models learn a classification function based on conditional probabilities of concepts given the semantic constituents of an utterance.
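The distinction between the two approaches can be illustrated with a small sketch. The example below is not the method of the present disclosure; it is a toy comparison under stated assumptions: a Naïve Bayes classifier stands in for the generative approach (scoring the joint probability of a concept and the words), and a binary logistic regression stands in for the discriminative approach (fitting the conditional probability of a concept given the words). The training utterances, concept labels, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (words in the utterance, concept label).
DATA = [
    (("from", "boston"), "origin"),
    (("leaving", "boston"), "origin"),
    (("to", "denver"), "destination"),
    (("going", "to", "denver"), "destination"),
]
VOCAB = sorted({w for words, _ in DATA for w in words})
LABELS = sorted({c for _, c in DATA})  # ["destination", "origin"]


def train_naive_bayes(data):
    """Generative: collect counts for P(concept) and P(word | concept)."""
    prior = Counter(c for _, c in data)
    word_counts = defaultdict(Counter)
    for words, c in data:
        word_counts[c].update(words)
    return prior, word_counts


def joint_log_prob(words, concept, prior, word_counts):
    """log P(concept, words) under Naive Bayes with add-one smoothing."""
    logp = math.log(prior[concept] / sum(prior.values()))
    denom = sum(word_counts[concept].values()) + len(VOCAB)
    for w in words:
        logp += math.log((word_counts[concept][w] + 1) / denom)
    return logp


def predict_generative(words, prior, word_counts):
    """Pick the concept with the highest joint probability."""
    return max(LABELS, key=lambda c: joint_log_prob(words, c, prior, word_counts))


def train_logistic(data, epochs=200, lr=0.5):
    """Discriminative: fit P(concept == LABELS[1] | words) directly by SGD."""
    weights = {w: 0.0 for w in VOCAB}
    bias = 0.0
    for _ in range(epochs):
        for words, c in data:
            y = 1.0 if c == LABELS[1] else 0.0
            z = bias + sum(weights.get(w, 0.0) for w in words)
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - y  # gradient of the log loss with respect to z
            bias -= lr * grad
            for w in words:
                if w in weights:
                    weights[w] -= lr * grad
    return weights, bias


def predict_discriminative(words, weights, bias):
    """Threshold the conditional probability at 0.5."""
    z = bias + sum(weights.get(w, 0.0) for w in words)
    p = 1.0 / (1.0 + math.exp(-z))
    return LABELS[1] if p >= 0.5 else LABELS[0]


if __name__ == "__main__":
    prior, word_counts = train_naive_bayes(DATA)
    weights, bias = train_logistic(DATA)
    utterance = ("from", "boston")
    print(predict_generative(utterance, prior, word_counts))
    print(predict_discriminative(utterance, weights, bias))
```

In this sketch the generative model must estimate how words are distributed within each concept before it can classify, whereas the discriminative model only learns the boundary between concepts.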
In general, the technical problem addressed is improving accuracy in statistical dialog state tracking. It is with respect to these and other considerations that the present invention has been made. Although relatively specific problems have been discussed, it should be understood that the aspects disclosed herein should not be limited to solving the specific problems identified in the background.