Dialog state trackers are used by computers to predict what users have in mind based on conversational inputs, or utterances, that the users provide via speech, text, or other means. Dialog state trackers are useful for spoken dialog systems, such as Apple Siri™, Google Now™, Microsoft Cortana™, Amazon Echo™, Adobe DataTone™, and Adobe PixelTone™, which provide a convenient and natural way for users to interact with computers. A computer can internally represent what a user has in mind as a set of slot-value pairs, referred to as a dialog state, and use the dialog state to determine which actions to take. A slot represents a general category (e.g., FOOD), and a value represents more specifically what the dialog participant has in mind within that category (e.g., “Apples”).
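To make the slot-value representation concrete, the following is a minimal Python sketch, assuming a dialog state is stored as a plain set of `(slot, value)` tuples. The class `DialogState` and the function `choose_action`, along with its "ask"/"search" action strings, are hypothetical names introduced only for illustration; they do not describe how any of the systems named above actually store or act on dialog state.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DialogState:
    """Hypothetical dialog state: a set of (slot, value) pairs."""
    pairs: set[tuple[str, str]] = field(default_factory=set)

    def update(self, slot: str, value: str) -> None:
        # Record that the user has this value in mind for this slot.
        self.pairs.add((slot, value))

    def values_for(self, slot: str) -> list[str]:
        # Return every value currently held for a slot, sorted for stable output.
        return sorted(v for s, v in self.pairs if s == slot)


def choose_action(state: DialogState) -> str:
    """Hypothetical policy: pick the next action from the dialog state."""
    foods = state.values_for("FOOD")
    if not foods:
        # The FOOD slot is still empty, so ask the user to fill it.
        return "ask: What food would you like?"
    # The FOOD slot is filled, so act on the value(s) the user has in mind.
    return "search: " + ", ".join(foods)


state = DialogState()
print(choose_action(state))   # ask: What food would you like?
state.update("FOOD", "Apples")
print(choose_action(state))   # search: Apples
```

A set of pairs, rather than a one-value-per-slot dictionary, is used here only because the text describes the dialog state as a set of slot-value pairs; it also allows a slot to hold several values at once.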
Computers that use dialog state tracking rely on accurate prediction of dialog states to determine appropriate responses. However, conventional approaches have difficulty accounting for the nuances of natural language. For example, conventional systems have difficulty identifying the presence of slot-value pairs in utterances when users do not provide the exact words or phrases that the systems use to recognize those slot-value pairs. Conventional systems also have difficulty choosing the correct slot-value pair when the words or phrases used to recognize different slot-value pairs are similar, a situation that becomes more likely as the number of candidate slot-value pairs grows.
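Both difficulties can be reproduced with a deliberately naive recognizer. The sketch below assumes a hypothetical lexicon that maps each slot-value pair to trigger phrases and detects a pair only when one of its phrases appears verbatim (as a substring) in the lowercased utterance. `LEXICON` and `exact_match_track` are illustrative names, and the matcher is a simplified stand-in rather than a description of any particular conventional system.

```python
from __future__ import annotations

# Hypothetical lexicon: each slot-value pair is recognized only by exact phrases.
LEXICON: dict[tuple[str, str], list[str]] = {
    ("FOOD", "Apples"): ["apples"],
    ("FOOD", "Pineapple"): ["pineapple"],
}


def exact_match_track(utterance: str) -> set[tuple[str, str]]:
    """Return every slot-value pair whose trigger phrase occurs in the utterance."""
    text = utterance.lower()
    return {
        pair
        for pair, phrases in LEXICON.items()
        if any(phrase in text for phrase in phrases)
    }


# Works when the user says the exact trigger phrase.
print(exact_match_track("I want some apples"))
# Problem 1: a paraphrase ("Granny Smith") contains no trigger phrase, so nothing is found.
print(exact_match_track("I'd like a Granny Smith"))
# Problem 2: similar phrases collide; "pineapples" also contains "apples",
# so both FOOD=Apples and FOOD=Pineapple are reported.
print(exact_match_track("Two pineapples, please"))
```

The first problem mirrors users not providing the exact words the system expects; the second mirrors similar recognition phrases for different slot-value pairs, which the text notes becomes more likely as the number of candidates grows.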