1. Field of the Invention
The present invention relates to spoken dialog systems and more specifically to a system and method of disambiguating multiple intents in a user utterance.
2. Introduction
Conversational natural language interactive voice response (IVR) systems encourage callers to speak naturally and express their intent to a speech application without any constraints on how they can speak or what they can say. For example, an IVR system may indicate to the user that it is their turn to speak by saying “How may I help you?” This is an open-ended prompt to which the user can simply respond with a question. Within that question the user may express multiple requests, such as desiring both the cost and the availability of a product.
One problem arises when a caller's speech contains multiple intents. The problem relates to how the IVR system decides which intent to process first or which intent the caller actually wants processed. An additional problem relates to the current approach to resolving such ambiguity. If the IVR system is looking for specific intents of the user, such as defining one “intent” as the desire to know the price of something, the IVR system may categorize an input as having a confidence score associated with that intent. For example, the system may assign a 0.6 confidence score to an utterance that it believes is a price request.
The current approach uses only the confidence score, whereby the intent classified by the spoken language understanding (SLU) model with a higher confidence is selected for processing. However, empirical evidence shows that relying on confidence scores often leads to an incorrect choice because of other factors affecting the data that is used to train the language understanding module. For example, the unequal distribution of utterances representing the various caller intents can sway the confidence associated with each intent. When the natural language IVR makes an incorrect choice, three negative consequences arise: (a) a caller may be sent to the wrong termination point, leading to caller frustration; (b) when such termination is a separate IVR, there is a loss of revenue because not only will the caller not complete their call, but the network minutes used increase, affecting the average handle time for the call; and (c) callers sent to the incorrect termination point are likely to drop out and call back, leading to increased costs.
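By way of illustration only, the confidence-only selection described above can be sketched as follows. The function name, the intent labels, and the 0.55 score are hypothetical and are introduced solely for this example; only the 0.6 score for a price request is taken from the description above. The sketch is not a definitive implementation of any particular SLU system.

```python
from typing import Dict


def select_intent_by_confidence(scores: Dict[str, float]) -> str:
    """Return the intent label with the highest confidence score.

    `scores` maps a hypothetical intent label to the confidence that an
    SLU classifier assigned to it for a single utterance.
    """
    if not scores:
        raise ValueError("no intents were classified for this utterance")
    # Only the raw score is consulted; nothing else about the utterance
    # or the training data is taken into account.
    return max(scores, key=lambda label: scores[label])


# Hypothetical classifier output for an utterance that asks about both
# the cost and the availability of a product.
slu_scores = {"price_request": 0.6, "availability_request": 0.55}
chosen = select_intent_by_confidence(slu_scores)
```

In this sketch the availability request is dropped even though its score is nearly as high as that of the price request, which is the kind of selection that may route the caller to the wrong termination point.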
What is needed in the art is an improved manner of managing the spoken dialog where a user includes multiple intents in a user utterance.