Spoken language understanding systems have been deployed in numerous applications that require some sort of interaction between humans and machines. Most of the time, the interaction is controlled by a Voice User Interface (VUI) where the system asks questions of the users, attempts to identify the intended meaning from their answers (expressed in natural language), and takes actions in response to these extracted meanings. One important class of VUI applications employs Natural Language Understanding (NLU) technology to extract the semantic content of the user queries using statistical methods. We will call such applications statistical semantic systems. One important class of statistical semantic systems, known as “call routing,” is built to semantically classify a telephone query from a customer and route it to the appropriate set of service agents based on a brief spoken description of the customer's reason for the call. Call routing systems reduce queue time and call duration, thereby saving money and improving customer satisfaction by promptly connecting the customer to the right service representative in large call centers.
Before they can be used, statistical semantic systems must first be trained. Training requires the transcription and semantic annotation of many sample user inputs. For call routing, these user inputs are answers to an open prompt (such as: How may I help you?) at a call center main number. The human annotation of each user input with a semantic tag or meaning is referred to as tagging. In the call routing context, this semantic meaning is simply a call destination (corresponding either to an operator pool or to another application), so, in this context, the terms “semantic meaning” and “call destination” can be used interchangeably. The set of transcribed and tagged requests is referred to as the training corpus.
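As an illustration only, a training corpus of this kind can be pictured as a list of transcription and tag pairs. The sketch below is a minimal one under stated assumptions: the class name TaggedUtterance, the helper tag_counts, and the three sample transcriptions are hypothetical and are not taken from any deployed system; the destination names are borrowed from the voice-mail example discussed later in this section.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedUtterance:
    """One training sample: a transcribed user input plus its human-assigned tag."""
    transcription: str  # what the caller said, as transcribed
    tag: str            # semantic meaning; in call routing, a call destination


# Hypothetical samples for illustration; a real corpus holds many such requests.
training_corpus = [
    TaggedUtterance("i forgot my voice mail password", "VoiceMailHelpPassword"),
    TaggedUtterance("please cancel my voice mail", "VoiceMailCancel"),
    TaggedUtterance("voice mail", "VoiceMail"),
]


def tag_counts(corpus):
    """Count how many tagged requests fall under each semantic meaning."""
    return Counter(sample.tag for sample in corpus)


for tag, count in sorted(tag_counts(training_corpus).items()):
    print(tag, count)
```

Counting samples per tag is only one simple way to inspect such a corpus; the point of the sketch is the pairing of each transcription with exactly one semantic meaning.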
Another example of a statistical semantic system, one that is not a call routing application, would be a voice-driven cell phone help application. Examples of annotated meanings could be [functionality: contacts] [question: How to add a contact], [functionality: contacts] [question: How to call a contact], etc. Some examples of user queries could be: How do I call one of my contacts quickly? How do I add my friend info to my list?
FIG. 1 shows a typical statistical semantic system according to the prior art, which has a Statistical Language Model (SLM) 101 and a Statistical Semantic Engine (SSE) 102 that are trained on the data in the training corpus. At runtime the SLM 101 receives a speech input from the user and determines the most likely recognized text. The SSE 102 classifies the recognized text with the most probable semantic tags. The SSE 102 also attaches to each of the top returned meanings a confidence score reflecting the likelihood that its classification is actually correct. The classification of the SSE 102 is used to guide a set of Dialog Prompts 103 that guide the user from an initial open prompt (e.g., How may I help you?) to a state where some action can be taken, referred to as the final action of the understanding dialog. In call routing, this means guiding the user from an initial open prompt to a final destination. For the cell phone help example, it means guiding the user from an initial open prompt to an unambiguous question for which the instructions can be played.
In currently deployed applications, the tagged corpus is used just for training the SLM 101 and SSE 102, but the Dialog Prompts 103 still have to be built by hand. This includes crafting confirmation prompts for each of the possible semantic meanings and designing the back-off dialog and the disambiguation dialogs. Grammars also need to be written for all the prompts.
A confirmation prompt is used when the confidence score is above a confirmation threshold and below an acceptance threshold. A confirmation prompt is needed for each semantic meaning.
A back-off dialog is used when the confidence score is below the confirmation threshold or when the confirmation fails. It is based on presenting the customer with a few choices in a hierarchical menu; for example, in the call routing context: Are you calling about e-mail, internet browser, connection problems, etc. Then, if the user answers “e-mail,” the next prompt could be: Is this for a password problem or reset, sending or receiving mail problem, etc. For our cell phone help example, the back-off could be: Is your question related to messaging, ring tones, text messaging, etc.
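The confirmation and back-off rules described above can be sketched as a small decision function. This is a minimal sketch, not the implementation of any particular system: the function names, the returned labels, and the threshold values 0.4 and 0.8 are hypothetical. Two points are assumptions rather than statements from the text: that a score at or above the acceptance threshold is accepted without confirmation, and that a score exactly equal to the confirmation threshold leads to a confirmation prompt.

```python
def choose_dialog_step(confidence, confirmation_threshold, acceptance_threshold):
    """Pick the next dialog step from the confidence score of the top meaning."""
    if confidence < confirmation_threshold:
        # Too uncertain even to ask "Is that right?": fall back to the menu.
        return "back_off"
    if confidence < acceptance_threshold:
        # Between the two thresholds: ask the user to confirm the meaning.
        return "confirm"
    # Assumption: at or above the acceptance threshold, accept directly.
    return "accept"


def after_confirmation(user_confirmed):
    """A failed confirmation also leads to the back-off dialog."""
    return "accept" if user_confirmed else "back_off"


# Hypothetical thresholds, chosen only for illustration.
CONFIRMATION_THRESHOLD = 0.4
ACCEPTANCE_THRESHOLD = 0.8

for score in (0.2, 0.6, 0.9):
    step = choose_dialog_step(score, CONFIRMATION_THRESHOLD, ACCEPTANCE_THRESHOLD)
    print(score, step)
```

With these illustrative thresholds, a score of 0.2 leads to the back-off menu, 0.6 triggers a confirmation prompt, and 0.9 is accepted.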
Disambiguation dialogs are needed for each disambiguation semantic meaning, which is a semantic meaning that does not carry enough information to determine the final action and instead represents a concept regrouping the multiple meanings needed for taking the final action. Conversely, the meanings for which one can take the final action will be called final meanings. For example, in the call routing context, when a user says “voice-mail” (i.e., destination VoiceMail), then most likely the application needs to ask an additional question to get to a final destination (which might be one of the following: VoiceMailHelp, VoiceMailHelpPassword, VoiceMailCancel, VoiceMailAdd). For our cell phone help example, when a user says “contact list” (with meaning [functionality: contacts]), then most likely the application needs to ask an additional question to get to a final meaning (which might be one of the following: [functionality: contacts] [question: How to add a contact], [functionality: contacts] [question: How to call a contact], [functionality: contacts] [question: How to display], etc.).
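The distinction between disambiguation meanings and final meanings can be pictured as a lookup from each disambiguation meaning to the final meanings it regroups. The following is a minimal sketch under stated assumptions: the table name DISAMBIGUATION and the functions is_final and next_step are hypothetical, and the single entry simply restates the voice-mail example above.

```python
# Each disambiguation meaning maps to the final meanings it regroups.
# The entry restates the voice-mail example; the structure itself is an assumption.
DISAMBIGUATION = {
    "VoiceMail": [
        "VoiceMailHelp",
        "VoiceMailHelpPassword",
        "VoiceMailCancel",
        "VoiceMailAdd",
    ],
}


def is_final(meaning):
    """A meaning is final when the final action can be taken for it directly."""
    return meaning not in DISAMBIGUATION


def next_step(meaning):
    """Return the kind of step to take and the candidate final meanings."""
    if is_final(meaning):
        return ("final_action", [meaning])
    # Not enough information yet: ask an additional question over the candidates.
    return ("disambiguate", DISAMBIGUATION[meaning])


print(next_step("VoiceMail"))
print(next_step("VoiceMailCancel"))
```

In this sketch a disambiguation dialog would be needed for every key of the table, which mirrors the statement that one such dialog is required per disambiguation semantic meaning.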
But it is not particularly easy to create the prompts and dialogs. It takes significant time and expense for a VUI expert and a speech expert to craft the confirmation, back-off, and disambiguation prompts and to design grammars that cover possible user inputs. The prompts and grammars are, in general, application specific and cannot be easily reused in another application. Moreover, the prompts are not very precise. Since many different customer requests are pooled together in the same semantic meaning, a confirmation prompt could sound strange and elicit a false confirmation. For example, the user says: I want a new channel. The confirmation prompt is then: I think I understood . . . You'd like to make some kind of change to your service or your account . . . Is that right?
Moreover, until now, the overall number of statistical semantic system applications, and more precisely call routing applications, brought to market has not been high. As a result, each individual application has been a highly customized, hand-crafted product, much like the first automobiles were a hundred years ago. In marketing terms, the time to market (TTM) and total cost of ownership (TCO) are simply too high. That is, the existing processes for developing those applications are too expensive and therefore exclude many potential customers who need a less expensive product that is more standardized and “off-the-shelf.” In the case of call routing applications, it is the application design (writing dialog prompts and grammars) that requires the bulk of the time and cost in developing a new product.
However, many of the companies that provide statistical semantic system or call routing applications have relatively few products deployed, and therefore are not at a stage in their business where significant cost and time reductions are needed in developing their call routing products. Moreover, in the case of relatively large companies that provide those products, the internal corporate structure is often highly complicated, and there may be little direct communication between Professional Services personnel and those in Research and Development, who may therefore be unaware that there are any problems of interest in this area.