Natural language understanding (NLU) systems have been deployed in numerous applications that require some form of interaction between humans and machines. Most of the time, the interaction is controlled by the machine, which asks questions of the users, attempts to identify the intended meaning from their answers (expressed in natural language), and then takes action in response to the extracted meanings.
One important use of NLU technology is in automated dialog systems that manage human-machine interactions. FIG. 1 shows a typical NLU dialog system in which input speech from a telephone caller is translated into a text string by an Automated Speech Recognition (ASR) Module 101. The ASR text is output to an NLU semantic classification component known as a Statistical Router 102. The Statistical Router 102 models the NLU task as a statistical classification problem in which the ASR text corresponding to an utterance is assigned to one or more of a set of predefined user intents. Various specific classifiers have been compared in the literature with similar performance (1-2% differences in classification accuracy), including, for example, Boosting, Maximum Entropy (ME), and Support Vector Machines (SVM). For example, the Statistical Router 102 may use binary unigram features and a standard back-propagation neural network as a classifier.
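A minimal sketch of such a router may help fix ideas: binary unigram features scored against per-intent weight vectors. The vocabulary, intent names, and weights below are illustrative assumptions, not part of any deployed system; a real router would learn its weights (e.g. with a back-propagation neural network, MaxEnt, or SVM).

```python
# Hypothetical statistical router sketch: binary unigram features plus
# a linear scorer with hand-set (untrained) per-intent weights.

VOCAB = ["balance", "transfer", "card", "lost", "pay", "bill"]

# Toy per-intent weights; a deployed router would train these.
INTENT_WEIGHTS = {
    "CheckBalance": {"balance": 2.0, "pay": -0.5},
    "ReportLostCard": {"card": 1.5, "lost": 2.0},
    "PayBill": {"pay": 2.0, "bill": 1.5},
}

def unigram_features(utterance):
    """Binary unigram features: a word is 1 if it occurs, else absent."""
    tokens = set(utterance.lower().split())
    return {w: 1 for w in VOCAB if w in tokens}

def route(utterance, n_best=3):
    """Return the top-N intent hypotheses ranked by linear score."""
    feats = unigram_features(utterance)
    scores = {
        intent: sum(weights.get(w, 0.0) for w in feats)
        for intent, weights in INTENT_WEIGHTS.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n_best]
```

Calling `route("i lost my card")` ranks `ReportLostCard` first, since only its weights fire on the observed unigrams; the ranked list is what a downstream confidence engine would then rescore.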
The Statistical Router 102 typically has an unacceptably high error rate (10-30% classification error rates are commonly reported in deployed applications), and thus a rejection mechanism is implemented to retain only those route hypotheses which are most likely to be correct. The rejection decision should not be based only on the classification confidence of the Statistical Router 102, because the ASR Module 101 can also make recognition errors, which should be taken into account. Therefore, another separate classifier—Confidence Engine (CE) 103—is used to produce confidence scores based on both acoustic and NLU features for the highest ranked N hypotheses (typically 3-5) output from the Statistical Router 102. An Intent Reordering Component 104 then reorders the classification hypotheses according to their overall confidence as determined by the CE 103. The best scoring classification hypothesis is sent to a Threshold Decision Module 105, which accepts the hypothesis if its confidence score is above an accept threshold. The value of the accept threshold is chosen so that the system satisfies one or more operating constraints, such as an upper bound on the False Accept Rate (FAR) (typically 1-5%).
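The reorder-then-threshold step can be sketched as follows. The combined confidence here is a simple weighted mix of an assumed acoustic (ASR) score and NLU score; a real Confidence Engine is a separately trained classifier, and the threshold and weights below are illustrative assumptions rather than values from any deployed system.

```python
# Hypothetical confidence-based reordering and threshold acceptance.
# Each hypothesis carries an acoustic confidence ("asr_conf") and an
# NLU confidence ("nlu_conf"); both names are assumptions for this sketch.

ACCEPT_THRESHOLD = 0.6  # tuned offline against a False Accept Rate bound

def combined_confidence(hyp, asr_weight=0.4, nlu_weight=0.6):
    """Blend acoustic and NLU confidence into one overall score."""
    return asr_weight * hyp["asr_conf"] + nlu_weight * hyp["nlu_conf"]

def decide(n_best):
    """Reorder the N-best hypotheses by overall confidence and accept
    the top one only if it clears the accept threshold; otherwise reject."""
    ranked = sorted(n_best, key=combined_confidence, reverse=True)
    best = ranked[0]
    if combined_confidence(best) >= ACCEPT_THRESHOLD:
        return best["intent"]
    return None  # reject: e.g. reprompt the caller or escalate to an agent
```

Lowering `ACCEPT_THRESHOLD` accepts more hypotheses but raises the False Accept Rate, which is why the threshold is chosen against an explicit FAR bound rather than for maximum acceptance.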
NLU dialog applications also produce dialog data during their operation that is collected in a Dialog Information Database 106. That dialog data is later analyzed to improve the operating quality of the NLU application and of other applications, and to guide the development of future NLU products. The dialog data also helps to identify and prioritize problems that need to be addressed, and to improve the statistical models that are used by the system. But these NLU dialog applications generate an enormous amount of data, and it is simply not practical to inspect every piece of collected information, or even to obtain adequate coverage through random sampling of the data.