The present disclosure relates to natural language understanding and text classification. The present disclosure also relates to the training of semantic classification applications that can be used with call routers and voice-activated or voice-controlled systems.
Spoken language understanding systems have been deployed in numerous applications that involve interaction between humans and machines. Such systems typically process a person's spoken utterances to identify the intended meaning of the questions or answers uttered. By correctly identifying the intended meaning of a spoken utterance, the system can execute an appropriate action in response to that utterance. For example, such actions can include performing mechanical operations, operating a computer, controlling a car audio/navigation system, or routing a telephone call to an appropriate area or agent.
One particular class of applications employs Natural Language Understanding (NLU) technology for a type of semantic classification known as “call routing.” Call routing applications semantically classify a telephone query or statement from a caller in order to route the telephone call to an appropriate agent or location within the call routing system. Such routing, for example, can be based on a brief spoken description of the caller's reason for the telephone call. By promptly connecting each caller to the correct service representative, call routing systems reduce queue time and call duration, which saves money and improves customer satisfaction, particularly in large call centers.
Call routing applications classify spoken inputs or utterances into a small set of categories for a particular application. For example, the spoken inputs “I have a problem with my bill,” “Check my balance,” and “Did you get my payment?” might all be mapped to a “Billing” category, or each might be mapped to one of several subcategories within a broader billing category. Since people express spoken requests and queries in many different ways, call routers are typically implemented as statistical classifiers that are trained on a labeled set of spoken requests and their corresponding classifications.
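To make the idea of a statistical classifier trained on labeled requests concrete, the following is a minimal sketch using a multinomial Naive Bayes model over words. Naive Bayes is swapped in here purely for illustration; it is not necessarily the classifier used by any particular call router. The training pairs, the “TechSupport” route, and the helper names `tokenize`, `train_naive_bayes`, and `classify` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled training set: (utterance, route) pairs.
TRAINING = [
    ("i have a problem with my bill", "Billing"),
    ("check my balance", "Billing"),
    ("did you get my payment", "Billing"),
    ("my internet is not working", "TechSupport"),
    ("the router keeps dropping the connection", "TechSupport"),
]

def tokenize(text):
    """Very simple word tokenizer: lowercase, drop '?', split on whitespace."""
    return text.lower().replace("?", "").split()

def train_naive_bayes(examples):
    """Count route frequencies and per-route word frequencies."""
    route_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, route in examples:
        route_counts[route] += 1
        for w in tokenize(text):
            word_counts[route][w] += 1
            vocab.add(w)
    return route_counts, word_counts, vocab

def classify(text, model):
    """Return the route with the highest log posterior (add-one smoothing)."""
    route_counts, word_counts, vocab = model
    total = sum(route_counts.values())
    best_route, best_score = None, -math.inf
    for route, n in route_counts.items():
        score = math.log(n / total)  # log prior P(route)
        denom = sum(word_counts[route].values()) + len(vocab)
        for w in tokenize(text):
            score += math.log((word_counts[route][w] + 1) / denom)
        if score > best_score:
            best_route, best_score = route, score
    return best_route

model = train_naive_bayes(TRAINING)
print(classify("Why is my bill so high?", model))  # → Billing
```

Because the model learns word statistics rather than fixed phrases, an unseen phrasing such as “Why is my bill so high?” can still be mapped to the “Billing” route.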
Determining a semantic classification for a human utterance in a call routing system is typically a five-step process, as illustrated by the online recognition process 110 of FIG. 1. Input speech 107 from a speaker 106 is translated into a text string by an automated speech recognition (ASR) module 112. The text string generated by the ASR module 112 is passed to an NLU semantic classification component known as a statistical router 114. The statistical router 114 models the task of natural language understanding as a statistical classification problem in which the text string corresponding to the human utterance is assigned to one or more of a set of predefined user intents, referred to as “call routes,” as part of a route ordering/reordering process (117). The route ordering process can also receive confidence levels for the assigned routes from a confidence engine 115. The recognition process can then execute a routing decision (119), which can be based on thresholds applied to the confidence levels of the assigned routes. Some classifiers used for this purpose can achieve high classification accuracy.
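The flow of FIG. 1 can be sketched as a chain of small functions, one per component. This is a toy illustration under stated assumptions, not the disclosed implementation: the “ASR” simply decodes bytes as text, the router scores routes by hypothetical keyword overlap, the confidence engine is assumed to be a softmax over those scores, and the threshold of 0.6 and the `AGENT_FALLBACK` outcome for low-confidence calls are hypothetical choices.

```python
import math
from typing import Dict, List, Tuple

def fake_asr(audio: bytes) -> str:
    """Hypothetical stand-in for ASR module 112: 'audio' is just UTF-8 text here."""
    return audio.decode("utf-8").lower()

def statistical_router(text: str) -> Dict[str, float]:
    """Hypothetical stand-in for statistical router 114: keyword-overlap scores."""
    keywords = {
        "Billing": {"bill", "balance", "payment"},
        "TechSupport": {"internet", "router", "connection"},
        "Cancel": {"cancel", "close"},
    }
    tokens = set(text.split())
    return {route: float(len(tokens & words)) for route, words in keywords.items()}

def confidence_engine(scores: Dict[str, float]) -> Dict[str, float]:
    """Stand-in for confidence engine 115: softmax maps raw scores to confidences."""
    m = max(scores.values())
    exps = {r: math.exp(s - m) for r, s in scores.items()}
    z = sum(exps.values())
    return {r: e / z for r, e in exps.items()}

def order_routes(conf: Dict[str, float]) -> List[Tuple[str, float]]:
    """Route ordering (117): rank candidate routes by descending confidence."""
    return sorted(conf.items(), key=lambda item: item[1], reverse=True)

def routing_decision(ranked: List[Tuple[str, float]], threshold: float) -> str:
    """Routing decision (119): accept the top route only if it clears the threshold."""
    top_route, top_conf = ranked[0]
    return top_route if top_conf >= threshold else "AGENT_FALLBACK"

def recognize(audio: bytes, threshold: float = 0.6) -> str:
    text = fake_asr(audio)
    ranked = order_routes(confidence_engine(statistical_router(text)))
    return routing_decision(ranked, threshold)

print(recognize(b"check my balance and my payment"))  # → Billing
print(recognize(b"check my balance"))                 # → AGENT_FALLBACK
```

The second call shows why the threshold matters: a single keyword yields a top confidence of only about 0.58, so under these assumed settings the call is not routed automatically. Lowering the threshold trades fewer fallbacks for more misroutes.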
Creating a new call routing application is a training process. Typically, a new set of training utterances is initially developed based on the specific needs of the new application, or the needs of the entity requesting it. FIG. 2 shows this process generally. A training corpus 201 contains sample training utterances 202 that are labeled with associated router classification tags 203. A feature set is selected from the training corpus 201 (e.g., the words in the sample training utterances 202). This feature set, together with a classification model 205 (e.g., a neural network), is used to build and train a call routing semantic classifier 204 for the application.
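The FIG. 2 training flow (select a word feature set from the labeled corpus, then fit a classification model) can be illustrated with a minimal sketch. Here the classification model is assumed to be the simplest possible neural network, a single softmax layer trained by stochastic gradient descent on cross-entropy loss; the corpus, the class name `SoftmaxRouter`, and the hyperparameters (`epochs`, `lr`, `seed`) are hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical labeled corpus: sample utterances with router classification tags.
CORPUS = [
    ("i have a problem with my bill", "Billing"),
    ("check my balance", "Billing"),
    ("did you get my payment", "Billing"),
    ("my internet is not working", "TechSupport"),
    ("the router keeps dropping the connection", "TechSupport"),
]

class SoftmaxRouter:
    """Single-layer neural network (softmax regression) over bag-of-words features."""

    def __init__(self, corpus, epochs=200, lr=0.5, seed=0):
        self.routes = sorted({r for _, r in corpus})
        # Feature set selection: every distinct word in the training utterances.
        self.vocab = sorted({w for t, _ in corpus for w in t.lower().split()})
        self.index = {w: i for i, w in enumerate(self.vocab)}
        rng = random.Random(seed)
        # self.w[k][j]: weight from word feature j to route k; one bias per route.
        self.w = [[rng.uniform(-0.01, 0.01) for _ in self.vocab] for _ in self.routes]
        self.b = [0.0] * len(self.routes)
        data = [(self.featurize(t), self.routes.index(r)) for t, r in corpus]
        for _ in range(epochs):
            for x, y in data:
                p = self.probs(x)
                for k in range(len(self.routes)):
                    # Gradient of cross-entropy loss w.r.t. the logit of route k.
                    g = p[k] - (1.0 if k == y else 0.0)
                    self.b[k] -= lr * g
                    for j, xj in enumerate(x):
                        if xj:
                            self.w[k][j] -= lr * g * xj

    def featurize(self, text):
        """Bag-of-words counts over the selected vocabulary; unknown words are ignored."""
        x = [0.0] * len(self.vocab)
        for w in text.lower().split():
            j = self.index.get(w)
            if j is not None:
                x[j] += 1.0
        return x

    def probs(self, x):
        """Softmax over per-route logits."""
        logits = [bk + sum(wk[j] * xj for j, xj in enumerate(x))
                  for wk, bk in zip(self.w, self.b)]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        s = sum(exps)
        return [e / s for e in exps]

    def predict(self, text):
        p = self.probs(self.featurize(text))
        return self.routes[max(range(len(p)), key=p.__getitem__)]

router = SoftmaxRouter(CORPUS)
print(router.predict("did you get my bill"))  # → Billing
```

A production classifier would normally use a much larger corpus, richer features, and held-out data to measure accuracy. The point of the sketch is only the division of labor in FIG. 2: the labeled corpus supplies both the feature set and the training targets, and the model learns weights connecting the two.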
This training process is expensive because a large labeled training corpus 201 must be collected and developed for each new application to achieve acceptable classification accuracy. Such collection and development is usually manual. This collection and development process (offline training process 120) involves human/manual transcription 122, or human-reviewed transcription, of a set of recorded utterances 127, such as recorded queries from calls received at a particular call center. The human transcriber/reviewer then annotates the recorded utterances with their semantic classes to create a set of semantic labels 125; that is, the human transcriber identifies both what was said and which class to assign. After offline training of the call routing classifier 204 on the training corpus 201, the call routing classifier 204 can be used in the application to process live, unlabeled, incoming utterances from real callers of an online call routing application.