The present disclosure relates to call routers, speech recognition, text classification, and training of call routing systems.
Spoken language understanding systems have been deployed in numerous applications that involve interaction between humans and machines. Such systems typically operate by processing spoken utterances from a human to identify the intended meaning of questions or answers uttered by that human. By correctly identifying the intended meaning of a spoken utterance, the system can execute various actions in response to that utterance. For example, such actions can include performing mechanical operations, operating a computer, controlling a car audio/navigation system, or routing a telephone call to an appropriate area or agent.
One particular class of applications employs Natural Language Understanding (NLU) technology to perform a type of semantic classification known as “call routing.” Call routing applications involve semantically classifying a telephone query or statement from a caller to route the telephone call to an appropriate agent (real or automated) or to a location within the call routing system. Such routing, for example, can be based on a brief spoken description of the caller's reason for the telephone call. Call routing systems reduce queue time and call duration by promptly connecting a given caller to the correct service representative, for example in large call centers, thereby saving money and improving customer satisfaction.
Call routing applications classify spoken inputs or utterances into a small set of categories for a particular application. For example, the spoken inputs, “I have a problem with my bill,” “Check my balance,” and “Did you get my payment?” might all be mapped to a “Billing” category, or each might be mapped to one of several subcategories within a broader billing category. Because people express spoken requests and queries in many different ways, a call router is typically implemented as a statistical classifier that is trained on a labeled or tagged set of spoken requests and their corresponding classifications.
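As an illustration only, a statistical classifier trained on labeled requests can be sketched as follows. This is a minimal sketch that assumes a multinomial naive Bayes model with add-one smoothing; the disclosure does not name a particular classifier, so that choice is an assumption. The function names (`tokenize`, `train`, `classify`), the “TechSupport” category, and its three training utterances are hypothetical; the three “Billing” utterances are the examples given above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase the text and keep alphabetic word tokens only
    # (a simplifying assumption; real systems use richer features).
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def train(labeled_utterances):
    # Count word occurrences per route and utterances per route.
    word_counts = defaultdict(Counter)
    route_counts = Counter()
    vocab = set()
    for text, route in labeled_utterances:
        route_counts[route] += 1
        for word in tokenize(text):
            word_counts[route][word] += 1
            vocab.add(word)
    return word_counts, route_counts, vocab


def classify(text, model):
    # Return the route with the highest naive Bayes log-probability.
    word_counts, route_counts, vocab = model
    total = sum(route_counts.values())
    scores = {}
    for route, n in route_counts.items():
        log_p = math.log(n / total)  # route prior
        denom = sum(word_counts[route].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                log_p += math.log((word_counts[route][word] + 1) / denom)
        scores[route] = log_p
    return max(scores, key=scores.get)


# Labeled training set: "Billing" examples are from the text above;
# "TechSupport" and its utterances are hypothetical.
TRAINING_SET = [
    ("I have a problem with my bill", "Billing"),
    ("Check my balance", "Billing"),
    ("Did you get my payment?", "Billing"),
    ("My internet is not working", "TechSupport"),
    ("The modem keeps restarting", "TechSupport"),
    ("I cannot connect to the network", "TechSupport"),
]

model = train(TRAINING_SET)
billing_route = classify("There is a problem with my bill", model)
support_route = classify("My modem is not working", model)
```

The point of the sketch is that a differently worded request (“There is a problem with my bill”) can still be mapped to the same category as the training examples, because the classifier scores word evidence rather than matching whole phrases.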
Determining a semantic tag or classification for a human utterance in a call routing system typically involves converting input speech from a speaker into a text string using an automated speech recognition (ASR) module or system (also known as a speech recognizer). The text string generated by the speech recognizer is passed to an NLU semantic classifier known as a statistical router. The statistical router models the task of natural language understanding as a statistical classification problem in which the text string corresponding to the human utterance is assigned to one or more of a set of predefined user intents, referred to as “call routes,” as part of a route ordering/reordering process. The route ordering process can also receive a confidence level for each assigned route. The call router can then execute a routing decision, which can be based on thresholds corresponding to the confidence levels of the assigned routes. Various specific classifiers can have high levels of classification accuracy.
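A threshold-based routing decision of the kind described above might look like the following. This is a hedged sketch, not the disclosed method: the two-threshold scheme, the threshold values, the action names (“route”, “confirm”, “reprompt”), and the function name `routing_decision` are all assumptions introduced for illustration. The input is assumed to be a list of candidate call routes with confidence levels, as produced by a statistical router.

```python
from typing import List, Tuple

# Hypothetical threshold values; a deployed system would tune these.
ACCEPT_THRESHOLD = 0.80
CONFIRM_THRESHOLD = 0.50


def routing_decision(scored_routes: List[Tuple[str, float]]) -> Tuple[str, str]:
    # scored_routes holds (call_route, confidence) pairs.
    # Returns (action, call_route); call_route is "" when no route is chosen.
    if not scored_routes:
        return ("reprompt", "")

    # Route ordering: highest confidence first.
    ranked = sorted(scored_routes, key=lambda pair: pair[1], reverse=True)
    best_route, best_confidence = ranked[0]

    if best_confidence >= ACCEPT_THRESHOLD:
        # Confident enough to transfer the call directly.
        return ("route", best_route)
    if best_confidence >= CONFIRM_THRESHOLD:
        # Plausible route: ask the caller to confirm before transferring.
        return ("confirm", best_route)
    # Too uncertain: ask the caller to restate the request.
    return ("reprompt", "")
```

For example, under these assumed thresholds, `routing_decision([("Billing", 0.91), ("TechSupport", 0.05)])` yields a direct transfer to “Billing”, while a top confidence of 0.60 yields a confirmation step instead.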