1. Field of the Invention
The present invention relates to language processing and, more particularly, to natural language classification.
2. Description of the Related Art
Oftentimes there is a need to classify a user input specified in natural language into one or more classes or actions in order to interpret its meaning. Such classification of natural language input is the basis for many “understanding” applications. An example of such an application is a natural language call routing application, in which calls are transferred to an appropriate customer service representative or self-service application based on the natural language user input. For example, in response to receiving the user input “I have a problem with my printer” during a call session, a call routing application can route the call to a printer specialist.
FIG. 1 is a block diagram that depicts a conventional natural language classification system 100. The classification system 100 may include a classifier 105 that receives a natural language user input (hereinafter “user input”) 110 and classifies the user input 110 as a particular type of request to generate a natural language classification result 115. The user input 110 can be provided as spoken words, typed text or in any other form.
More particularly, one or more statistical models 120 generated by a statistical model trainer 125 are used by the statistical classifier 105 at runtime to classify text or phrases contained in the user input 110 into likely classes of request types. These models 120 typically are generated from a corpus of domain specific phrases 130, 135, 140 of likely user requests, referred to as “training data” 145, which are separated into the classes 150, 155, 160 based on the actions implied in the phrases 130-140. When a user input 110 is classified into at least one of the known classes 150-160, this action is referred to as “closed set classification”. Conversely, “open set classification” occurs when the user input 110 can either be classified into at least one of the known classes 150-160 or be identified as not matching any of the known classes 150-160.
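The distinction between closed set and open set classification can be illustrated with a minimal Python sketch. It is not the classifier 105 or the statistical models 120 described above. The class labels, the training phrases, the word-overlap score and the 0.5 rejection threshold are all assumptions chosen only for illustration, and a practical system would use a trained statistical model in place of the overlap score.

```python
# Illustrative sketch only: a toy word-overlap scorer stands in for a real
# statistical model. All labels, phrases, and the threshold are hypothetical.

TRAINING_DATA = {
    "printer_support": ["i have a problem with my printer",
                        "my printer will not print"],
    "billing": ["i have a question about my bill",
                "i want to pay my bill"],
    "sales": ["i want to buy a new laptop",
              "i would like to order a printer cartridge"],
}


def train(training_data):
    """Build one vocabulary (set of words) per class from its phrases."""
    return {label: {word for phrase in phrases for word in phrase.split()}
            for label, phrases in training_data.items()}


def score(model, text):
    """Score each class by the fraction of input words found in its vocabulary."""
    words = text.lower().split()
    if not words:
        return {label: 0.0 for label in model}
    return {label: sum(word in vocab for word in words) / len(words)
            for label, vocab in model.items()}


def classify_closed(model, text):
    """Closed set: always return one of the known classes, even for noise."""
    scores = score(model, text)
    return max(scores, key=scores.get)


def classify_open(model, text, threshold=0.5):
    """Open set: return a known class, or None if no class scores high enough."""
    scores = score(model, text)
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


model = train(TRAINING_DATA)
in_domain = "I have a problem with my printer"
out_of_domain = "uh the weather today"

print(classify_closed(model, in_domain))      # a known class
print(classify_closed(model, out_of_domain))  # still forced into a known class
print(classify_open(model, out_of_domain))    # None, i.e. rejected
```

In this sketch the closed set function has no way to decline, so an out-of-domain input is still assigned to some class, whereas the open set function can return no class and leave the application free to ask for clarification.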
Speech recognition systems frequently generate incoherent and incomplete phrases owing to background noise, callers being cut off, or callers speaking outside the domain of the speech recognition system. In such situations, the best recourse is to reject the input and seek clarification from the caller. Unfortunately, classifiers of the prior art are not able to adequately identify when a natural language user input does not match any known class in a statistical model. Thus, instead of seeking clarification from the caller, such classifiers often misclassify natural language user inputs, which results in calls being incorrectly routed. Accordingly, a classifier is needed that more accurately identifies when natural language user inputs do not match any known classes.