1. Technical Field
This invention relates to the field of natural language understanding, and more particularly, to developing systems for building natural language models.
2. Description of the Related Art
The ability to classify natural language input forms the basis for many “understanding” applications. An example of this would be a natural language call routing application where calls are transferred to the appropriate customer representative or system based on the user request. In many Natural Language Understanding (NLU) applications, there is a need to classify the user request, specified in natural language, into one or more of several classes or actions. Such input can be provided as spoken words or typed text. For example, in an interactive voice response (IVR) application such as a call routing application, a user can submit spoken input to be directed to various destinations such as a customer service representative or a service offering. The IVR can select a destination which depends on the meaning or interpretation of the user's request. Notably, the IVR should sufficiently understand the user's request to correctly route the call. For example, the user request “I have a problem with my printer” should be routed to a printer specialist, whereas “I forgot my password” should be routed to a password administrator. Another example is a natural language dialog processing system that interprets user requests for information or transactions. In such systems, the classification serves to identify the specific action that is being requested by the user. For example, in a mutual fund trading system a request like “I would like to buy 50 shares of the growth fund” would be processed as a fund purchase request whereas a request like, “How many shares do I have in the growth fund?” would be processed as a request for information about a particular fund.
A conventional way to classify natural language input is through the use of rules or grammars that map pre-defined input into specific classes or actions. While grammars are very powerful and effective, they become more complex as the scope of the application grows and can therefore become difficult to write and debug. In addition, when the user request is stated in a way that is not covered by the grammar, the request may be rejected, limiting the extent of acceptable “natural” language input. Also, linguistic skills are generally required to write unambiguous grammars, whereby the required skill level necessarily increases as the application becomes more complex.
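By way of illustration only, the rule-based approach described above can be sketched as a small set of hand-written patterns. The patterns, the class names, and the function `classify_with_rules` are hypothetical and are not taken from any particular grammar formalism. The sketch shows how a request phrased outside the pre-defined patterns is rejected.

```python
import re
from typing import Optional

# Hypothetical hand-written rules: each pattern maps pre-defined input
# to exactly one class (here, a call routing destination).
RULES = [
    (re.compile(r"\bproblem with my printer\b", re.IGNORECASE), "printer_specialist"),
    (re.compile(r"\bforgot my password\b", re.IGNORECASE), "password_administrator"),
]


def classify_with_rules(request: str) -> Optional[str]:
    """Return the class of the first matching rule, or None if no rule covers the request."""
    for pattern, action in RULES:
        if pattern.search(request):
            return action
    return None  # request is not covered by the rules and is rejected


covered = classify_with_rules("I have a problem with my printer")
rejected = classify_with_rules("My printer will not print")
```

In this sketch the second request concerns the same destination as the first, yet it is rejected because no rule anticipates its wording. Covering it requires writing another rule, which is how such rule sets grow with the scope of the application.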
One approach to training NLU models for improving interpretation abilities is to collect a corpus of domain-specific sentences containing probable user requests and to classify the user requests based on the implied actions of each sentence. For example, in a call routing application, example sentences associated with a routing destination can be used to train NLU models. The classified corpus can be used to build a single monolithic statistical language model that captures the mapping between the sentences in the entire corpus and their implied actions. During program execution, when a user is interacting with the NLU system, the single statistical language model can classify the sentences into probable actions. For example, the probable action in a call routing application is the connection of a user to a routing destination.
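The single-model approach described above can be sketched as follows. The text does not specify the type of statistical model, so this sketch assumes a bag-of-words naive Bayes classifier with add-one smoothing. The corpus, the action names, and the class `MonolithicClassifier` are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase the sentence and split it on whitespace after removing basic punctuation."""
    return sentence.lower().replace("?", " ").replace(".", " ").split()


class MonolithicClassifier:
    """One model covering every action in the corpus (assumed here to be naive Bayes)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # action -> word -> count
        self.action_counts = Counter()           # action -> number of training sentences
        self.vocabulary = set()

    def train(self, corpus):
        """corpus is an iterable of (sentence, implied action) pairs."""
        for sentence, action in corpus:
            self.action_counts[action] += 1
            for word in tokenize(sentence):
                self.word_counts[action][word] += 1
                self.vocabulary.add(word)

    def classify(self, sentence):
        """Return the most probable action; words never seen in training are ignored."""
        total = sum(self.action_counts.values())
        best_action, best_score = None, -math.inf
        for action, count in self.action_counts.items():
            score = math.log(count / total)  # log prior of the action
            denom = sum(self.word_counts[action].values()) + len(self.vocabulary)
            for word in tokenize(sentence):
                if word in self.vocabulary:
                    # add-one smoothed log likelihood of the word given the action
                    score += math.log((self.word_counts[action][word] + 1) / denom)
            if score > best_score:
                best_action, best_score = action, score
        return best_action


# Hypothetical training data: sentences classified by their implied action.
CORPUS = [
    ("I have a problem with my printer", "printer_specialist"),
    ("my printer will not print", "printer_specialist"),
    ("I forgot my password", "password_administrator"),
    ("please reset my password", "password_administrator"),
]

model = MonolithicClassifier()
model.train(CORPUS)
destination = model.classify("the printer is broken")
```

Unlike a grammar, such a model can classify a request whose wording was not anticipated, provided the request shares words with the training sentences for the correct action.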
Building a natural language understanding (NLU) system generally requires training on a large corpus so that the system can properly interpret both broad and narrow language requests. A developer of an NLU system may be required to find training data relevant to the application. The task of identifying and classifying the training data can be a time-consuming and tedious process. The developer must generally search through a database, classify the data, and train the language models, all by hand. The developer collects a corpus of domain-specific sentences of likely user requests (referred to as training data) and then classifies the sentences based on the actions implied in each sentence. This corpus is then typically used to build a single monolithic statistical language model that captures the mapping between the sentences in the entire corpus and their implied action or actions. At runtime, the statistical model is used to classify sentences into likely actions. While this monolithic statistical model approach to action classification can be quite effective, it has certain limitations, especially as the number of actions increases.
One disadvantage of a single monolithic language model is that as the number of actions, or targets, increases, the amount of required training data can also increase. In order for a language model to perform more sophisticated tasks, it may be necessary to provide more data, which can accordingly make training and tuning more complex. As the amount of data and the number of actions increase, the actions begin to overlap. This overlap causes confusion between actions, more frequent misinterpretation of sentences, and consequently lower classification accuracy. Additionally, a monolithic model is not very effective in identifying multiple pieces of information from a single request or in obtaining precise levels of classification.
Also, capturing all the nuances of an application domain using a single monolithic statistical model is not straightforward. Accordingly, a developer must generally design and build combinations of multiple statistical models that work together to interpret natural language input, which makes developing such applications more complex. Identifying the optimal combination of models becomes more challenging as the complexity grows, and the effort requires a higher degree of customization and a higher skill level from the developer. This can complicate the development of natural language understanding applications. The developer can be required to specifically train the models by identifying data for the broad and narrow level models. In addition, the developer can be required to try various combinations of models in order to achieve acceptable interpretation performance.
With multiple models, different configurations can each improve or degrade performance. Too few models may not be capable of capturing all the details contained in user requests, whereas with too many models there may be insufficient data to train all of them, resulting in sparsely trained models which yield poor accuracy. The developer is therefore burdened with the responsibility of identifying an optimal number of models and of selecting, for each model, an associated set of training data that properly trains it. In practice, the developer may be required to know a priori how many models to build, how to partition the data, and how to configure the sequencing of the models. The task can be quite difficult, thereby presenting a need for automating the selection of the training data, the optimal number of models, and the optimal configuration of the models for producing the highest performance with respect to the application domain. A need therefore exists for a reliable classification approach that is highly accurate and flexible with respect to interpreting user input, while at the same time reducing the skill level and time required of a developer to create the model.