Methods and systems of Natural Language Understanding (NLU), which can perform, for example, Spoken Language Understanding (SLU), are used in computerized dialog systems to estimate intentions of utterances. As broadly defined herein, the “spoken” utterances can be in the form of speech or text. If the utterances are spoken, then the utterances can be obtained from, for example, an automatic speech recognition (ASR) system. If the utterances are text, then the utterances can be obtained from, e.g., a text processing system or keyboard input.
Conventional intention estimation methods can be based on phrase matching or on classification methods, such as boosting, support vector machines (SVM), and Logistic Regression (LR), that use Bag of Words (BoW) features of each utterance as inputs. However, BoW features cannot adequately represent the semantic information carried by word sequences because, for example, they discard the order of the words in the sequences.
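As a minimal sketch of the word-order limitation described above, the following Python example builds BoW features for two utterances with opposite meanings. The function name `bag_of_words` and the example utterances are hypothetical and are used only for illustration; they do not come from any particular conventional system.

```python
from collections import Counter


def bag_of_words(utterance):
    """Return word counts for an utterance; word order is discarded.

    Hypothetical helper for illustration only: lowercases the text and
    splits on whitespace, with no further normalization.
    """
    return Counter(utterance.lower().split())


# Two utterances with different travel directions (illustrative data).
utterance_a = "book a flight from boston to denver"
utterance_b = "book a flight from denver to boston"

features_a = bag_of_words(utterance_a)
features_b = bag_of_words(utterance_b)

# The utterances differ, yet their BoW features are identical.
print(utterance_a == utterance_b)  # False
print(features_a == features_b)    # True
```

Because both utterances map to the same feature counts, a classifier that sees only these features cannot distinguish the origin from the destination.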
To consider the history of the word sequence in each utterance, Recurrent Neural Networks (RNNs) can be applied to utterance classification using 1-of-N coding instead of the BoW features. Additionally, Long Short-Term Memory (LSTM) RNNs are a form of RNN designed to improve learning of long-range context, and can be effective for context-dependent problems. Those approaches classify utterances without considering context among utterances. However, it is essential to consider the broader context of the sequence of utterances in an entire dialog to understand intention accurately. Some prior art models using RNNs and LSTMs use word sequence context within a single utterance and also consider the broader context of the sequence of utterances in an entire dialog.
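The 1-of-N coding mentioned above can be sketched as follows. This is an illustrative example only: the function name `one_of_n_sequence` and the four-word vocabulary are assumptions, and the sketch assumes every word of the utterance appears in the vocabulary. Each word becomes a vector with a single 1 at its vocabulary index, and the utterance becomes an ordered sequence of such vectors, which is the kind of input an RNN can consume one step at a time.

```python
def one_of_n_sequence(utterance, vocabulary):
    """Encode an utterance as an ordered list of 1-of-N (one-hot) vectors.

    Hypothetical helper for illustration only. Assumes every word in the
    utterance is present in the vocabulary.
    """
    index = {word: i for i, word in enumerate(vocabulary)}
    vectors = []
    for word in utterance.lower().split():
        vector = [0] * len(vocabulary)
        vector[index[word]] = 1  # exactly one active position per word
        vectors.append(vector)
    return vectors


# Illustrative vocabulary and utterances.
vocabulary = ["boston", "denver", "from", "to"]
sequence_a = one_of_n_sequence("from boston to denver", vocabulary)
sequence_b = one_of_n_sequence("from denver to boston", vocabulary)

# Unlike BoW counts, the two sequences differ because order is preserved.
print(sequence_a == sequence_b)  # False
```

The two sequences contain the same vectors in a different order, so a sequence model can, in principle, tell the two utterances apart where BoW features cannot.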
Furthermore, each utterance is expressed differently depending on party-dependent features, such as task-oriented roles like agents and clients, business-dependent terminologies and expressions, gender-dependent language, and relationships among participants in the dialogs. However, conventional methods do not consider such party-dependent features that arise from the different roles.