Spoken language understanding systems interpret the word sequences of user utterances. For example, spoken language understanding systems are used by task-oriented virtual agents. Virtual agents are computer-generated agents that can interact with users. Goal- or task-oriented virtual agents may communicate with human users in a natural language and work with or help the users in performing various tasks. The tasks performed by a virtual agent can vary in type and complexity. Exemplary tasks include information retrieval, rule-based recommendations, and navigating and executing complex workflows. Informally, virtual agents may be referred to as “chatbots.” Virtual agents may be used by corporations to assist customers with tasks such as booking reservations and working through diagnostic issues (e.g., for solving an issue with a computer). Using virtual agents may offer a corporation two advantages: reducing the operational costs of running call centers, and making it easier to increase the number of agents available to assist customers.
Spoken language understanding systems help virtual agents determine what the human user desires. The spoken language understanding system converts the word sequence of a user utterance to a hidden state representation of the meaning of the utterance. Then, the spoken language understanding system assigns a meaning to the hidden state representation that a downstream component of the virtual agent, such as a dialogue manager, can use to respond to the human user. Typically, a spoken language understanding system used in the context of task-oriented virtual agents performs three functions when processing a word sequence of a user utterance: (1) classifying the user's speech act into a dialogue act category, (2) identifying the user's intent, and (3) extracting semantic constituents from the word sequence. The spoken language understanding system usually performs these three functions separately, one at a time. Performing one function at a time limits the speed with which the spoken language understanding system can process a user utterance. Additionally, performing the functions separately limits the accuracy of each function. Finally, separately training for each of the three functions limits the speed with which the training can be completed.
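By way of illustration only, the three separately performed functions described above may be sketched as follows. The sketch is not taken from any particular system: every name in it (SluResult, classify_dialogue_act, identify_intent, extract_slots, understand) and every keyword table is hypothetical, and simple keyword rules stand in for the trained models that a real spoken language understanding system would use. The hidden state representation is omitted for brevity. The sketch is intended only to show that each function makes its own independent pass over the same word sequence.

```python
from dataclasses import dataclass, field


@dataclass
class SluResult:
    """Hypothetical container for the three outputs handed to a dialogue manager."""
    dialogue_act: str
    intent: str
    slots: dict = field(default_factory=dict)


# Toy keyword tables (assumptions for illustration, not from the source).
QUESTION_WORDS = {"what", "when", "where", "who", "how", "which"}
INTENT_KEYWORDS = {
    "book": "book_reservation",
    "reserve": "book_reservation",
    "fix": "diagnose_issue",
}
CITY_NAMES = {"boston", "chicago"}
NUMBER_WORDS = {"one", "two", "three", "four"}


def classify_dialogue_act(words):
    """Function (1): classify the speech act into a dialogue act category."""
    if words and words[0].lower() in QUESTION_WORDS:
        return "question"
    return "request"


def identify_intent(words):
    """Function (2): identify the user's intent from the first matching keyword."""
    for word in words:
        intent = INTENT_KEYWORDS.get(word.lower())
        if intent is not None:
            return intent
    return "unknown"


def extract_slots(words):
    """Function (3): extract semantic constituents (slots) from the word sequence."""
    slots = {}
    for word in words:
        lowered = word.lower()
        if lowered in CITY_NAMES:
            slots["city"] = word
        elif lowered in NUMBER_WORDS:
            slots["party_size"] = word
    return slots


def understand(utterance):
    """Run the three functions one at a time, each over the full word sequence."""
    words = utterance.split()
    return SluResult(
        dialogue_act=classify_dialogue_act(words),
        intent=identify_intent(words),
        slots=extract_slots(words),
    )


result = understand("book a table for two in Boston")
print(result)
```

In this sketch, no function can use what another function has already determined, and each function repeats its own traversal of the utterance.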
There is a need in the art for a system and method that address the shortcomings discussed above.