The promise of, and excitement around, conversational services have grown rapidly in recent years. Besides the popularity of intelligent assistants, there is a rise in the use and execution of specialized bots (e.g., called “skills” in the ALEXA virtual assistant from AMAZON.COM, INC. and the CORTANA virtual assistant from MICROSOFT CORP., and “actions” in the GOOGLE ASSISTANT virtual assistant from GOOGLE LLC). Particularly useful are task-oriented chatbots that act as agents on behalf of users, interacting with external services through natural language conversation to accomplish specific tasks (such as booking a taxi cab, making a restaurant reservation, or finding a recipe) through the execution of one or more operations of a corresponding computer application.
Currently, most conversational services can be built using a slot-filling approach. With such an approach, the user's phrase (e.g., “I want a coffee”) indicates an intent, i.e., an action the system supports, such as “order-coffee” (with additional parameters such as type and size). Input parameters necessary for intent execution, such as coffee type and size, can be described as slots. With the slot-filling approach, a control structure can operatively define the operations for a multi-turn conversation between the system and the user to collect all slots necessary to fulfill the intent.
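By way of illustration only, the slot-filling loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation of any particular assistant platform: the names Intent, next_missing_slot, run_dialog, and make_order_coffee, as well as the prompt strings, are hypothetical, and the user's answers are assumed to arrive as already-recognized slot values rather than free-form natural language.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Intent:
    """Hypothetical intent: an action plus the slots it needs before execution."""
    name: str
    required_slots: List[str]
    prompts: Dict[str, str]                      # slot name -> question to ask
    values: Dict[str, str] = field(default_factory=dict)

    def next_missing_slot(self) -> Optional[str]:
        """Return the first required slot that has no value yet, or None."""
        for slot in self.required_slots:
            if slot not in self.values:
                return slot
        return None


def run_dialog(intent: Intent, answer_fn: Callable[[str], str]) -> List[str]:
    """Ask one question per turn until every required slot is filled.

    answer_fn stands in for the user (and for language understanding): it
    receives the prompt and returns the slot value. Returns the prompts asked.
    """
    asked: List[str] = []
    slot = intent.next_missing_slot()
    while slot is not None:
        prompt = intent.prompts[slot]
        asked.append(prompt)
        intent.values[slot] = answer_fn(prompt)
        slot = intent.next_missing_slot()
    return asked


def make_order_coffee() -> Intent:
    """Build the illustrative "order-coffee" intent with slots type and size."""
    return Intent(
        name="order-coffee",
        required_slots=["type", "size"],
        prompts={
            "type": "What type of coffee would you like?",
            "size": "What size?",
        },
    )
```

In such a sketch, calling run_dialog(make_order_coffee(), answer_fn) with a function that supplies the user's replies collects the type and size slots over two turns, after which the intent has every parameter it needs for execution. A practical control structure would additionally need to handle unrecognized answers, corrections, and slots supplied out of order.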
Slot-filling has proven reliable but requires significant developer effort. First, control structures, typically in the form of finite-state automata, need to be hand-designed for each task. Such control structures can be complex as they need to account for many possible execution paths. Second, to support user interactions in natural language, models for understanding user questions and answers need to be trained. Training typically requires many utterances, i.e., sample phrases users may use during the interaction. For example, even for a simple question like “What is your party size?”, users may answer in many different ways, such as “3”, “all of us”, “me and my wife”, or “not sure, I'll tell you later”. To train robust models, a developer must consider all such possible phrase variations and provide dozens of utterances for each slot and for each intent. Finally, developers need to enumerate possible values for each slot to boost slot recognition in language understanding. As a result, this whole process requires significant manual coding, thus hindering scalability to new tasks and domains.
An alternative to slot-filling is a corpus-based approach where bots are automatically trained from datasets of past conversations. This approach has shown promise for non-task-oriented “chit-chat” bots, but it is unclear whether it alone can model task-oriented bots. Exclusively machine-learned systems cannot guarantee that critical in-task constraints are met (e.g., a user cannot reserve a restaurant without specifying a time), and they lack a model to ensure completion of an actual objective in the task. These systems are also difficult to train due to the scarcity of domain-specific conversation logs.
Hybrid approaches also exist. For example, hybrid code networks (HCNs) are an approach to making machine-learned systems practical by combining a recurrent neural network with developer hand-coded rules. HCNs reduce the amount of training data required at the expense of developer effort. Another hybrid approach is the “knowledge-grounded” conversation model, which injects knowledge from textual data (e.g., restaurant reviews on FOURSQUARE®) into models derived from conversational data (e.g., TWITTER®) to generate informative answers. However, these models work only for single-turn responses and depend on in-domain knowledge sources.
It is with respect to these considerations and others that the disclosure made herein is presented.