A dialog system is a computer system that is designed to converse with a human using a coherent structure and text, speech, graphics, or other modes of communication on both the input and output channels. Dialog systems that employ speech are referred to as spoken dialog systems and generally represent the most natural type of human-machine interface. With the ever-greater reliance on electronic devices, spoken dialog systems are increasingly being implemented in many different machines.
Response generation is an important component in developing a conversational dialog system. End users often judge the quality of a dialog system based on the quality of the responses they hear from the system. This relates to the Gricean cooperative principle, which describes how people interact with one another and posits that conversational contributions are made as required, at the appropriate time, and for the purpose of advancing the conversation. One aspect of system quality is the avoidance of obscure or non-responsive expressions, and preferably the use of phrases with which the user is familiar. System developers and researchers have paid increased attention to response generation issues involving not just response content, but also content presentation.
Response generation systems use trained models to generate appropriate responses to user input. The quality of such trained models relies on a corpus of training data. Known training systems generally use data collected from one or more real people. Typically, these systems do not use data from the actual user themselves. Therefore, such training data is not necessarily suited to the actual user, and will likely not generate responses that are always or even mostly familiar to the user.
One hybrid approach presently known in the art employs case-based reasoning with rule adaptation. It uses an annotated corpus as its knowledge source and grammar rules for new sentence construction. In the corpus, each sentence is associated with a semantic representation called a SemGraph and a realization tree called a ReaTree. The SemGraph describes the semantic relations among the entities in the sentence with which it is associated. The ReaTree corresponds to the syntactic lexical representation of the associated sentence and serves as the base for sentence realization. Text generation proceeds through three phases: retrieval, in which, given a SemGraph from a content planner, sentences with similar SemGraphs are retrieved from the annotated corpus; adaptation, in which one or more adaptation operators are applied to the corresponding ReaTrees to adjust them to the current input SemGraph; and linearization, in which the adapted ReaTree is sent to a linearization module to produce a sentence that meets all grammatical agreement requirements. In addition, a learning phase is invoked after sentences are generated, in which the SemGraph, its corresponding adapted ReaTree, and the generated sentences are first stored in a temporary case repository and then manually verified before being incorporated into the main corpus for reuse. This approach does not directly address the use of sentences from the user side in a system response. Furthermore, during the retrieval step, only propositions are adjusted, using substitution, deletion, and insertion, when computing similarity. No operation is performed on the speech act aspects. Therefore, the similarity between a SemGraph for a user utterance and a SemGraph for a system response is usually very low.
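The retrieval, adaptation, and linearization phases described above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the known system itself: a SemGraph is reduced to a set of (relation, value) pairs, a ReaTree is flattened to a token sequence with slot tokens such as `{cuisine}`, similarity is a simple Jaccard overlap of relation names, adaptation is limited to substitution, and linearization handles only "a"/"an" agreement. The names `Case`, `similarity`, `retrieve`, `adapt`, `linearize`, and `generate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """One annotated corpus entry (hypothetical, heavily simplified)."""
    sem_graph: frozenset  # set of (relation, value) pairs standing in for a SemGraph
    rea_tree: tuple       # flat token sequence standing in for a ReaTree


def similarity(a, b):
    """Jaccard overlap of relation names; an assumed stand-in similarity measure."""
    rel_a = {relation for relation, _ in a}
    rel_b = {relation for relation, _ in b}
    union = rel_a | rel_b
    return len(rel_a & rel_b) / len(union) if union else 1.0


def retrieve(sem_graph, corpus):
    """Retrieval: pick the stored case whose SemGraph is most similar to the input."""
    return max(corpus, key=lambda case: similarity(sem_graph, case.sem_graph))


def adapt(case, sem_graph):
    """Adaptation: substitution only; fill each slot with the input graph's value."""
    values = dict(sem_graph)
    adapted = []
    for tok in case.rea_tree:
        if tok.startswith("{") and tok.endswith("}"):
            # Leave the slot untouched if the input graph has no such relation.
            adapted.append(values.get(tok[1:-1], tok))
        else:
            adapted.append(tok)
    return tuple(adapted)


def linearize(tokens):
    """Linearization: join tokens, enforcing a toy "a"/"an" agreement rule."""
    words = list(tokens)
    if not words:
        return ""
    for i in range(len(words) - 1):
        if words[i] == "a" and words[i + 1].startswith(tuple("aeiou")):
            words[i] = "an"
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


def generate(sem_graph, corpus):
    """Run the three phases in order for one input SemGraph."""
    return linearize(adapt(retrieve(sem_graph, corpus), sem_graph))
```

In this simplification the speech act is not represented at all, which mirrors the limitation noted above: a measure that only compares propositional content has no means of relating a user request to a system statement about the same entities.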
Other approaches may offer improved alignment between user utterances and system responses. However, such systems, for example those that compute the distance between each system response candidate and its corresponding user utterance using a bag-of-words or bag-of-bigrams approach, over-generate system response candidates by means of a rule-based production system and hand-written rules. Such systems do not directly and automatically identify the constraint-carrying phrases in the user utterances, which offer better alignment and more natural wording.
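A bag-of-words or bag-of-bigrams distance of the kind mentioned above might be computed as in the following sketch. The choice of cosine distance, the whitespace tokenization, and the function names `bag`, `cosine_distance`, and `rank_candidates` are assumptions made for illustration; the known systems may use a different distance measure.

```python
import math
from collections import Counter


def bag(text, n=1):
    """Count the n-grams of a lower-cased, whitespace-tokenized text.

    n=1 gives a bag of words, n=2 a bag of bigrams.
    """
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two bags (1.0 if either is empty)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def rank_candidates(user_utterance, candidates, n=1):
    """Order response candidates from closest to farthest from the user utterance."""
    user_bag = bag(user_utterance, n)
    return sorted(candidates, key=lambda c: cosine_distance(user_bag, bag(c, n)))
```

A measure of this kind only scores candidates that have already been produced; it does not itself locate the constraint-carrying phrases in the user utterance, which is the shortcoming identified above.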
Other known systems provide a statistical approach for generation using packed forests to structurally represent many instances. In such systems, a statistical language model is used to rank alternatives given a semantic input. However, these approaches do not address the alignment issue. In general, all present approaches, including those that try to model the user data directly, do not adequately address the issue of disparity between user utterances and system responses.
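The ranking step in such statistical approaches can be sketched with a small bigram language model. This is an illustrative assumption rather than the known systems' method: the packed-forest representation is omitted, the alternatives are given as plain strings, add-one smoothing is used, and the names `train_bigram_lm`, `log_prob`, and `rank` are hypothetical.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Collect context and bigram counts from whitespace-tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence, lm):
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    contexts, bigrams, vocab_size = lm
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def rank(alternatives, lm):
    """Order alternative realizations from most to least probable."""
    return sorted(alternatives, key=lambda s: log_prob(s, lm), reverse=True)
```

Because the model is trained on a fixed corpus and scores only fluency, nothing in this ranking takes the wording of the current user's utterance into account, which is consistent with the alignment gap noted above.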
What is needed, therefore, is a dialog system response generator that effectively utilizes actual user input in order to generate responses that are most meaningful to the user.