The exemplary embodiment relates to natural language processing and finds particular application in connection with a system and method for response generation in a dialogue between a customer and a customer service agent.
Dialogue systems are often employed in customer service. In many situations, a call center agent acts as an interface between the customer and a knowledge base. However, employing humans to serve as customer service agents can be expensive. In addition, creating, maintaining, and accessing knowledge bases is expensive and time-consuming. For example, there may be a large repository of dialogue data in customer call centers relating to customer sentiment, described symptoms, problem types, root causes, and the techniques agents use to resolve customer problems. Furthermore, these repositories usually share some common characteristics when linguistic information is extracted from the dialogues. The lexicons, both words and phrases, and the syntactic structure are very similar from one dialogue to another, often with very little variation. All of these data belong to a limited, pre-defined set of expressions, which can all be processed in advance, stored, and accessed by a dialogue system to generate text.
Dialogue systems usually include three parts: a Natural Language Understanding (NLU) module, a Dialogue Manager (DM) module, and a Natural Language Generation (NLG) module. In most implementations, the NLU and NLG modules are neither related nor connected. They are completely independent, and each can be replaced by any other available NLU or NLG module.
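By way of illustration only, the three-part arrangement described above can be sketched in Python as follows. All class names, method names, intents, and canned responses here are hypothetical and are chosen only to show that the NLU and NLG modules share no code and can each be swapped without touching the other; this is not a description of any particular dialogue system.

```python
from typing import Dict, Protocol


class NLU(Protocol):
    def understand(self, utterance: str) -> Dict[str, str]: ...


class DialogueManager(Protocol):
    def decide(self, meaning: Dict[str, str]) -> Dict[str, str]: ...


class NLG(Protocol):
    def generate(self, instruction: Dict[str, str]) -> str: ...


class KeywordNLU:
    """Toy NLU (assumption): detects a single intent by keyword matching."""

    def understand(self, utterance: str) -> Dict[str, str]:
        if "not working" in utterance.lower():
            return {"intent": "report_problem"}
        return {"intent": "other"}


class RuleDM:
    """Toy dialogue manager (assumption): maps an intent to a dialogue act."""

    def decide(self, meaning: Dict[str, str]) -> Dict[str, str]:
        if meaning["intent"] == "report_problem":
            return {"act": "ask_details"}
        return {"act": "greet"}


class CannedNLG:
    """Toy NLG (assumption): looks up a fixed sentence for each dialogue act."""

    RESPONSES = {
        "ask_details": "Could you describe the problem in more detail?",
        "greet": "Hello, how can I help you?",
    }

    def generate(self, instruction: Dict[str, str]) -> str:
        return self.RESPONSES[instruction["act"]]


class DialogueSystem:
    """Chains the three modules; each one can be replaced independently."""

    def __init__(self, nlu: NLU, dm: DialogueManager, nlg: NLG) -> None:
        self.nlu = nlu
        self.dm = dm
        self.nlg = nlg

    def respond(self, utterance: str) -> str:
        meaning = self.nlu.understand(utterance)
        instruction = self.dm.decide(meaning)
        return self.nlg.generate(instruction)


system = DialogueSystem(KeywordNLU(), RuleDM(), CannedNLG())
print(system.respond("My printer is not working"))
```

In this sketch, the only coupling between the modules is the dictionary passed from one stage to the next, which is why any object with a matching method can stand in for the NLU or the NLG module.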
The NLG module contains different parts in order to generate sentences. These may include a content planner, which links the intention of communication (i.e., instructions given by the Dialogue Manager module) and the semantic representation, a sentence planner, which links the semantic and the syntactic representations, and a surface generator, which lexicalizes the syntactic representation with words and aggregates them. Finally, a structure realizer applies the right morphology to the words, according to the syntactic representation.
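As a minimal sketch of how these four stages might be chained, the following Python example passes a single dialogue manager instruction through a content planner, a sentence planner, a surface generator, and a structure realizer. The data structures, field names, fixed word order, and the single morphological rule (adding "s" for a third-person singular present verb) are all simplifying assumptions made for illustration; a real NLG module would be considerably richer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SemanticRep:
    predicate: str
    args: Dict[str, str]


@dataclass
class SyntacticRep:
    subject: str
    verb: str
    obj: str
    tense: str


def content_planner(intention: Dict[str, str]) -> SemanticRep:
    """Link the intention of communication to a semantic representation."""
    return SemanticRep(
        predicate=intention["act"],
        args={"agent": intention["agent"], "theme": intention["theme"]},
    )


def sentence_planner(sem: SemanticRep) -> SyntacticRep:
    """Link the semantic representation to a syntactic representation."""
    return SyntacticRep(
        subject=sem.args["agent"],
        verb=sem.predicate,
        obj=sem.args["theme"],
        tense="present",
    )


def surface_generator(syn: SyntacticRep) -> List[Tuple[str, str]]:
    """Lexicalize the syntactic representation as ordered (lemma, feature) pairs."""
    return [
        (syn.subject, "noun"),
        (syn.verb, "verb:" + syn.tense + ":3sg"),
        ("the", "det"),
        (syn.obj, "noun"),
    ]


def structure_realizer(tokens: List[Tuple[str, str]]) -> str:
    """Apply morphology (toy rule: add 's' to a 3sg present verb) and punctuate."""
    words = []
    for lemma, feature in tokens:
        if feature == "verb:present:3sg":
            lemma = lemma + "s"
        words.append(lemma)
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


def generate(intention: Dict[str, str]) -> str:
    """Run the four stages in order."""
    return structure_realizer(
        surface_generator(sentence_planner(content_planner(intention)))
    )


print(generate({"act": "restart", "agent": "the agent", "theme": "router"}))
# prints: The agent restarts the router.
```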
Most approaches to NLG are based on word templates with specific slots for variable words in order to generate sentences. These slots are usually filled with keywords or named entities, such as device names, city names, etc. In order to extract these templates, a set of sentences is processed with an NLU module to identify the necessary entities on which these word templates will be based. In this approach, all the parts of an NLG module are put together. This approach can be implemented with finite state machines, but it lacks flexibility and does not allow variations or paraphrases in NLG.
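The template-based approach described above can be illustrated with a short Python sketch. The template text, the slot names, and the `fill_template` helper are hypothetical and serve only to show the idea of replacing slots with named entities; they are not taken from any particular system.

```python
import re
from typing import Dict

# Hypothetical word template; the slot names in braces are illustrative.
TEMPLATE = "Please restart your {device} and check the {setting} menu."

# A slot is a word enclosed in braces, e.g. {device}.
SLOT_PATTERN = re.compile(r"\{(\w+)\}")


def fill_template(template: str, entities: Dict[str, str]) -> str:
    """Replace each {slot} in the template with the matching entity value."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in entities:
            raise KeyError(f"no value for slot '{name}'")
        return entities[name]

    return SLOT_PATTERN.sub(substitute, template)


sentence = fill_template(TEMPLATE, {"device": "router", "setting": "network"})
print(sentence)
# prints: Please restart your router and check the network menu.
```

The sketch also makes the limitation noted above visible: the wording around the slots is fixed, so the only variation between generated sentences is the entity values themselves.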
Other approaches feature systems which usually require specific grammars, based on context-free rules or on dependencies, to generate new sentences. These include KPML, disclosed in Bateman, “Enabling technology for multilingual natural language generation: the KPML development environment,” Natural Language Eng., 3(1):15-55, 1997, and RealPro, disclosed in Lavoie, et al., “A fast and portable realizer for text generation systems,” in Proc. 5th Conf. on Applied Natural Language Processing, pp. 265-268, 1997. However, despite slight differences in implementation between these systems (e.g., KPML is based on context-free grammars, while RealPro is based on dependencies), they all share the same input issue. Specifically, translation from original data to NLG text is done through a complex semantic representation that restricts the use of these systems to trained users. Others have proposed solutions to this issue.
For instance, Knott and Wright in “A dialogue-based knowledge authoring system for text generation,” AAAI Spring Symposium on Natural Language Generation in Spoken and Written Dialogue, pp. 71-78, 2003, have proposed enriching the knowledge base of generation systems with simple sentences, which are analyzed on the fly and whose content is then added to the knowledge base. In other words, planner rules are constructed from the analysis of simple sentences. However, this knowledge base does not contain actual sentences, but information that is extracted from the sentences.
Power and Scott, in “Multilingual authoring using feedback texts,” Proc. 17th Int'l Conf. on Computational Linguistics, Vol. 2, pp. 1053-1059, 1998, propose a similar approach. However, the knowledge base is used as a guide to the construction of a semantic representation of the original data, not as a way to directly guide the process of NLG.
U.S. Pub. No. 20050138556 to Brun et al., incorporated herein by reference in its entirety, describes another generation system in which a syntactic parser is involved. This system takes as input short descriptions in a specific domain, from which semantico-syntactic templates are extracted. The goal of this system is then to generate an automatic summary by merging these different descriptions. The generation is done using MDA (Multilingual Document Authoring), which uses the extracted templates to provide the summary in many different languages. See, Brun et al., “Document structure and multilingual authoring,” Proc. 1st Int'l Conf. on Natural Language Generation, Vol. 14, pp. 24-31, 2000.
There remains a need for a natural language generation system which is flexible, having a knowledge base built from natural language input which is also directly used to generate text. The exemplary systems and methods disclosed herein can meet this need through the analysis of short sentences, which already convey rich linguistic information, and through the use of small pieces of generation grammar to produce complex sentences in a given natural language.