1. Field of Invention
The present invention relates to the field of interactive human-machine communication. More specifically, it describes a novel method for generating natural language in dialog systems.
We describe the so-called Sail Labs Answer Generator (SAG), a real-time, multilingual, general-purpose generation system that enables the use of dynamically generated natural language for dialog applications. That is, SAG does not require developers to specify all possible natural language outputs before runtime. SAG consists of a tactical generation component and specifies an interface to the lexicon as well as an interface to a strategic component. SAG emphasizes modularity, quality and speed of generation.
2. Description and Disadvantages of Prior Art
Natural Language Generation (NLG) is the subfield of artificial intelligence and computational linguistics that focuses on computer systems that can produce understandable texts in some human language (cf. Reiter and Dale 2000). The input to NLG is typically some underlying non-linguistic representation of information, from which text in a variety of forms (e.g. documents, reports, explanations, help and error messages) is automatically produced.
NLG is an emerging technology with many applications, both existing and potential. For instance, NLG technology is used commercially to partially automate routine document creation. It is also being used in the research laboratory to present and explain information to people who do not wish to deal with non-linguistic data. Research on NLG has so far focused on the generation of entire documents (e.g. user manuals), a particularly interesting field as it promises to compensate specifically for the stagnation in the field of machine translation. However, very few systems, mainly research projects, have used NLG technology for real-time applications. Commercial applications in the area of dialog systems to date use NLG technology mainly in its most primitive and restricted form, namely canned text. In the longer term, it is in the domain of dialog systems that NLG is likely to play an important role, allowing richer interaction with machines than is possible today.
The present invention relates precisely to using NLG technology in dialog systems by replacing the canned text and/or template-based approach to the task of realization with dynamic generation.
In order to understand the significance of using dynamic generation in NLG systems, a few words on how realization relates to NLG are in order.
Realization is one of the standard component modules of natural language generation. It is also commonly referred to as tactical generation or HOW generation, as opposed to strategic or WHAT generation (cf. McKeown 1985), which is often also referred to as the text planning component. (While there is some variation with respect to the architecture of a natural language generation system and the exact division of labor among its constituent modules, all generators involve a realization task. For instance, Reiter and Dale (2000) operate with a three-layer architectural model, distinguishing between a document planning module, a microplanning module and a realization module.) Generally speaking, the text planning component produces an abstract specification of the text's content and structure, using domain and application knowledge about what information is appropriate for the specified communicative goal, user model and so on. The realization component, on the other hand, determines how best to package information into chunks of natural language and ultimately converts the abstract specification supplied by the text planning component into real natural language text.
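The division of labor between text planning and realization may be illustrated by the following minimal sketch. It is not the SAG implementation; the names ContentPlan, plan_content and realize, as well as the train-timetable record, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContentPlan:
    """Abstract, non-linguistic specification (hypothetical structure)."""
    goal: str        # communicative goal, e.g. "inform"
    predicate: str   # what is being asserted
    arguments: dict  # semantic role -> value


def plan_content(record: dict) -> ContentPlan:
    """Strategic (WHAT) step: select the information to be conveyed."""
    # The platform number is deliberately left out of the plan.
    return ContentPlan(
        goal="inform",
        predicate="depart",
        arguments={"theme": record["train"], "time": record["departure"]},
    )


def realize(plan: ContentPlan) -> str:
    """Tactical (HOW) step: convert the abstract plan into a sentence."""
    if plan.predicate == "depart":
        args = plan.arguments
        return f"The {args['theme']} departs at {args['time']}."
    raise ValueError(f"no realization rule for predicate {plan.predicate!r}")


record = {"train": "express to Vienna", "departure": "10:15", "platform": 4}
sentence = realize(plan_content(record))
print(sentence)
```

In this sketch the planner decides what to say (the departure time, but not the platform), while the realizer alone decides how to say it.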
Standard approaches used for realization are: canned text, template, phrase-based and feature-based realization (cf. Hovy 1997).
A canned text is a predefined string written by the system designer before runtime to achieve a given communicative goal (e.g. warning, suggestion) whenever its trigger is activated. This method of realization has been widely used to generate error or help messages for a multitude of computer applications.
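By way of illustration only, canned text realization can be sketched as a lookup from trigger to predefined string. The trigger names and messages below are hypothetical.

```python
# Every output string is written by the designer before runtime.
CANNED_MESSAGES = {
    "disk_full": "Warning: the disk is full.",
    "unsaved_changes": "Suggestion: save your changes before exiting.",
}


def canned_text(trigger: str) -> str:
    """Return the predefined string associated with an activated trigger."""
    return CANNED_MESSAGES[trigger]


print(canned_text("disk_full"))
```

Any output that was not anticipated in the table simply cannot be produced, which is the inflexibility of this approach.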
A template is a predefined form whose empty fields are filled by information provided by either the user or the application at runtime. The input to a template-based system is a feature structure with corresponding variable values. Mail-merge programs are a well-known instance of template-based realization. Template-based systems are thus more sophisticated and more flexible than canned text systems, and they are arguably faster and thus more appropriate for real-time applications because the size and number of structures to be traversed are relatively small. However, even though the quality of template-based systems is higher than that of canned text systems, the degree of control over the output text is not as fine-grained as might be desirable and generally compares unfavorably to that provided by grammar-based systems.
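A template-based realizer in the mail-merge style may be sketched as follows, here using the Template class of the Python standard library. The templates and field names are hypothetical; the second call shows the kind of output over which a plain template has no control.

```python
from string import Template

# Predefined forms with empty fields (hypothetical examples).
SHIPPING = Template("Dear $title $surname, your order $order_id ships on $date.")
INBOX = Template("You have $count new messages.")


def fill_template(template: Template, features: dict) -> str:
    """Fill the empty fields of a template from a flat feature structure."""
    return template.substitute(features)


features = {"title": "Dr.", "surname": "Meier", "order_id": "A-17", "date": "3 May"}
print(fill_template(SHIPPING, features))

# The template cannot adapt number agreement to the value it receives.
print(fill_template(INBOX, {"count": 1}))
```

The second output reads "You have 1 new messages.", since nothing in the template adjusts the noun to the value of count.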
Phrase-based systems may be conceived of as generalized templates that represent different types of phrases found in some natural language. These phrases are related to each other by a set of rules (a grammar) that specifies the structural description of well-formed natural language utterances.
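The idea of phrases as generalized templates related by grammar rules may be sketched as follows. The toy grammar, its symbol names and the lexical choices are hypothetical and far smaller than any practical grammar.

```python
# Each rule is a generalized template: a phrase type and its constituents.
GRAMMAR = {
    "S": ["NP_subj", "VP"],
    "VP": ["V", "NP_obj"],
    "NP_subj": ["Det_subj", "N_subj"],
    "NP_obj": ["Det_obj", "N_obj"],
}


def expand(symbol: str, lexical: dict) -> list:
    """Recursively expand a phrase symbol into a list of words."""
    if symbol in GRAMMAR:
        words = []
        for child in GRAMMAR[symbol]:
            words.extend(expand(child, lexical))
        return words
    # Terminal symbol: look up the word chosen for this slot.
    return [lexical[symbol]]


lexical = {
    "Det_subj": "the", "N_subj": "system",
    "V": "generates",
    "Det_obj": "an", "N_obj": "answer",
}
sentence = " ".join(expand("S", lexical)).capitalize() + "."
print(sentence)
```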
In feature-based realization systems each possible minimal expression alternative is represented by a single feature (for instance, a noun is either definite or indefinite, singular or plural, etc.). The generation process involves either traversing a feature selection network or unifying the input with a grammar of feature structures until a complete set of feature-value pairs is collected to realize each part of the input. The simplicity of their conception and their flexibility make feature-based systems very powerful. Feature-based systems allow for the realization of very high quality text. However, because normally the entire grammar and not just the input to realization must be traversed, such systems are not optimal for real-time applications.
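The noun example given above may be sketched as a very small feature selection network. The function and feature names are hypothetical, and the plural and article rules are deliberately naive assumptions that cover only regular English nouns.

```python
def realize_noun_phrase(features: dict) -> str:
    """Realize a noun phrase from a complete set of feature-value pairs."""
    noun = features["lemma"]
    plural = features["number"] == "plural"
    if plural:
        noun += "s"  # assumption: regular plural only
    if features["definiteness"] == "definite":
        determiner = "the"
    elif plural:
        determiner = ""  # English has no plural indefinite article
    else:
        # assumption: spelling-based choice between "a" and "an"
        determiner = "an" if noun[0] in "aeiou" else "a"
    return f"{determiner} {noun}".strip()


print(realize_noun_phrase(
    {"lemma": "answer", "number": "singular", "definiteness": "indefinite"}
))
print(realize_noun_phrase(
    {"lemma": "system", "number": "plural", "definiteness": "definite"}
))
```

Each branch corresponds to one feature choice, so changing a single feature value changes exactly one aspect of the output.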
There are a number of reasons why dynamic generation is to be preferred over simpler methods of text realization such as canned text. While canned text systems are trivial to create, they are also highly inflexible and wasteful of resources. One obvious advantage of dynamic generation across the board is customizability. Turning to the unique needs of dialog systems, generation speed is a central issue since interaction must occur in real time. As has often been pointed out in the relevant literature (cf. McRoy et al. 2001), natural language realization systems can address this constraint in two ways: either the system designer must anticipate and specify all possible natural language outputs before runtime, supply the necessary program logic to produce the correct sequence at the correct time, and hope that problems will never arise, or the system must be able to dynamically generate natural language outputs in real time. Depending on the aims of design, and specifically the need to build software that provides increasingly customized responses to users, the time and effort required to integrate a potentially complicated piece of software such as a dynamic generator often pay off in the long run and compare favorably to the time and effort required to manually write all output strings.
In the context of dialog systems, existing generators suffer from one of the following shortcomings: they are too slow (e.g. Penman [Mann 1983], FUF/SURGE [Elhadad 1992 and 1993]), since their approaches traverse the entire generation grammar rather than the input to be generated; or their grammar is too limited (e.g. TEXT [McKeown 1985]), leading to customization and portability problems; or the implementation of the grammar is tightly bound to a grammatical theory that is syntactic or structural as opposed to semantic or functional (e.g. Mumble [Meteer et al. 1987]), so that an application must supply fairly detailed information on the syntax of the target language; or realization is implemented as a production system (e.g. TG/2 [Busemann 1996, Busemann and Horacek 1998]), which is not suitable for real-time systems because of the inherently inefficient derivation of results in such systems; or realization is handled by statistical approaches, which may produce ungrammatical results (e.g. Nitrogen [Knight and Hatzivassiloglou 1995, Langkilde and Knight 1998a and 1998b]); or the realization system implements a template-based approach (e.g. YAG [McRoy et al. 2001]), which again compromises quality of output. For a detailed overview of some of the best existing realization systems see Reiter and Dale (2000) and McRoy et al. (2001).