The present invention deals with natural language generation. More specifically, the present invention deals with sentence realization in a natural language generation system.
A natural language generation system generates a text from a linguistic representation of a sentence. Such systems typically include a text planner or content selection component, a sentence planner component and a sentence realization component.
The text planner or content selection component obtains, as an input, content that is to form the basis of the realized text. The sentence planner component determines how to organize the content into sentences, and the sentence realization component determines how to formulate the actual output sentence.
For example, assume that the text planner provides content words such as “Little Red Riding Hood”, “walking”, and “grandmother's house”. The sentence planner determines that “Little Red Riding Hood” is the agent, the action is “walking”, and the destination is “grandmother's house”. The sentence planner provides this abstract linguistic representation as an input to the sentence realization component. The sentence realization component performs the complex task of mapping from the abstract linguistic representation to an actual sequence of words and punctuation corresponding to that abstract linguistic representation. The actual sequence of words and punctuation is the realized sentence (also referred to as the surface string) which is output by the system.
Prior sentence realization systems have tended to fall into several categories. The first type of system is a hand-coded, rule-based system that successively manipulates the linguistic representation to produce representations from which the surface string can simply be read. In such systems, computational linguists typically explicitly code strategies for stages ranging from planning texts and aggregating content into a single sentence, to choosing appropriate forms of referring expressions, performing morphological inflection and formatting an output. Such systems have typically included a large volume of handwritten code which is extremely time-consuming to produce. In addition, such hand-coded systems encounter great difficulty in adapting to new domains, and even more difficulty adapting to different languages.
The second type of sentence realization system, typically used in the past, attempts to generate candidate sentences directly from the input linguistic representation. For example, such systems have been used in highly domain-specific applications (such as in flight reservations) in which there are a finite number of templates, and the content words are simply assigned to the various slots in the templates. The filled-in templates are used to directly generate an output.
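The slot-filling approach described above can be sketched as follows. This is a minimal illustration only: the template names, the template wording, and the `realize` helper are hypothetical and are not taken from any particular prior system.

```python
from string import Template

# Hypothetical templates for a flight-reservation domain (illustrative only).
TEMPLATES = {
    "confirm_flight": Template(
        "Your flight from $origin to $destination departs at $time."
    ),
    "ask_date": Template("On what date would you like to fly to $destination?"),
}


def realize(template_name: str, slots: dict) -> str:
    """Fill the named template with the supplied content words."""
    return TEMPLATES[template_name].substitute(slots)


sentence = realize(
    "confirm_flight",
    {"origin": "Seattle", "destination": "Boston", "time": "9:15 AM"},
)
print(sentence)
```

Because the output is limited to what the finite template set can express, a sketch of this kind covers only the narrow domain for which its templates were written.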
Another type of sentence realization system enumerates all possible candidate sentences that can be generated from the abstract linguistic representation of the sentence. In these cases, the candidate sentences are evaluated using statistical techniques that prefer the sentences in which combinations of words most closely match combinations observed in real text. However, for a given linguistic representation, the number of candidate sentences to be examined can be extremely large, which leads to slow computation times. Furthermore, the techniques used to evaluate the candidate sentences often perform poorly on long-distance linguistic phenomena. This makes such systems ill-suited to genres and languages in which long-distance phenomena are common.
An example of a system in this third category is the Nitrogen system, as described in Langkilde, I. and K. Knight, 1998, “The Practical Value of N-Grams in Generation,” Proceedings of the 9th International Workshop on Natural Language Generation, Niagara-on-the-Lake, Canada, pages 248-255; and Langkilde, I. and K. Knight, 1998, “Generation that Exploits Corpus-Based Statistical Knowledge,” Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics (COLING-ACL 1998), Montreal, Quebec, Canada, pages 704-710.
In the first of these systems, word bi-grams are used instead of deep linguistic knowledge to decide among alternative output sentences. Two sets of knowledge-engineered rules operate on the input specification to produce candidate output sentences. One set of rules performs one-to-many mappings from under-specified semantics to possible syntactic formulations, fleshing out information such as definiteness and number that might be missing in practical generation contexts such as Japanese-to-English machine translation systems. The second set of rules, which includes sensitivity to the target domain, transforms the representations produced by the first module to yield still more candidate sentences that are represented as a word lattice. Morphological inflection, performed by simple table look-up, further expands the lattice. Word bi-grams are used to find the optimal traversal of the lattice, yielding the best-ranked output sentence. This system generates a very large number of candidate sentences to be scored and ranked. For example, in one of the examples given in Langkilde, I. and K. Knight, the input semantic form includes five lexical nodes in such relationships as AGENT, DESTINATION, and PATIENT. The word lattice that results from this semantic input contains more than 11 million possible paths, with the top-ranked candidate being “Visitors who came in Japan admire Mount Fuji.” Another such example (for which the semantic input representation is not given) appears to contain only two content words that are transformed into a lattice containing more than 155,000 paths to yield the top-ranked candidate “I can not betray their trust.”
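The idea of choosing a lattice traversal by word bi-gram score can be sketched as follows. This is a simplified illustration under stated assumptions, not the Nitrogen implementation: the lattice is reduced to a sequence of slots with alternative words, and the log-probability values, the `UNSEEN` penalty, and the function names are all hypothetical.

```python
import math

# Hypothetical bi-gram log-probabilities; real values would be estimated from text.
BIGRAM = {
    ("<s>", "I"): -1.0,
    ("I", "can"): -1.0,
    ("I", "could"): -2.0,
    ("can", "not"): -0.5,
    ("could", "not"): -0.5,
    ("not", "betray"): -3.0,
    ("betray", "their"): -1.5,
    ("betray", "its"): -2.5,
    ("their", "trust"): -1.0,
    ("its", "trust"): -1.0,
    ("trust", "</s>"): -0.5,
}
UNSEEN = -10.0  # assumed fixed penalty for word pairs missing from the table

# Simplified lattice: each slot lists the alternative words at that position.
SLOTS = [["I"], ["can", "could"], ["not"], ["betray"], ["their", "its"], ["trust"]]


def score(prev: str, word: str) -> float:
    """Log-probability of `word` following `prev` under the toy model."""
    return BIGRAM.get((prev, word), UNSEEN)


def best_path(slots):
    """Dynamic-programming search for the highest-scoring word sequence."""
    # best maps the last word of a partial path to (score, path so far).
    best = {"<s>": (0.0, [])}
    for alternatives in slots + [["</s>"]]:
        nxt = {}
        for word in alternatives:
            nxt[word] = max(
                (
                    (s + score(prev, word), path + [word])
                    for prev, (s, path) in best.items()
                ),
                key=lambda item: item[0],
            )
        best = nxt
    total, path = best["</s>"]
    return total, path[:-1]  # drop the end-of-sentence marker


num_paths = math.prod(len(alternatives) for alternatives in SLOTS)
total, path = best_path(SLOTS)
print(num_paths, total, " ".join(path))
```

The dynamic program keeps only the best partial path per final word, so it never lists every path explicitly. The number of paths is still the product of the slot sizes, which is how a lattice with many alternatives per position can reach the millions of paths noted above.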
The word bi-gram language model used in this system suffers from its inability to capture dependencies among non-contiguous words. Increasing the order of the language model to tri-grams or to higher order n-grams is possible, but the models still fail to capture typical long-distance dependencies. Furthermore, data sparseness becomes an increasing problem as the order increases.
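The limitation can be stated in standard n-gram notation. The notation is supplied here for illustration and is not drawn from the cited papers; m denotes the sentence length and V a hypothetical vocabulary size. A model of order n approximates the probability of a word sequence as:

```latex
P(w_1, \ldots, w_m) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})
```

With n = 2, each word is conditioned only on its immediate predecessor, so a dependency between words separated by intervening material falls outside the model's context. Raising n widens the context window, but the number of distinct contexts grows on the order of V^(n-1), so most contexts are observed rarely or never in training text.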
We also note other prior work relevant to the parts of the present disclosure referred to below as the order model. One relevant area includes “generative” parsing models. Such models are employed in the parsing (i.e., syntactic analysis) process to assign probabilities to alternative syntax trees. The name “generative” indicates that the model can also be sampled randomly to generate a sentence structure according to the distributions in the model. As in the parsing process, such a model can assign a probability to possible constituent structures, given relevant features during the generation process.
Examples of such parsing models are set out in the following publications. Eugene Charniak, “A Maximum-Entropy-Inspired Parser”, appearing in The Proceedings of NAACL-2000, Seattle, Wash., pp. 132-139. Also: Eugene Charniak, “Immediate-Head Parsing for Language Models”, appearing in the Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (2001), Toulouse, France, pp. 116-123. In the work described in these papers, assessments of constituent probability are conditioned on contextual information such as the head of the constituent. One aspect of the order models in the present invention that sets the work disclosed here apart from Charniak's models and from prior generative parsing models is the use of semantic relations and other features available to the generation task but not during parsing.
Another point of reference is the parsing work of David Magerman, which employed decision trees to estimate distributions of interest for parsing. See Magerman D., 1995, “Statistical Decision-Tree Models for Parsing,” in Proc. of ACL, pp. 276-283. The primary distinctions between that work and this invention are the use in parsing versus generation and the difference in features available to each model. Furthermore, Magerman's models were not generative.
Word and constituent order play a crucial role in establishing the fluency and intelligibility of a sentence. Establishing order in the sentence realization stage of natural language generation has generally been accomplished by handcrafted generation grammars in the past. See, for example, Aikawa T. et al., 2001, “Multilingual sentence generation,” in Proceedings of the 8th European Workshop on Natural Language Generation, Toulouse, France, pp. 57-63; and Reiter E. et al., 2000, “Building natural language generation systems,” Cambridge University Press. Recently, statistical approaches have been explored. The Nitrogen system described above and the Fergus system (see Bangalore S. and Rambow O., 2000, “Exploiting a probabilistic hierarchical model for generation,” in Proceedings of COLING 2000, Saarbrücken, Germany, pp. 42-48) have employed word n-gram language models to choose among a large set of word sequence candidates which vary in constituent order, word order, lexical selection, and morphological inflection. In the Nitrogen and Fergus systems, constituent order is only modeled indirectly through word n-grams on the surface strings; i.e., order is not isolated as a separate phenomenon from the selection of appropriate morphological variants and the resolution of underspecified inputs. Also, these systems do not leverage significant linguistic features available during realization.
The Halogen system (see Langkilde I., 2000, “Forest-Based Statistical Sentence Generation,” in Proceedings of NAACL 2000, pp. 170-177; and Langkilde-Geary I., 2002, “An Empirical Verification of Coverage and Correctness for a General-Purpose Sentence Generator,” in Proceedings of the International Language Generation Conference 2002, New York, pp. 17-24), like Nitrogen, uses a word n-gram model, but it extracts the best-scoring surface realizations efficiently from a forest (rather than a lattice) by constraining the search first within the scope of each constituent.
The Amalgam system (see Corston-Oliver et al., 2002, “An overview of Amalgam: a machine-learned generation module,” in Proceedings of the International Language Generation Conference 2002, New York, pp. 33-40) has an explicit ordering stage that determines the order of constituents and their daughters rather than words directly. Amalgam leverages tree constituent structure and features of those constituents. By establishing order within constituents, Amalgam constrains the possible sentence realizations at the word level. However, the Amalgam models of constituent structure used to establish constituent order in natural language generation leave room for improvement; enhancements to those models are the focus of the present disclosure.