For many applications of natural language generation (NLG), the range of linguistic expressions that must be generated is quite restricted, and a grammar for NLG in these instances can be fully specified by hand. Moreover, in many cases it is important not to deviate from certain linguistic standards in generation, in which case hand-crafted grammars provide excellent control. However, in other applications for NLG (which are ever-increasing as the technology evolves), the variety of output is much larger, while the demands on the quality of the output typically become less stringent. A typical example is NLG in the context of interlingua- or transfer-based machine translation. Additionally, the requirements on output quality may be relaxed if there is insufficient time available to develop a full grammar for a new target language.
The basic tasks of natural language generation include: text planning, in which the content and structure of the target text are determined so as to achieve the overall communicative goal; sentence planning, in which linguistic means, particularly lexical and syntactic means, are determined to convey smaller pieces of meaning; and realization, in which the configuration chosen in sentence planning is transformed into a surface string by linearizing and inflecting the words in the sentence. During the realization process, “function words” may be added to the sentence as well.
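The three tasks described above can be pictured as successive stages of a pipeline. The following is a minimal illustrative sketch only, assuming a toy input of predicate/agent/patient facts; all names (`Message`, `LEXICON`, `IRREGULAR_PAST`, `plan_text`, `plan_sentence`, `realize`, `generate`) are hypothetical and are not drawn from any particular NLG system.

```python
from dataclasses import dataclass

# Hypothetical toy resources, assumed purely for illustration.
LEXICON = {"purchase": "buy", "observe": "see"}   # concept -> verb lemma
IRREGULAR_PAST = {"buy": "bought", "see": "saw"}  # lemma -> past-tense form


@dataclass
class Message:
    predicate: str
    agent: str
    patient: str
    tense: str = "past"


def plan_text(facts):
    """Text planning: decide which content to express and in what order.
    This toy version keeps every fact, in the order given."""
    return [Message(**fact) for fact in facts]


def plan_sentence(msg):
    """Sentence planning: choose lexical and syntactic means.
    Here: pick a verb lemma and an active-voice subject/object frame."""
    return {
        "verb": LEXICON.get(msg.predicate, msg.predicate),
        "subject": msg.agent,
        "object": msg.patient,
        "tense": msg.tense,
    }


def realize(spec):
    """Realization: inflect, linearize, and add function words."""
    verb = spec["verb"]
    if spec["tense"] == "past":
        verb = IRREGULAR_PAST.get(verb, verb + "ed")
    # The determiner "the" is a function word added during realization.
    words = ["the", spec["subject"], verb, "the", spec["object"]]
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."


def generate(facts):
    """Run the three stages in sequence, one sentence per message."""
    return [realize(plan_sentence(msg)) for msg in plan_text(facts)]


if __name__ == "__main__":
    facts = [{"predicate": "purchase", "agent": "company", "patient": "factory"}]
    print(generate(facts))
```

In this sketch every decision is hard-coded by hand, which corresponds to the hand-crafted style of generation; each stage is a point at which a learned model could be substituted.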
For each of these tasks, stochastic (“empiricist”) methods provide an alternative to hand-crafted (“rationalist”) approaches to NLG. A description of one such stochastic technique can be found in an article entitled “Generation that exploits corpus-based statistical knowledge” by I. Langkilde et al., appearing in the Proceedings of the 36th Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, 1998, Montreal, Canada, at pp. 704–710. Such stochastic approaches to natural language generation do not include a tree-based representation of syntax. While this may be adequate (or even advantageous) for some applications, other applications profit from using as much syntactic knowledge as is available, leaving to a stochastic model only those issues that are not determined by the grammar.
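As a rough illustration of how corpus statistics can stand in for syntactic knowledge, the sketch below ranks candidate word orders with a bigram language model estimated from a small corpus. It is a toy example under stated assumptions (a three-sentence corpus, add-one smoothing, exhaustive enumeration of permutations) and is not the implementation of the cited article; the names `CORPUS`, `train_bigrams`, `log_prob`, and `best_order` are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

# Hypothetical toy corpus, assumed purely for illustration.
CORPUS = [
    "the company bought the factory",
    "the company sold the factory",
    "the bank bought the company",
]


def train_bigrams(sentences):
    """Count context unigrams and bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return unigrams, bigrams, len(vocab)


def log_prob(words, model):
    """Log-probability of a word sequence under an add-one smoothed bigram model."""
    unigrams, bigrams, vocab_size = model
    tokens = ["<s>"] + list(words) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def best_order(bag, model):
    """Enumerate every ordering of the bag of words and return the most probable."""
    candidates = set(permutations(bag))  # set() removes duplicate orderings
    return " ".join(max(candidates, key=lambda c: log_prob(c, model)))


if __name__ == "__main__":
    model = train_bigrams(CORPUS)
    print(best_order(["factory", "the", "bought", "company", "the"], model))
```

No tree-based representation of syntax appears anywhere in this sketch: word order is decided entirely by adjacent-word statistics. Exhaustive enumeration is feasible only for very short inputs, which is one reason to let a grammar constrain the candidates and to leave to the statistical model only what the grammar does not determine.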
A need remains in the art, therefore, for improvements upon the stochastic-based natural language generation methods.