The present invention relates to natural language processing. More particularly, the present invention relates to the field of sentence realization in natural language generation.
Natural language processing can involve various aspects, such as natural language analysis and natural language generation. In analyzing natural languages, such as English, Hebrew and Japanese, a parser is typically used to analyze sentences. To conduct the analysis, parsers utilize extensive analysis grammars developed for the task. Analysis grammars are sets of grammar rules which attempt to codify and interpret the actual grammar rules of a particular natural language, such as English.
A subtask of natural language generation is sentence realization: the process of generating a grammatically correct sentence from an abstract semantic/logical representation. Where an extensive grammar has been constructed for automatic natural language analysis, specifying the legal syntactic constructions of a language, it is desirable to use the same grammar specification when automatically producing sentences. However, wide-coverage analysis grammars allow many syntactic variations of the same semantic representation. For example, the alternative sentences “John ran quickly”, “John quickly ran” and “Quickly, John ran” may all be assigned the same semantic representation, of the form:
    run (+Past)
    Actor: John
    Manner: quickly
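The many-to-one relationship between surface sentences and a single semantic representation can be illustrated with a minimal sketch. The `Semantics` class, the `VERB_FORMS` lexicon and the `realizations` function are hypothetical names introduced only for this illustration; they are not part of any analysis grammar described here, and the three word orders are hard-coded rather than derived from grammar rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Semantics:
    """Toy semantic representation: a predicate with tense, actor and manner."""
    predicate: str
    tense: str
    actor: str
    manner: str


# Hypothetical toy lexicon mapping (predicate, tense) to a surface verb form.
VERB_FORMS = {("run", "Past"): "ran"}


def realizations(sem):
    """Return the surface sentences this toy example allows for one representation."""
    verb = VERB_FORMS[(sem.predicate, sem.tense)]
    return [
        f"{sem.actor} {verb} {sem.manner}",
        f"{sem.actor} {sem.manner} {verb}",
        f"{sem.manner.capitalize()}, {sem.actor} {verb}",
    ]


sem = Semantics(predicate="run", tense="Past", actor="John", manner="quickly")
candidates = realizations(sem)
for sentence in candidates:
    print(sentence)
```

All three printed sentences correspond to the same `Semantics` value, which is why a generator that reuses a wide-coverage grammar needs some way to select one of them.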
When generating sentences from such a representation using the same grammar, a single preferred form must be chosen. In addition, where the analysis grammar allows ungrammatical sentences to be processed (intentionally or not), these ungrammatical forms become additional options in the grammar for generation and must be excluded. Also, the formalism used to represent an analysis grammar is typically chosen without considering generation, and converting an existing grammar to a form suitable for generation is often more difficult than writing a new generation-specific grammar. Where it is possible to automatically simplify the grammar to aid the conversion process, the simplification will typically increase the range of ungrammatical sentences allowed by the grammar (termed over-generation), and these sentences must again be excluded during generation.
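The two requirements described above, excluding over-generated forms and choosing a single preferred form, can be sketched as follows. This is only an illustrative toy under stated assumptions: the simplified grammar is modeled as one that permits every ordering of three words, and the `GRAMMATICAL` set and `PREFERENCE` ranking are hand-written stand-ins for whatever constraints a real generator would apply. None of these names come from the grammar discussed here.

```python
from itertools import permutations

WORDS = ("John", "ran", "quickly")

# Assumption: the word orders treated as grammatical in this toy example.
GRAMMATICAL = {
    ("John", "ran", "quickly"),
    ("John", "quickly", "ran"),
    ("quickly", "John", "ran"),
}

# Assumption: a hand-written preference ranking, most preferred first.
PREFERENCE = [
    ("John", "ran", "quickly"),
    ("John", "quickly", "ran"),
    ("quickly", "John", "ran"),
]


def over_generate(words):
    """Model an over-simplified grammar that permits every word order."""
    return list(permutations(words))


def realize(words):
    """Exclude over-generated orders, then choose the single preferred form."""
    candidates = over_generate(words)
    legal = [c for c in candidates if c in GRAMMATICAL]
    best = min(legal, key=PREFERENCE.index)
    return " ".join(best)


print(len(over_generate(WORDS)))  # number of orders the simplified grammar permits
print(realize(WORDS))
```

In this sketch the filtering step and the selection step are kept separate, mirroring the distinction drawn above between excluding ungrammatical options and choosing among the remaining grammatical ones.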