1. Field of the Invention
The present invention generally relates to an instance-based sentence boundary determination method and, more particularly, to a method for generating sentences that are optimized against a set of criteria based on examples in a corpus.
2. Background Description
The problem of sentence boundary determination in natural language generation arises when more than one sentence is needed to convey multiple concepts and relations. In the classic natural language generation (NLG) architecture, sentence boundary decisions are made during the sentence planning stage, in which the syntactic structure and wording of sentences are decided. Sentence boundary determination is a complex process that directly impacts a sentence's readability, its semantic cohesion, its syntactic and lexical realizability, and the smoothness of transitions between sentences. Sentences that are too complex are hard to understand, as are sentences lacking semantic cohesion and cross-sentence coherence. Furthermore, bad sentence boundary decisions may even make sentences unreadable.
Existing approaches to sentence boundary determination typically employ one of the following strategies. The first strategy uses domain-specific heuristics to decide which propositions can be combined. For example, Proteus produces game descriptions by employing domain-specific sentence scope heuristics. This approach can work well for a particular application; however, it is not readily reusable for new applications. The second strategy employs syntactic, lexical, and sentence complexity constraints to control the aggregation of multiple propositions. This strategy can generate fluent complex sentences, but it does not take other criteria, such as semantic cohesion, into consideration. Furthermore, since these approaches do not employ global optimization, the content might not be distributed evenly across sentences. This may cause, for example, a dangling sentence problem.
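By way of illustration only, the following minimal sketch shows how a purely local aggregation rule can distribute content unevenly. It is not taken from any of the existing systems mentioned above, nor is it the method of the present invention. The function names, the proposition labels, and the use of a simple per-sentence proposition cap as the complexity constraint are all assumptions made for this example, and the dangling sentence is read here as a final sentence left with very little content.

```python
from typing import List


def greedy_aggregate(propositions: List[str], max_per_sentence: int) -> List[List[str]]:
    """Hypothetical local strategy: fill each sentence up to the cap, then start a new one."""
    sentences: List[List[str]] = []
    for prop in propositions:
        if sentences and len(sentences[-1]) < max_per_sentence:
            sentences[-1].append(prop)
        else:
            sentences.append([prop])
    return sentences


def balanced_aggregate(propositions: List[str], max_per_sentence: int) -> List[List[str]]:
    """Hypothetical global strategy: pick the sentence count first, then spread content evenly."""
    n = len(propositions)
    if n == 0:
        return []
    count = -(-n // max_per_sentence)  # ceiling division: fewest sentences that respect the cap
    base, extra = divmod(n, count)
    sentences: List[List[str]] = []
    start = 0
    for i in range(count):
        size = base + (1 if i < extra else 0)
        sentences.append(propositions[start:start + size])
        start += size
    return sentences


# Four illustrative propositions with a cap of three per sentence (assumed values).
props = ["p1", "p2", "p3", "p4"]
greedy = greedy_aggregate(props, 3)
balanced = balanced_aggregate(props, 3)
print(greedy)
print(balanced)
```

With these assumed inputs, the local rule packs three propositions into the first sentence and leaves the fourth alone in a second sentence, whereas the evenly distributed grouping places two propositions in each sentence. The balanced function is only a stand-in for the general idea of considering all sentences together; it does not account for semantic cohesion or any corpus-based criterion.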