Extractive summarization is the process of selecting and extracting text spans, usually whole sentences, from a source document. The extracts are then arranged in some order (usually the order in which they appear in the source document) to form a summary. In this method, the quality of the summary depends on the scheme used to select the text spans from the source document. Most of the prior art uses a combination of lexical, frequency and syntactic cues to select whole sentences for inclusion in the summary. Consequently, the summaries cannot be shorter than the shortest text span selected and cannot combine concepts from different text spans in a simple phrase or statement. U.S. Pat. No. 5,638,543 discloses selecting sentences for an extractive summary by scoring them on the lexical items they contain. U.S. Pat. No. 5,077,668 discloses an alternative sentence scoring scheme based upon markers of relevance such as hint words like "important", "significant" and "crucial". U.S. Pat. No. 5,491,760 works on bitmap images of a page to identify key sentences based on the visual appearance of hint words. U.S. Pat. Nos. 5,384,703 and 5,778,397 disclose selecting sentences scored on the inclusion of the most frequently used non-stop words in the entire text.
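By way of illustration only, the frequency-based sentence selection described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any cited patent: the stop-word list, the naive sentence splitter, the `top_k` cutoff, and the function names `split_sentences`, `content_words` and `extractive_summary` are all hypothetical choices made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; practical lists are far larger).
STOP_WORDS = frozenset(
    {"a", "an", "and", "are", "as", "in", "is", "it", "of", "on", "the", "to"}
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lower-cased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def extractive_summary(text, max_sentences=1, top_k=2):
    """Select the sentences containing the most of the top_k frequent non-stop words."""
    sentences = split_sentences(text)
    # Word frequencies are computed over the entire text, not per sentence.
    freq = Counter(content_words(text))
    top_words = {word for word, _ in freq.most_common(top_k)}
    # Score each sentence by how many distinct top words it includes.
    scores = [len(top_words & set(content_words(s))) for s in sentences]
    # Rank by score (highest first), breaking ties by position in the source.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    # Present the selected sentences in their original document order.
    return [sentences[i] for i in sorted(ranked[:max_sentences])]
```

Note that the output of such a scheme is always a list of whole source sentences, which is the limitation identified above: the summary cannot be shorter than the shortest selected sentence, and concepts from different sentences are never merged.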
In contrast to the large amount of work that has been undertaken in extractive summarization, there has been much less work on generative methods of summarization. A generative method of summarization selects words or phrases (not whole sentences) and generates a summary based upon the selected words or phrases. Early approaches to generative methods are discussed in the context of the FRUMP system. See DeJong, G. F., "An Overview of the FRUMP System", Strategies for Natural Language Processing, (Lawrence Erlbaum Associates, Hillsdale, N.J. 1982). This system provides a set of templates for extracting information from news stories and presenting it in the form of a summary. Neither the selection of content nor the generation of the summary is learned by the system. The selection templates are handcrafted for a particular application domain. Other generative systems are known. However, none of these systems can: (a) learn rules, procedures, or templates for content selection and/or generation from a training set or (b) generate summaries that may be as short as a single noun phrase.
The method disclosed herein is related in some respects to prior art on the statistical modeling of natural language as applied to language translation. U.S. Pat. No. 5,510,981 describes a system that uses a translation model describing correspondences between sets of words in a source language and sets of words in a target language to achieve natural language translation. This system proceeds linearly through a document, producing a rendering in the target language of successive document text spans. It is not designed to operate on the entire document to produce a summary of the document.