1. Field of the Invention
This invention relates to methods and apparatus for creating abstracts of documents. More particularly, by detecting words and/or phrases that indicate emphasis, this invention automatically ranks the sentences in a document, and the resulting ranking can be used to create an abstract or to otherwise edit the document.
2. Description of Related Art
Document abstracts enable the reader to save time because a judgment as to the relevance of a document can be made without scanning the entire document. There are two types of abstracts. The first type of abstract summarizes the main contents of the document. The second type of abstract does not summarize the document, but instead explains the general subject matter of the document.
Document abstracts are typically required with formal publications. However, not all documents (as originally prepared) have abstracts, and not all manually prepared abstracts are adequate. Therefore, a practical, automatic method of constructing useful document abstracts is needed.
Automatic document abstracts are clearly useful in themselves, but they can also be components of larger systems. For example, a document retrieval system typically moves from queries to documents, i.e., from a few words to all words in the document. It may be beneficial to reduce the step size of this jump by moving instead from queries to abstracts to documents. In particular, an arbitrarily long document could be compressed to fit on one screen by applying a suitable reducing summarization.
Some automatic extracting systems distinguish between words occurring in the plain text and words occurring in titles and captions. The plain text may receive standard term-weighting, and special words in the title or caption may receive special treatment based on the location of the terms in the specific document. Some systems simply choose the first sentence of each paragraph. Another method gives special treatment to high-frequency words, rarely used words, specific phrases, or even specific paragraphs. Each sentence or paragraph is then scored according to the frequency of those words or phrases. Such abstract-forming techniques are described in Automatic Text Processing (Gerard Salton, Addison-Wesley, 1989).
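By way of illustration only, two of the related-art techniques mentioned above (choosing the first sentence of each paragraph, and scoring sentences by word frequency) might be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the cited reference: the function names, the naive sentence splitter, the small stop-word list, and the averaging of word counts per sentence are all hypothetical choices made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are", "it", "on"}


def split_sentences(text):
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic words that are not stop words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def lead_sentences(paragraphs):
    """Return the first sentence of each non-empty paragraph."""
    result = []
    for paragraph in paragraphs:
        sentences = split_sentences(paragraph)
        if sentences:
            result.append(sentences[0])
    return result


def rank_by_frequency(text, top_n=2):
    """Score each sentence by the average document-wide frequency of its words,
    then return the top_n sentences in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    best = sorted(range(len(sentences)),
                  key=lambda i: score(sentences[i]),
                  reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    paragraphs = ["Cats sleep a lot. They also eat.", "Dogs bark. Dogs run."]
    print(lead_sentences(paragraphs))
    print(rank_by_frequency("Dogs bark loudly. Cats sleep. Dogs chase cats and dogs.", top_n=1))
```

Neither function considers what a sentence means; each relies only on position or on word counts, which is characteristic of the related-art techniques described above.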