1. Field of the Invention
The present invention relates to a method and an apparatus for producing an abstract of a document from given document data.
2. Description of the Background Art
In recent years, it has become common to store large numbers of technical documents, such as patent documents, as files in a database. In such a database system, keywords characterizing the technical field of each document are also registered to facilitate document searches.
However, keywords alone are generally insufficient to characterize a document properly. It would therefore be desirable to provide a concise abstract summarizing each of the many stored documents, but the sheer number of documents usually makes preparing such abstracts impractical.
As a solution, there have been proposals for producing abstracts automatically by computer.
In one such proposal, made by H. P. Luhn in "The Automatic Creation of Literature Abstracts", IBM J. Res. Dev., Vol. 2, pp. 159-165, the sentences of a document that contain words appearing frequently in that document are extracted as the abstract of the document. This method is based on the assumption that important words appear frequently in a document. However, frequently appearing words are not necessarily indicative of the content of the document, so this method often yields inappropriate abstracts. Moreover, because the sentences containing frequently appearing words are all extracted, the number of extracted sentences tends to be large, whereas a concise abstract is desired.
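The frequency-based extraction idea described above can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not a reproduction of the procedure in the cited paper: the stop-word list, the naive sentence splitter, and the names `frequency_abstract`, `top_k_words`, and `max_sentences` are all hypothetical. Each sentence is scored by how many occurrences of the document's most frequent words it contains, and the best-scoring sentences are returned in document order.

```python
import re
from collections import Counter

# Hypothetical stop-word list (an assumption for illustration only);
# function words are excluded so they do not dominate the frequency counts.
STOP_WORDS = {"a", "an", "the", "of", "in", "is", "are", "and", "to",
              "that", "it", "for", "on", "with"}


def split_sentences(text):
    """Naively split text into sentences after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Return lower-case word tokens of a sentence, excluding stop words."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOP_WORDS]


def frequency_abstract(text, top_k_words=3, max_sentences=2):
    """Extract sentences containing the document's most frequent words.

    Simplified sketch: the top_k_words most frequent non-stop words are
    treated as "important"; each sentence is scored by how many of its
    tokens are important words, and the max_sentences highest-scoring
    sentences are returned in their original order.
    """
    sentences = split_sentences(text)
    counts = Counter(w for s in sentences for w in tokenize(s))
    frequent = {w for w, _ in counts.most_common(top_k_words)}

    scored = []
    for index, sentence in enumerate(sentences):
        score = sum(1 for w in tokenize(sentence) if w in frequent)
        if score > 0:  # only sentences containing a frequent word qualify
            scored.append((score, index, sentence))

    # Highest score first (earlier sentence wins ties), then restore order.
    best = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_sentences]
    return [s for _, _, s in sorted(best, key=lambda t: t[1])]


if __name__ == "__main__":
    doc = ("Abstracts help readers. An abstract summarizes a document. "
           "Weather was mild today. Automatic abstract production reads the document.")
    print(frequency_abstract(doc, top_k_words=2))
```

Note that the `max_sentences` cap is an added assumption: without it, every sentence containing a frequent word would be selected, which reflects the drawback noted above that such a method tends to extract too many sentences. Likewise, nothing in the scoring checks whether the frequent words actually capture the document's meaning.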
In another proposal, made by D. Fun et al. in "Step toward the evaluation of text", IJCAI85, pp. 840-844, an attempt was made to evaluate the content of a document more properly so that an abstract conveying the correct meaning could be obtained. However, such a method has yet to be realized in practice.
Thus, it has conventionally been difficult to produce abstracts automatically, and in practice abstracts have been prepared manually.