1. Field of the Invention
The present invention relates to a document retrieval technique, and more specifically to a document retrieval technique that outputs, from the contents of a retrieved document, the parts of the document related to a retrieval condition.
2. Description of the Related Art
A conventional document retrieval system that uses bibliographical items, keywords, and the like as a retrieval condition displays the number of retrieved documents, a list of retrieved titles, and the like as a retrieved result. To determine whether or not the retrieved result is appropriate to the retrieval intention, the user has had to read and judge every sentence of each retrieved document. However, the retrieval intention of the user is not necessarily expressed in all the sentences of the document. When many documents are retrieved, or when the documents are long, it takes the user a considerable time to read through all these sentences.
In recent years, mass storage media such as CD-ROMs and networks such as LANs and the Internet have put large volumes of electronic documents into distribution. Along with this trend, document retrieval systems aimed at retrieving such large volumes of electronic documents have become popular. However, the use of such a document retrieval system frequently leads to the retrieval of a great many documents, which is likely to impose an excessive load on the user who must determine whether the retrieved result is appropriate.
Accordingly, a method has been conceived that outputs only a part of the sentences of a retrieved document, thereby lessening the load of judging such appropriateness.
Various methods have been proposed that automatically prepare a summary of a text. In one such method, the nouns that appear frequently in the text are assumed to be the keywords: each word is given a significance on the basis of its frequency of appearance in the text, each sentence is given a significance based on the significance of the words thus obtained, and the text is summarized by combining the significant sentences. In another method, the locations of the important parts in a text or in a paragraph are predicted in advance from the structure of the text, and the important sentences are extracted from those locations.
In these methods, the same text always yields the same summary. However, it is preferable for the user that a different summary be prepared even from the same text in response to a different retrieval, so that the retrieval intention of the user is reflected.
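The frequency-based summarization described above can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the method of any cited reference: the function names, the `STOP_WORDS` list, and the `max_sentences` parameter are all hypothetical, and filtering out common function words is used here as a crude stand-in for identifying nouns, which would normally require a part-of-speech tagger.

```python
import re
from collections import Counter

# Hypothetical stop-word list (an assumption for this sketch). The text
# assumes frequent *nouns* are the keywords; dropping common function
# words is only a rough substitute for noun detection.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "are", "it", "on", "for"}


def split_sentences(text):
    """Split text after sentence-final punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic words, with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    """Pick the highest-scoring sentences and return them in document order."""
    sentences = split_sentences(text)

    # Word significance: frequency of appearance in the whole text.
    word_score = Counter(w for s in sentences for w in tokenize(s))

    # Sentence significance: sum of the significance of its words.
    def sentence_score(sentence):
        return sum(word_score[w] for w in tokenize(sentence))

    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sentence_score(sentences[i]),
        reverse=True,
    )
    chosen = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(sentences[i] for i in chosen)
```

Note that `summarize` takes no retrieval condition as an argument, which illustrates the limitation discussed above: the same text always produces the same summary regardless of what the user searched for.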
On the other hand, there is a method that prepares a summary by extracting, from a retrieved document, the neighborhoods of the keywords given as the retrieval condition. This method is called KWIC (Keyword in Context) and is widely used, for example, in the result displays of Web retrieval tools. However, when the number of keywords included in the retrieval condition is insufficient, when the parts where the keywords appear are limited, or when the keywords do not properly express the retrieval intention, the retrieval intention of the user is not necessarily presented only in the neighborhoods of the keywords. Conversely, when the keywords appear in many parts, it becomes difficult to determine which of these parts is more significant.
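A minimal sketch of KWIC-style extraction is shown below for illustration. The function name `kwic`, the character-based `width` parameter, and the case-insensitive substring matching are assumptions made for this sketch; actual systems may match on words or morphemes instead.

```python
import re


def kwic(text, keywords, width=30):
    """Return (keyword, snippet) pairs, one per keyword occurrence.

    Each snippet is the matched keyword plus up to `width` characters of
    context on either side. Results are ordered by position in the text.
    """
    hits = []
    for kw in keywords:
        # re.escape treats the keyword as a literal string, not a pattern.
        for m in re.finditer(re.escape(kw), text, flags=re.IGNORECASE):
            start = max(0, m.start() - width)
            end = min(len(text), m.end() + width)
            hits.append((m.start(), kw, text[start:end]))
    hits.sort(key=lambda hit: hit[0])
    return [(kw, snippet) for _, kw, snippet in hits]
```

Every occurrence is returned with equal standing, so when a keyword appears in many parts of the text this sketch gives no indication of which snippet matters more, and a text that expresses the retrieval intention without using the keyword yields no snippet at all.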
The “Device and method for summarizing a document” of Japanese Published Unexamined Patent Application No. Hei 10-207891 discloses a method for summarizing a document using information that is significant in the document and information that a user wishes to acquire. This method stores in advance the documents in which the user was interested, the keywords that the user considered important, and the like, and prepares a summary that reflects the user's interest from the retrieval condition that the user has input and the stored information on the user's interest. However, this method requires each user to input information regarding the user's interest in advance and to keep that information properly updated, which is a time-consuming job.
As mentioned above, the conventional technique for automatically summarizing a text determines the significance of a sentence only from the contents of the text, and therefore disregards the retrieval intention of the user.
In KWIC, the retrieval intention of the user is not necessarily presented only in the neighborhoods of the keywords; conversely, where the keywords appear in many parts of the text, it becomes difficult to judge which of the parts is more significant.
Further, a method for summarizing a document that requires information regarding the interest of a user to be input in advance, such as the “Device and method for summarizing a document” of Japanese Published Unexamined Patent Application No. Hei 10-207891, does reflect the interest of the user in the summary, but cannot dispense with the time-consuming job of each user inputting in advance the information to be acquired.