1. Field of the Invention
The present invention relates to information retrieval from documents and the like, and in particular to an apparatus that, when there is a referring/referred correlation between documents, identifies the reason for the reference and retrieves information using that reason.
2. Description of the Related Art
Several technologies have been proposed to search for a reference correlation when there is a referring/referred correlation between documents. Such technologies include the following Japan patent applications.
(1) Japan Patent Laid-open No. 63-228221 (Mitsubishi Electric)
Reference correlations between documents are stored, and retrieval is conducted based on some clue information using a fuzzy logical operation.
(2) Japan Patent Laid-open No. 63-153630 (NEC)
Documents having a common reference correlation can be retrieved utilizing the referred-correlation between documents and using a reference document as a retrieval item. Documents having a common reference correlation are documents which have a common reference document and are considered to have an important correlation between them.
(3) Japan Patent Laid-open No. 1-191258 (Ricoh)
Both the text of a document and the name of a reference document automatically extracted from the text are simultaneously presented to simplify editing.
(4) Japan Patent Laid-open No. 6-282534 (NEC)
The user of a referred document is automatically notified that the document has been cited.
(5) Japan Patent Laid-open No. 7-311780 (Canon)
Documents related to a specific document are searched for based on a reference correlation. The search results are displayed in descending order of importance. The importance of a document is determined based on the reference frequency of the document.
(6) Japan Patent Laid-open No. 8-272818 (New Nippon Steel)
If a document is designated, documents related to the document are displayed, and further retrieval becomes possible by selecting the displayed document.
(7) Japan Patent Laid-open No. 9-146968 (Hitachi)
Other documents which make a reference similar to that made by a specific document are searched for.
(8) Japan Patent Laid-open No. 10-105572 (NEC)
It is judged whether there is some correlation between documents using both a reference correlation and a keyword, and a document aggregate is generated based on the judgment.
FIG. 1A shows such a conventional information retrieval system. This system comprises a retrieval apparatus 1, a full text database 2 and a reference correlation database 3. When a document is input, both documents that have a common reference correlation and documents that make a similar reference are displayed as related documents. The retrieval unit 5 of the retrieval apparatus 1 retrieves documents in the full text database 2, and the selection unit 6 of the retrieval apparatus 1 selects related documents using the reference correlation in the reference correlation database 3.
Information about related documents is displayed, for example, in the format shown in FIG. 1B. If there are a plurality of related documents, they are displayed, for example, in descending order of importance.
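The selection and ordering performed by such a conventional system can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of any of the applications listed above: the reference correlation database is modeled as a plain mapping from each document to the set of documents it refers to, importance is assumed to be the reference frequency, and the document names and the functions `reference_frequency` and `related_documents` are hypothetical.

```python
from collections import Counter

# Hypothetical reference correlation database:
# referring document -> set of documents it refers to.
REFERENCES = {
    "A": {"B", "X"},
    "B": {"X"},
    "C": {"X", "B"},
    "D": {"C"},
}


def reference_frequency(references):
    """Count how many documents refer to each document."""
    counts = Counter()
    for cited in references.values():
        counts.update(cited)
    return counts


def related_documents(doc, references):
    """Select documents related to `doc` using only the two-valued
    information of whether a reference exists, then order them by
    reference frequency (assumed here to stand for importance)."""
    own = references.get(doc, set())
    related = set(own)  # documents that `doc` refers to
    for other, cited in references.items():
        # documents that share at least one reference with `doc`
        if other != doc and own & cited:
            related.add(other)
    freq = reference_frequency(references)
    # Descending frequency; ties are broken by document name.
    return sorted(related, key=lambda d: (-freq[d], d))


print(related_documents("A", REFERENCES))
```

In this sketch the only information attached to each pair of documents is whether a reference exists, so the result is a single ranked list with no indication of why each document is related.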
In the Japan patent applications described above, other than 8-272818 and 10-105572, basically only an alternative (two-valued) reference correlation is used: either a reference is made or a reference is not made.
In 8-272818, the positioning of a reference correlation is also displayed. In this case, some information must be attached to the reference correlation manually in advance. In 10-105572, it is judged whether there is some correlation between documents using both a reference correlation and a keyword.
As described above, the conventional art uses, as the correlation between documents, only the alternative information of whether or not there is some correlation between them.
However, the conventional information retrieval system described above has the following problems.
In 7-311780, related documents are displayed as important documents in descending order of their reference frequency. In this case, the user must judge the importance of each document according to a criterion that is determined unilaterally by the apparatus.
In the other conventional art, the reference correlation is displayed using the alternative information of whether a document is referred to, and documents are analyzed and searched for from the single viewpoint of “whether a document is important” for a specific document. A retrieval system using such alternative information has the following disadvantages.
(1) The reason why a document is important cannot be provided to a user.
(2) Analysis made from the single viewpoint of whether a document is important cannot indicate the correlation between documents.
(3) Even if it is known what type of document is necessary, all related documents are displayed together.
In such a system, when, for example, scientific documents are retrieved, documents that have no direct correlation to the topic, such as a document on a new technology or a document on widely used software, are often displayed with higher rankings. These documents are often not the documents that should be retrieved, and the conventional system cannot avoid this problem.
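The ranking problem described above can be illustrated with a small sketch. The citation data, the document names and the function `rank_by_reference_frequency` are all hypothetical and chosen only for illustration; the sketch assumes that importance is simply the number of times a document is referred to.

```python
from collections import Counter

# Hypothetical citation data: (referring document, referred document).
CITATIONS = [
    ("query_paper", "widely_used_software"),
    ("query_paper", "same_topic_paper"),
    ("paper_1", "widely_used_software"),
    ("paper_2", "widely_used_software"),
    ("paper_3", "widely_used_software"),
    ("paper_1", "same_topic_paper"),
]


def rank_by_reference_frequency(doc, citations):
    """Order the documents referred to by `doc` by how often each
    is referred to overall (ties are broken by document name)."""
    freq = Counter(cited for _, cited in citations)
    referred = {cited for citing, cited in citations if citing == doc}
    return sorted(referred, key=lambda d: (-freq[d], d))


ranking = rank_by_reference_frequency("query_paper", CITATIONS)
print(ranking)
```

Because the ordering depends only on how often a document is referred to, the document on widely used software is ranked above the document on the same topic, even though the latter is more likely to be the one the user wants.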