1. Field of the Invention
The present invention relates to a document detection system for detecting desired documents from a large number of documents stored in a document database. It is to be noted that the term "retrieval" is often used in the literature of the field instead of the term "detection" used in the following description. The present specification adheres to the use of the term "detection" throughout.
2. Description of the Background Art
In recent years, due to the significant progress and spread of computers, the electronic manipulation of documents is becoming increasingly popular, as in electronic news and electronic mail systems and in CD-ROM publications of data sources such as dictionaries and encyclopedias that had previously been available only in hard copy. It is expected that this trend of electronic manipulation of documents will continue at an increasing pace in the future.
In conjunction with such electronic manipulation of documents, much attention has been given to a document detection system for detecting desired documents from a large number of documents efficiently, so as to enable the effective utilization of the documents stored in a database system.
As a conventionally available document detection system, there has been a system which uses keywords in combination with logic operators such as AND, OR, and NOT, or with proximity operators specifying the number of characters, sentences, or paragraphs that can exist between keywords, and which detects a document by using a specified combination of keywords and operators as a detection key.
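For illustration only, a detection key of this conventional kind can be sketched in Python as follows. The toy collection DOCS, the helper names containing and within_chars, and the choice of measuring proximity in characters are assumptions made for this sketch and are not taken from any particular conventional system.

```python
import re

# Hypothetical toy collection; ids and texts are illustrative only.
DOCS = {
    "d1": "Machine translation of technical documents. Results are reported.",
    "d2": "A survey of translation systems. No machine learning is involved.",
    "d3": "Document databases and electronic mail.",
}


def tokens(text):
    """Return the lower-cased words of a text."""
    return re.findall(r"[a-z]+", text.lower())


def containing(keyword):
    """Ids of documents that contain the keyword as a whole word."""
    return {doc_id for doc_id, text in DOCS.items() if keyword in tokens(text)}


def spans(keyword, text):
    """Character spans (start, end) of whole-word occurrences of keyword."""
    pattern = r"\b%s\b" % re.escape(keyword)
    return [m.span() for m in re.finditer(pattern, text.lower())]


def within_chars(kw1, kw2, max_gap):
    """Proximity operator: ids of documents in which kw1 and kw2 occur
    with at most max_gap characters between them, in either order."""
    hits = set()
    for doc_id, text in DOCS.items():
        for s1, e1 in spans(kw1, text):
            for s2, e2 in spans(kw2, text):
                gap = max(s1, s2) - min(e1, e2)
                if 0 <= gap <= max_gap:
                    hits.add(doc_id)
    return hits


# Logic operators map onto set operations over the matching document ids.
both = containing("machine") & containing("translation")      # AND
either = containing("mail") | containing("survey")            # OR
excluded = containing("translation") - containing("learning")  # AND NOT
near = within_chars("machine", "translation", 5)               # proximity

print(sorted(both), sorted(either), sorted(excluded), sorted(near))
```

In this sketch the result of each detection key is merely a set of document identifiers, which corresponds to the situation described next, in which only a count or a list of titles is available to the user.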
However, in such a conventional document detection system, the detection result has been presented by displaying either the number of detected documents or the titles of the detected documents alone. Consequently, in order for the user to check whether each of the detected documents is the desired document or not, it has been necessary for the user to read the entire content of each of the detected documents one by one, and this operation has been enormously time consuming.
Moreover, in the conventional document detection system, in displaying the titles of the detected documents, the titles are simply arranged in a prescribed order determined by the user's query, such as an order of descending similarity to the keywords used in the detection key. For this reason, it has been impossible for the user to comprehend, from the displayed detection result, the relative relationships among the detected documents and the level of similarity of each detected document with respect to the detection command, and consequently it has been difficult for the user to form an immediate impression of the appropriateness of the displayed detection result.
Furthermore, in the conventional document detection system, the detection scheme is limited to one in which each document as a whole is treated as a single entity, so that a document containing the desired content in the background section and a document containing the desired content in the conclusion section will be detected together without distinction. In other words, the detection result contains a variety of documents regardless of the viewpoint from which the desired content appears in each document. For example, if the user has no interest in what had been done in the past, a detected document which matches the given keywords only in the background section will be of no use. Yet, in the conventional document detection system, documents having different perspectives, such as the document containing the desired content in the background section and the document containing the desired content in the conclusion section, will not be distinguished, and the mixed presence of documents with different perspectives makes it extremely difficult for the user to judge the appropriateness of the detection result.
In view of these problems, there has been a proposal for a scheme to reduce the burden on the user of reading the entire content of each detected document by displaying only a portion of each detected document. However, in such a scheme, it is often impossible to make a proper judgement as to whether a document is the desired document or not unless the relationship between the displayed portion and the remaining portion is apparent. For example, when the background section containing the desired content is displayed for one document while the conclusion section containing the desired content is displayed for another document, these documents cannot be comprehended from a unified viewpoint, and it is difficult for the user to make a proper judgement as to which one of these documents is the necessary one. As a result, in order to fully comprehend the perspectives of the displayed portions in these documents, the user would be forced to read the entire contents of these documents, eliminating any practical reduction of the burden on the user.
Also, there has been a proposal for a scheme to reduce the burden on the user of reading the entire content of each detected document by providing in advance a man-made document summary for each stored document, in correspondence with the stored document itself, and displaying the document summary at a time of displaying the detection result. However, in such a scheme, an enormous amount of human effort is required for preparing the document summary for each document at a time of producing the database itself, which is not practically justifiable unless the database system has a remarkably high utilization rate. Moreover, there are many already existing database systems in which the document summary for each document is not provided, and an enormous amount of human effort is similarly required for preparing the document summary for each document in such an already existing database system. In addition, the man-made document summary is produced from a very general viewpoint alone, so that there is no guarantee that each document is summarized from a viewpoint suitable for the required detection. As a result, the document summary displayed as the detection result can be off the point from the viewpoint of a user with a specific document detection objective, and in such a case, it is possible for the user to overlook the actually necessary document at a time of judging whether each detected document is the desired document or not.