The present invention relates to a document retrieval method for sequencing documents according to the goodness of fit to the retrieval condition, and issuing the retrieval results according to this sequence, a recording medium in which its program is recorded, and a document retrieval apparatus, and more particularly to a document retrieval method capable of judging the relation between the retrieval condition and retrieval result easily, a recording medium in which its program is recorded, and a document retrieval apparatus.
Recently, as a huge quantity of electronic document information, such as electronic mail, electronic catalogs and electronic publications, has begun to circulate, there is mounting interest in document retrieval methods and document retrieval apparatuses capable of retrieving only a desired document from such electronic document information.
Many techniques have been proposed so far for document retrieval methods and document retrieval apparatuses that retrieve only a desired document by sequencing the retrieval results using information on the frequency of occurrence of characters or symbols (hereinafter called words). In a conventional document retrieval method making use of word occurrence information, the evaluation value is set higher for words that occur often in a certain document and seldom in other documents, and lower for words that occur in many documents, and the documents are sequenced according to such an index.
For example, in a conventional document retrieval method, as a standard index for calculating the word evaluation value ev, the following formula is used.
ev = log(N/df)    [Formula 1]
where N is the total number of documents, and df is the number of documents in which the word of notice (the word to be retrieved or retrieval word) occurs.
In this case, for example, if the total number of documents N is 1000, and the number of documents having the retrieval word X is 10, the evaluation value evx of the retrieval word X is evx=log (1000/10)=2.0, and if the number of documents having the retrieval word Y is 100, the evaluation value evy of the retrieval word Y is evy=log (1000/100)=1.0.
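As a minimal sketch of Formula 1, the word evaluation value can be computed as below. The function name `word_evaluation_value` is hypothetical, and the base-10 logarithm is an assumption inferred from the worked values 2.0 and 1.0 above; the source does not state the base.

```python
import math


def word_evaluation_value(total_docs: int, doc_freq: int) -> float:
    """Formula 1: ev = log(N / df), assuming a base-10 logarithm."""
    if total_docs <= 0 or doc_freq <= 0:
        raise ValueError("N and df must be positive")
    return math.log10(total_docs / doc_freq)


# Values from the example in the text: N = 1000, df = 10 for X, df = 100 for Y.
ev_x = word_evaluation_value(1000, 10)
ev_y = word_evaluation_value(1000, 100)
print(ev_x, ev_y)
```

A word that occurs in fewer documents (X) receives the larger evaluation value, which is the behavior the index is designed for.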
The evaluation value Ev of each document is the sum, over the set of all retrieval words, of the product of the evaluation value ev of each retrieval word and the frequency of occurrence of that retrieval word in the document. That is, supposing the frequency of occurrence of a certain retrieval word in the document to be tf, the evaluation value Ev of the document is expressed by the following formula.
Ev = Σ{tf × ev} = Σ{tf × log(N/df)}    [Formula 2]
For example, the evaluation values EvA and EvB of document A and document B with respect to retrieval word X and retrieval word Y are calculated as follows. First, the frequencies of occurrence tf of retrieval word X and retrieval word Y in document A and document B are determined. Herein, in document A, the frequencies of occurrence tfAX and tfAY of retrieval word X and retrieval word Y are respectively tfAX=10 and tfAY=5, and in document B, the frequencies of occurrence tfBX and tfBY of retrieval word X and retrieval word Y are respectively tfBX=5 and tfBY=10. In this case, from Formula 2, the evaluation value EvA of document A and the evaluation value EvB of document B are calculated as follows.
EvA = 10 × 2.0 + 5 × 1.0 = 25.0
EvB = 5 × 2.0 + 10 × 1.0 = 20.0    [Formula 3]
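The calculation of the document evaluation values and the resulting sequence can be sketched as follows. The function name `document_evaluation_value` and the dictionary layout are illustrative assumptions; the numbers are those of the example above, and a base-10 logarithm is assumed.

```python
import math


def document_evaluation_value(term_freqs, doc_freqs, total_docs):
    """Formula 2: Ev = sum over retrieval words of tf * log10(N / df)."""
    return sum(
        tf * math.log10(total_docs / doc_freqs[word])
        for word, tf in term_freqs.items()
    )


# Document frequencies of the retrieval words X and Y among N = 1000 documents.
doc_freqs = {"X": 10, "Y": 100}

ev_a = document_evaluation_value({"X": 10, "Y": 5}, doc_freqs, 1000)
ev_b = document_evaluation_value({"X": 5, "Y": 10}, doc_freqs, 1000)

# Sequence the documents in descending order of evaluation value.
ranking = sorted([("A", ev_a), ("B", ev_b)], key=lambda pair: pair[1], reverse=True)
print(ranking)
```

With these inputs, document A is placed ahead of document B, in agreement with the values 25.0 and 20.0 above.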
Thus, in the conventional document retrieval method, mostly, the word occurring in the retrieval condition is used as the word of notice (retrieval word) when calculating the evaluation value Ev of document. That is, according to the conventional document retrieval method, the retrieval results of documents are sequenced on the basis of the evaluation value Ev of each document obtained in this manner.
However, in the conventional document retrieval method, since the document retrieval results are sequenced by integrating the information about the frequency of occurrence of retrieval word in the retrieval condition, it is hard to distinguish the individual effects of each retrieval word in the document retrieval results.
In particular, if a retrieval result conforming to the purpose of retrieval is not obtained, it is necessary to retrieve again after revising the retrieval condition (retrieval word, etc.). In that case, it has been hard for the user to understand how such a revision is reflected in the sequencing of the retrieval results.
The invention is devised in the light of the above background, and it is hence an object thereof to present a document retrieval method allowing the user to judge easily the validity of retrieval condition such as retrieval word and effects of retrieval condition on the retrieval result, so that the user can improve the efficiency of retrieval process, and a recording medium in which its program is recorded, and a document retrieval apparatus.
To solve the problems, a first aspect of a document retrieval method of the invention is a document retrieval method for retrieving a set of documents composed of plural documents according to an entered retrieval condition, comprising the steps of retrieving each document included in the set of retrieval object documents according to the entered retrieval condition, sequencing each document depending on the goodness of fit to the retrieval condition, and acquiring the retrieval result by shuffling the documents in the sequence by occurrence, designating a specific set of sample documents and a specific sample document included in the specific set of sample documents, detecting the sequence by occurrence in the retrieval result in each designated specific sample document according to the retrieval result, and calculating the occurrence distribution of the retrieval condition relating to the set of sample documents including the specific sample document according to the sequence by occurrence in each specific sample document.
To solve the problems, a second aspect of a document retrieval method of the invention comprises the steps of retrieving each document included in the set of retrieval object documents according to the entered retrieval condition, sequencing each document depending on the goodness of fit to the retrieval condition, and acquiring the retrieval result by shuffling the documents in the sequence by occurrence, generating a table of sets of sample documents designating the relation of a specific set of sample documents and a specific sample document according to the acquired retrieval result, detecting the sequence by occurrence in the retrieval result in each specific sample document designated in the table of sets of sample documents according to the retrieval result, and calculating the occurrence distribution of the retrieval condition relating to the set of sample documents including the specific sample document according to the sequence by occurrence in each specific sample document.
To solve the problems, a third aspect of a document retrieval method of the invention comprises the steps of retrieving each document included in the set of retrieval object documents according to the entered retrieval condition, sequencing each document depending on the goodness of fit to the retrieval condition, and acquiring the retrieval result by shuffling the documents in the sequence by occurrence, subdividing the entered retrieval condition, and generating a divided retrieval condition by arbitrarily combining the retrieval conditions in the subdivided units, designating a specific set of sample documents and a specific sample document included in the specific set of sample documents according to the divided retrieval condition and the retrieval result, detecting the sequence by occurrence in the retrieval result in each designated specific sample document according to the retrieval result, and calculating the occurrence distribution of the retrieval condition relating to the set of sample documents including the specific sample document according to the sequence by occurrence in each specific sample document.
To solve the problems, a fourth aspect of a document retrieval method of the invention comprises the steps of retrieving each document included in the set of retrieval object documents according to the entered retrieval condition, sequencing each document depending on the goodness of fit to the retrieval condition, and acquiring the retrieval result by shuffling the documents in the sequence by occurrence, preparing an attribute condition for specifying a document in a specific range, designating a specific set of sample documents and a specific sample document included in the specific set of sample documents according to the attribute condition and the retrieval result, detecting the sequence by occurrence in the retrieval result in each designated specific sample document according to the retrieval result, and calculating the occurrence distribution of the retrieval condition relating to the set of sample documents including the specific sample document according to the sequence by occurrence in each specific sample document.
A fifth aspect of a document retrieval method of the invention relates to the first to fourth aspects of the document retrieval method of the invention, in which the retrieval condition to be entered is stored preliminarily.
Each document retrieval method described above may be compiled as a program for executing in the computer, and the program can be recorded in a recording medium that can be read by the computer.
To solve the problems, a first aspect of a document retrieval apparatus of the invention is a document retrieval apparatus for retrieving a set of documents composed of plural documents according to a retrieval condition, comprising document retrieving means for retrieving each document included in the set of retrieval object documents according to the retrieval condition, sequencing each document depending on the goodness of fit to the retrieval condition, and acquiring the retrieval result by shuffling the documents in the sequence by occurrence, memory means of table of sets of sample documents for storing the table of sets of sample documents designating the relation between a specific set of sample documents and a specific sample document, and calculating means of occurrence distribution for detecting the frequency of occurrence of the retrieval condition in each specific sample document designated by the table of sets of sample documents stored in the memory means of table of sets of sample documents according to the retrieval result acquired by the document retrieving means, and calculating the occurrence distribution of the retrieval condition relating to the set of sample documents including the specific sample document according to the sequence by occurrence in each specific sample document.
To solve the problems, a second aspect of a document retrieval apparatus of the invention comprises document retrieving means for retrieving each document included in the set of retrieval object documents according to the retrieval condition, sequencing each document depending on the goodness of fit to the retrieval condition, and acquiring the retrieval result by shuffling the documents in the sequence by occurrence, generating means of table of sets of sample documents for generating the table of sets of sample documents designating the relation between a specific set of sample documents and a specific sample document, and calculating means of occurrence distribution for detecting the frequency of occurrence of the retrieval condition in each specific sample document designated in the table of sets of sample documents generated by the generating means of table of sets of sample documents according to the retrieval result acquired by the document retrieving means, and calculating the occurrence distribution of the retrieval condition relating to the set of sample documents including the specific sample document according to the sequence by occurrence in each specific sample document.
Herein, this generating means of table of sets of sample documents may also generate a table of sets of sample documents specifying the relation between the specific set of sample documents and the specific sample document, according to the retrieval result acquired by the document retrieving means.
To solve the problems, a third aspect of a document retrieval apparatus of the invention comprises document retrieving means for retrieving each document included in the set of retrieval object documents according to the retrieval condition, sequencing each document depending on the goodness of fit to the retrieval condition, and acquiring the retrieval result by shuffling the documents in the sequence by occurrence, divided retrieval condition generating means for subdividing the retrieval condition into retrieval conditions of specific units, and generating divided retrieval conditions by arbitrarily combining the subdivided retrieval conditions in specific units, generating means of table of sets of sample documents for generating a table of sets of sample documents designating the relation of a specific set of sample documents and a specific sample document, according to the divided retrieval conditions generated in the divided retrieval condition generating means and retrieval result obtained by the document retrieving means, and calculating means of occurrence distribution for detecting the sequence by occurrence in the retrieval result in each specific sample document of the table of sets of sample documents generated by the generating means of table of sets of sample documents according to the retrieval result acquired by the document retrieving means, and calculating the occurrence distribution of the retrieval condition relating to the set of sample documents including the specific sample document according to the frequency of occurrence in each specific sample document.
To solve the problems, a fourth aspect of a document retrieval apparatus of the invention comprises document retrieving means for retrieving each document included in the set of retrieval object documents according to the retrieval condition, sequencing each document depending on the goodness of fit to the retrieval condition, and acquiring the retrieval result by shuffling the documents in the sequence by occurrence, memory means for storing an attribute condition for specifying a document in a specific range, generating means of table of sets of sample documents for generating a table of sets of sample documents designating the relation of a specific set of sample documents and a specific sample document, according to the attribute condition stored in the memory means and the retrieval result obtained by the document retrieving means, and calculating means of occurrence distribution for detecting the sequence by occurrence in the retrieval result in each specific sample document designated by the table of sets of sample documents generated by the generating means of table of sets of sample documents according to the retrieval result acquired by the document retrieving means, and calculating the occurrence distribution of the retrieval condition relating to the set of sample documents including the specific sample document according to the sequence by occurrence in each specific sample document.
A fifth aspect of a document retrieval apparatus of the invention relates to the first to fourth aspects of the document retrieval apparatus of the invention, which further comprises retrieval condition memory means for storing plural retrieval conditions, and retrieval condition acquiring means for acquiring one or plural specific retrieval conditions from the plural retrieval conditions stored in the retrieval condition memory means, in which the retrieval condition acquiring means enters the acquired specific retrieval condition into the document retrieving means and the divided retrieval condition generating means at a specific timing.
In these aspects of the invention, from the retrieval result obtained in the retrieval condition, the occurrence distribution of the retrieval condition relating to the set of sample documents representing the intent of retrieval can be expressed. Accordingly, when retrieved by using different retrieval conditions, by comparing the occurrence distribution of the retrieval condition relating to the set of sample documents, the goodness of fit to the retrieval condition in the set of sample documents, and the effects on the retrieval result in each retrieval condition can be easily judged.
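One possible reading of the occurrence distribution is a count of how many sample documents fall into each band of the sequenced retrieval result. The sketch below follows that reading; the function name `occurrence_distribution`, the bin size of 10, and the document identifiers are all assumptions made for illustration, not details given in the text.

```python
def occurrence_distribution(ranked_doc_ids, sample_doc_ids, bin_size=10):
    """Count the sample documents falling into each band of ranks.

    Band 0 covers ranks 1..bin_size, band 1 the next bin_size ranks, and so on.
    """
    n_bins = (len(ranked_doc_ids) + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for index, doc_id in enumerate(ranked_doc_ids):
        if doc_id in sample_doc_ids:
            counts[index // bin_size] += 1
    return counts


# Hypothetical collection of 30 documents and a set of three sample documents.
all_docs = [f"d{i:02d}" for i in range(1, 31)]
samples = {"d05", "d12", "d27"}

# Retrieval result under a first retrieval condition: samples at ranks 5, 12, 27.
result_1 = all_docs
# Retrieval result under a revised condition: samples moved to the top three ranks.
result_2 = ["d27", "d12", "d05"] + [d for d in all_docs if d not in samples]

dist_1 = occurrence_distribution(result_1, samples)
dist_2 = occurrence_distribution(result_2, samples)
print(dist_1, dist_2)
```

Under this reading, a distribution concentrated in the first band suggests that the revised retrieval condition fits the set of sample documents better than the first one.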
According to the sequenced retrieval results, the set of sample documents can be generated, revised or deleted, and therefore when retrieved by using different retrieval conditions, by comparing the occurrence distribution of the retrieval condition of plural sets of sample documents, the effects on the retrieval result in each retrieval condition can be easily judged.
Further, by subdividing the retrieval condition into plural retrieval conditions and generating plural sets of sample documents in each subdivided retrieval condition, the effects on the retrieval result in each retrieval condition can be easily predicted.
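The subdivision and recombination of a retrieval condition might be sketched as follows. Treating whitespace-separated words as the subdivided units, enumerating every non-empty combination, and requiring that a document contain all words of a combination are assumptions chosen for illustration; the function names and sample data are hypothetical.

```python
from itertools import combinations


def divided_retrieval_conditions(retrieval_condition):
    """Split the condition into word units and list every non-empty combination."""
    units = retrieval_condition.split()
    return [
        combo
        for size in range(1, len(units) + 1)
        for combo in combinations(units, size)
    ]


def sample_sets_per_condition(ranked_docs, conditions):
    """For each divided condition, collect the ranked documents containing all its units."""
    return {
        cond: [doc_id for doc_id, text in ranked_docs
               if all(unit in text.split() for unit in cond)]
        for cond in conditions
    }


# Hypothetical retrieval result: (document id, text), already in sequenced order.
ranked_docs = [
    ("d1", "document retrieval apparatus"),
    ("d2", "document recording medium"),
    ("d3", "retrieval condition"),
]

conditions = divided_retrieval_conditions("document retrieval")
sample_sets = sample_sets_per_condition(ranked_docs, conditions)
print(sample_sets)
```

Each divided condition yields its own set of sample documents, so the contribution of an individual word or word combination to the overall result can be inspected separately.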
Plural sets of sample documents are generated on the basis of attributes other than the contents of the documents in the retrieval result, for example, in the case of patent application specifications, on the basis of the international patent classification or the filing or laid-open date, so that the retrieval results can be easily investigated from plural viewpoints.
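Grouping a sequenced retrieval result by a document attribute can be sketched as below. The record layout, the field names `ipc` and `filed`, and the sample values are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


def sample_sets_by_attribute(ranked_docs, attribute):
    """Group ranked documents by an attribute, keeping each document's rank."""
    groups = defaultdict(list)
    for rank, doc in enumerate(ranked_docs, start=1):
        groups[doc[attribute]].append((rank, doc["id"]))
    return dict(groups)


# Hypothetical retrieval result, already in sequenced order.
ranked_docs = [
    {"id": "p1", "ipc": "G06F", "filed": 1998},
    {"id": "p2", "ipc": "H04L", "filed": 1997},
    {"id": "p3", "ipc": "G06F", "filed": 1997},
]

by_ipc = sample_sets_by_attribute(ranked_docs, "ipc")
by_year = sample_sets_by_attribute(ranked_docs, "filed")
print(by_ipc)
print(by_year)
```

The same retrieval result produces different sets of sample documents depending on the attribute chosen, which corresponds to viewing the result from more than one viewpoint.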
The user can calculate the occurrence distribution of retrieval conditions relating to the set of sample documents in each one of plural retrieval conditions, and therefore when the user searches the same retrieval object repeatedly, the retrieval efficiency is enhanced. Moreover, the user can compare the plural retrieval conditions and occurrence distribution of retrieval conditions.
The document retrieval apparatus of the invention generates the occurrence distribution of retrieval conditions in each set of documents or each retrieval condition in the table of sets of sample documents from the retrieval history of retrieval condition, retrieval result, or occurrence distribution of retrieval conditions, so that the user can judge and predict more easily the effects on the retrieval result in each retrieval condition.