The present invention relates to a method and system for retrieving a document on the basis of a search condition designated by a user, and to a storage medium storing a processing program for use in the method and system, and more particularly to a retrieval technique in which the user evaluates a document obtained as a result of retrieval and the evaluation is reflected in the search condition.
With the widespread use of personal computers, the Internet and the like, the number of electronic documents has been increasing drastically. Under these circumstances, there is an increasing demand for retrieving a document containing information desired by a user at high speed and with high efficiency.
A technique called relevance feedback is available as a retrieval technique meeting this demand. In this technique, for a search result obtained by a whole passage search or a similar document search, the user inputs to the system an evaluation such as “the document is desirable” or “the document is undesirable”, and the evaluation information is used to modify the search condition so as to improve succeeding search results.
Specifically, a processing method is shown in, for example, “Information Retrieval” by William B. Frakes and Ricardo Baeza-Yates, Prentice Hall PTR, 1992, pp. 241–263, according to which weights in a search condition are increased for words extracted from a document that has been evaluated as desirable by the user, and weights in a search condition are decreased for words extracted from a document that has been evaluated as undesirable. Hereinafter, the above technique will be referred to as related art 1.
An example of a concrete method of addition/subtraction of weight applied to a certain word in a search condition is given by equation (1).
$$W' = W + \alpha \sum_{i=1}^{P} \mathrm{Fp}(i) - \beta \sum_{j=1}^{N} \mathrm{Fn}(j) \qquad (1)$$

where W′ represents a weight of the word after feedback has been carried out, W represents a weight of the word before the execution of feedback, Fp(i) represents the appearance frequency (times) of the word in an i-th document evaluated as desirable, and Fn(j) represents the appearance frequency of the word in a j-th document evaluated as undesirable. In the equation, P is the number of documents evaluated as desirable, N is the number of documents evaluated as undesirable, and α and β are parameters. The new weight W′ can have a negative value and in such a case, similarity of the document containing the word decreases.
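As an illustration only, equation (1) can be written as a short Python function. The function name `update_weight`, the default parameter values, and the sample frequencies are assumptions chosen for this sketch and do not come from the cited reference.

```python
def update_weight(w, desirable_freqs, undesirable_freqs, alpha=1.0, beta=1.0):
    """Apply equation (1) to the weight of one word.

    w                 -- weight W of the word before feedback
    desirable_freqs   -- Fp(i): frequency of the word in each desirable document
    undesirable_freqs -- Fn(j): frequency of the word in each undesirable document
    alpha, beta       -- the parameters alpha and beta (default values are assumed)
    """
    return w + alpha * sum(desirable_freqs) - beta * sum(undesirable_freqs)


# The word appears 3 and 1 times in two desirable documents
# and 2 times in one undesirable document.
w_new = update_weight(2.0, [3, 1], [2], alpha=0.5, beta=0.25)

# With only undesirable evidence the weight can become negative.
w_negative = update_weight(1.0, [], [3])
```

With these assumed inputs, `w_new` is 2.0 + 0.5 × 4 − 0.25 × 2 = 3.5, and `w_negative` is 1.0 − 3 = −2.0, which corresponds to the case in which W′ has a negative value.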
The related art 1, however, has the problem described below. The related art 1 employs a scheme in which a single search condition changes with the evaluations made by the user, but it does not disclose any means for saving a history of those changes and a history of the user's evaluations in association with each other. Consequently, even if an evaluation made by the user is erased, it is impossible to return the search condition to its state at a given earlier time.
In relevance feedback, the documents to be evaluated are selected from the limited set of documents contained in the search results, so the feedback may not always match the user's intention. In that case the feedback degrades the search result rather than improving it, and it becomes necessary to erase an evaluation, recommence the relevance feedback, and approach the intended search result through trial and error. Disadvantageously, the related art 1 is essentially unable to meet this requirement.
This will be exemplified with reference to FIG. 2. The figure shows an instance where a user who wishes to retrieve documents concerning “high-school baseball” specifies a sample document “A high-school baseball opens following soccer - - - ” and performs a similar document search. At that time, strings such as “high-school”, “baseball”, “soccer” and “opens” are extracted from the sample document and assigned weights to generate a search condition. Then, on the basis of the search condition, scores of the individual documents in a database are calculated, and documents whose scores satisfy a predetermined condition are delivered as the similar document search result.
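The similar document search described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the term list `TERMS`, the substring-based extraction, the use of the appearance frequency as the initial weight, the score defined as a weighted sum of frequencies, and the threshold used as the predetermined condition are all hypothetical, since the related art is not described at that level of detail here.

```python
from collections import Counter

# Hypothetical term list; how strings are extracted is not specified in the text.
TERMS = ["high-school", "baseball", "soccer", "opens",
         "professional", "long-distance relay race"]


def extract_terms(text):
    """Count how often each known term occurs in the text (case-insensitive)."""
    lowered = text.lower()
    counts = Counter()
    for term in TERMS:
        n = lowered.count(term)
        if n:
            counts[term] = n
    return counts


def build_search_condition(sample_document):
    """Assumed weighting: the initial weight of a term is its frequency."""
    return {t: float(n) for t, n in extract_terms(sample_document).items()}


def score(condition, document):
    """Assumed score: sum over terms of weight times frequency in the document."""
    freqs = extract_terms(document)
    return sum(w * freqs.get(t, 0) for t, w in condition.items())


def similar_documents(condition, database, threshold):
    """Return documents scoring at least the threshold, best first."""
    scored = [(score(condition, doc), doc) for doc in database]
    scored.sort(key=lambda pair: -pair[0])
    return [doc for s, doc in scored if s >= threshold]


condition = build_search_condition("A high-school baseball opens following soccer")
database = [
    "A professional baseball and a high-school long-distance relay race",
    "Soccer league results",
    "High-school baseball final",
    "Stock market news",
]
results = similar_documents(condition, database, threshold=2.0)
```

Under these assumptions, `results` contains both the professional baseball document and the high-school baseball document, because each shares the strings “high-school” and “baseball” with the sample document.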
An instance will now be considered in which, among the documents delivered as the search result, a document “A professional baseball and a high-school long-distance relay race - - - ” is evaluated as “undesirable” by the user. At that time, strings such as “professional”, “baseball”, “high-school” and “long-distance relay race” are extracted from the document evaluated as “undesirable” and the weights of these strings in the search condition are decreased, with the result that documents concerning “high-school baseball” can no longer be obtained.
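The degradation in this instance can be reproduced with a small Python sketch. The initial weights of 1.0, the parameter β = 1.0, the starting weight of 0.0 for strings not yet in the search condition, and the weighted-sum score are assumptions made for illustration; only the strings themselves are taken from the example.

```python
def apply_undesirable(condition, doc_freqs, beta=1.0):
    """Subtract beta * frequency from the weight of every string in the
    undesirable document (the subtraction term of equation (1))."""
    updated = dict(condition)
    for term, freq in doc_freqs.items():
        # Assumption: strings absent from the condition start at weight 0.0.
        updated[term] = updated.get(term, 0.0) - beta * freq
    return updated


def score(condition, doc_freqs):
    """Assumed score: sum over terms of weight times frequency."""
    return sum(w * doc_freqs.get(t, 0) for t, w in condition.items())


# Search condition generated from the sample document (assumed weights).
before = {"high-school": 1.0, "baseball": 1.0, "soccer": 1.0, "opens": 1.0}

# Strings extracted from the document evaluated as "undesirable".
undesirable = {"professional": 1, "baseball": 1,
               "high-school": 1, "long-distance relay race": 1}

after = apply_undesirable(before, undesirable)

# A document that really is about high-school baseball.
target = {"high-school": 1, "baseball": 1}
score_before = score(before, target)  # 2.0
score_after = score(after, target)    # 0.0
```

With these assumed values the weights of “high-school” and “baseball” both fall to 0.0, so the score of the intended document drops from 2.0 to 0.0, while “soccer” and “opens” keep their weights.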
In such an event, the related art 1 does not save the search condition as it was before the document “A professional baseball and a high-school long-distance relay race - - - ” was evaluated as undesirable, and consequently the search condition before the change cannot be restored by erasing the evaluation. Accordingly, there arises a problem that, for example, when the relevance feedback has been repeated and an inappropriate evaluation is then inadvertently inputted, the erroneous evaluation cannot be corrected except by clearing all the search conditions and starting again from designation of the initial sample document.
Further, once the search result is degraded owing to a change in the search condition, the search condition before the feedback cannot be restored and, as a result, a desired document cannot be obtained as a search result. Consequently, the user can neither consult the contents of the desired document nor carry out relevance feedback in which that document is evaluated as “desirable”, giving rise to a problem that the search result cannot be improved as the user intends.