1. Field of the Invention
The present invention is directed toward the field of information retrieval systems, and more particularly toward generating feedback to a user of an information retrieval system to help the user reformulate a query.
2. Art Background
An information retrieval system attempts to match user queries (i.e., the user's statements of information needs) against the information available to the system. In general, the effectiveness of information retrieval systems may be evaluated in terms of many different criteria, including execution efficiency, storage efficiency, and retrieval effectiveness. Retrieval effectiveness is typically based on document relevance judgments. These relevance judgments are problematic since they are subjective and unreliable. For example, different judgment criteria assign different relevance values to information retrieved in response to a given query.
There are many ways to measure retrieval effectiveness in information retrieval systems. The most common measures used are "recall" and "precision." Recall is defined as the ratio of the number of relevant documents retrieved for a given query to the total number of relevant documents for that query available in the repository of information. Precision is defined as the ratio of the number of relevant documents retrieved to the total number of documents retrieved. Both recall and precision take values ranging between zero and one. An ideal information retrieval system has both recall and precision values equal to one.
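The two definitions above can be made concrete with a minimal sketch. The function names and the document identifiers (`d1`, `d2`, and so on) are hypothetical and chosen only for illustration; the sketch assumes that relevance judgments are available as a simple set of document identifiers.

```python
def recall(retrieved, relevant):
    """Relevant documents retrieved divided by all relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Relevant documents retrieved divided by all documents retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        raise ValueError("precision is undefined when nothing was retrieved")
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical example: four documents retrieved, five relevant in the repository.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d1", "d3", "d5", "d6", "d7"]

print(recall(retrieved_docs, relevant_docs))     # 2 of 5 relevant documents found
print(precision(retrieved_docs, relevant_docs))  # 2 of 4 retrieved documents relevant
```

In this illustration two of the five relevant documents are retrieved, so recall is 0.4, and two of the four retrieved documents are relevant, so precision is 0.5.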
One method of evaluating the effectiveness of information retrieval systems involves the use of recall-precision graphs. A recall-precision graph typically shows that recall and precision are inversely related. Thus, when precision goes up, recall typically goes down, and vice versa. Although the goal of information retrieval systems is to maximize both precision and recall, most existing information retrieval systems offer a trade-off between these two goals. For certain users, high recall is critical. These users seldom have an easy means of retrieving more relevant information. Typically, as a first choice, a user seeking high recall may expand the search by broadening a narrow boolean query or by looking further down a ranked list of retrieved documents. However, this technique typically results in wasted effort because a broad boolean search retrieves too many unrelated documents, and the tail of a ranked list contains the documents least likely to be relevant to the query.
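The trade-off encountered when looking further down a ranked list can be illustrated by computing precision and recall at each rank cutoff, which yields the points of a recall-precision graph. The following is a minimal sketch under stated assumptions: the function name, the ranked list, and the relevance judgments are all hypothetical, and the ranking is assumed to be given.

```python
def precision_recall_at_cutoffs(ranked, relevant):
    """Return (k, precision, recall) for each cutoff k in a ranked list."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("at least one relevant document is required")
    points = []
    hits = 0  # relevant documents seen in the top k so far
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        points.append((k, hits / k, hits / len(relevant)))
    return points


# Hypothetical ranked hit-list and relevance judgments.
ranked_docs = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
relevant_docs = {"d1", "d2", "d5", "d8"}

for k, p, r in precision_recall_at_cutoffs(ranked_docs, relevant_docs):
    print(f"top {k}: precision={p:.2f} recall={r:.2f}")
```

For this hypothetical list, stopping at the top two documents gives precision 1.0 with recall 0.5, whereas reading all eight documents raises recall to 1.0 while precision falls to 0.5.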
Another method of increasing recall is for users to modify the original query. However, this process amounts to guesswork because a user typically has made his/her best effort at stating the problem in the original query, and thus is uncertain as to what modifications may be useful to obtain a better result.
For a user seeking high precision and recall, the query process is typically a random iterative process. A user starts the process by issuing the initial query. If the number of documents in the information retrieval system is large (e.g., a few thousand), the hit-list resulting from the initial query typically does not represent the exact information the user intended to obtain. The non-ideal behavior of information retrieval systems is not solely responsible for poor initial hit-lists; the user also contributes to degradation of the system by introducing error. User error manifests itself in several ways. One way user error manifests itself is when the user does not know exactly what he/she is looking for, or the user has some idea what he/she is looking for but does not have all the information needed to specify a precise query. An example of this type of error is a user who is looking for information on a particular brand of computer but does not remember the brand name. For this example, the user may start by querying for "computers." A second way user error manifests itself is when the user is looking for some information generally interesting to the user but can only relate this interest via a high level concept. An on-line world wide web surfer is an example of such a user. For example, the user may wish to conduct research on recent issues related to the "Middle East", but does not know which recent issues to search for. For this example, if the user simply does a search on "Middle East", then some documents relevant to the user, such as those dealing with current issues in the "petroleum industry", will not be retrieved. The hierarchical query feedback of the present invention guides users to formulate the correct query in as few query iterations as possible.
Another problem in obtaining high recall and precision is that users often input queries whose terms do not match the terms used to index the majority of the relevant documents, and that almost always fail to match the terms used to index some of the unretrieved relevant documents (i.e., the unretrieved relevant documents are indexed by a different set of terms than those used in the input query). This problem has long been recognized as a major difficulty in information retrieval systems. See Lancaster, F. W. 1969. "MEDLARS: Reports on the Evaluation of its Operating Efficiency." American Documentation, 20(1), 8-36. As is explained fully below, the hierarchical query feedback of the present invention addresses the problem of matching user input queries to the relevant documents by providing feedback of relevant terms that may be used to reformulate the input query.
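The term-mismatch problem described above can be illustrated with a toy inverted index. This is only a sketch of the problem, not of the hierarchical query feedback itself: the function names, the three documents, and their index terms are hypothetical, and a simple boolean OR match on whole index terms is assumed.

```python
from collections import defaultdict


def build_index(doc_terms):
    """Map each lower-cased index term to the set of documents indexed under it."""
    index = defaultdict(set)
    for doc_id, terms in doc_terms.items():
        for term in terms:
            index[term.lower()].add(doc_id)
    return index


def search(index, query_terms):
    """Return documents indexed under at least one query term (boolean OR)."""
    results = set()
    for term in query_terms:
        results |= index.get(term.lower(), set())
    return results


# Hypothetical collection: every document is assumed relevant to the user's need,
# but d2 is indexed only under a term the user did not think to enter.
doc_terms = {
    "d1": ["Middle East", "diplomacy"],
    "d2": ["petroleum industry", "exports"],
    "d3": ["Middle East", "petroleum industry"],
}
relevant_docs = {"d1", "d2", "d3"}

index = build_index(doc_terms)
initial_hits = search(index, ["Middle East"])
missed = relevant_docs - initial_hits  # relevant documents the query failed to retrieve
reformulated_hits = search(index, ["Middle East", "petroleum industry"])

print(sorted(initial_hits), sorted(missed), sorted(reformulated_hits))
```

In this hypothetical collection the initial query misses `d2`, which is indexed under a different set of terms; adding the term "petroleum industry" to the query retrieves all three documents.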