Data stored on one or more computer systems may contain information useful to a user. However, the volume of data may be too large for the user to find that information by direct examination. Additionally, some parts of the data repository may contain information that is not accessible to the user. In many cases, in order to allow the user useful access to the data, a search mechanism is provided. The search mechanism allows a user to issue a search request (also termed a search query). The results are then returned to the user.
For example, a web-based search engine is a search mechanism which may be used to provide search access to information via a web-based search. The information may be a specific data repository, such as a database or other data collection. The information may also be an agglomeration of a number of different data repositories. Such a search engine may provide search access to information available from different information providers over a network, such as the Internet.
In a typical usage of a web search engine, the user enters a query, which is a set of search terms related to what the user is looking for. The query is transmitted to the search engine, which attempts to locate “hits,” i.e., content that is available on the Internet and that relates to the terms contained in the query. Generally, the search engine either has a database of web pages that are known to exist, or communicates with external “providers” who maintain such databases; the query is “scored” against items in these databases to identify the content that best matches the query. A list of results is then generated, and these results are returned to the user's computer for display by the user's web browser.
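The scoring step described above can be illustrated with a minimal sketch. It assumes a small in-memory database of known pages and a simple term-overlap score. The `Page` record, the `score` and `search` functions, and the example pages are hypothetical and are not drawn from any particular search engine, which would typically use a far more elaborate scoring scheme.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One entry in a hypothetical database of known web pages."""
    url: str
    title: str
    description: str


def score(query: str, page: Page) -> int:
    """Count the distinct query terms found in the page's title or description."""
    terms = set(query.lower().split())
    words = set((page.title + " " + page.description).lower().split())
    return len(terms & words)


def search(query: str, pages: list[Page], limit: int = 10) -> list[Page]:
    """Return pages with a nonzero score, best match first."""
    scored = [(score(query, p), p) for p in pages]
    hits = [(s, p) for s, p in scored if s > 0]
    # Sort on the score only; ties keep their original database order.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in hits[:limit]]


# Hypothetical database contents, for illustration only.
PAGES = [
    Page("http://example.com/a", "Cheap travel tips", "How to travel on a budget"),
    Page("http://example.com/b", "Science fiction novels", "A list of classic novels"),
    Page("http://example.com/c", "Budget travel guide", "Cheap travel and cheap hotels"),
]

results = search("cheap travel", PAGES)
for page in results:
    print(page.url, page.title)
```

In this sketch the result list carries the URL, title, and description of each hit, which is the information a browser would display to the user.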
Typically, the search results contain information such as: the Uniform Resource Locators (URLs) of web pages, the titles of the pages, descriptions of the pages, and possibly other textual or graphical information about the web pages. The user then reads the results and attempts to determine, based on the description contained in the results, whether the results correspond to what the user is looking for. Users may then attempt to retrieve the entire page correlating to a search result. In other contexts, search engines present results summarizing the pieces of data which may be useful for a user.
The utility of the search engine is directly correlated to the quality of the results provided. In the best case, results are presented to the user in order of utility to the user. Because the quality of the results is subjective, whether the results were satisfactory can be established only by determining the user's satisfaction.
Generally, search engines in the prior art use non-scalable methods for evaluating the quality of search results. As an example, a human reviewer may examine a record of a search and the search results to determine whether the search results are satisfactory. However, this presents at least three major problems. First, as noted, this method is non-scalable with respect to the number of judgments provided for individual queries. While 300 results may be judged by a reviewer, it is hard to generalize the satisfactoriness of 300 judged results to over 3,000,000 results.
Second, the method is non-scalable with respect to the number of unique queries that can be judged. A search engine may perform in an unsatisfactory way on searches of a specific type or with a given characteristic. If only a small subset of all the searches performed is judged, such a problem may be difficult to diagnose. A number of queries of the given type for which the search results are not satisfactory may be needed in order to recognize or diagnose a problem; otherwise a few queries for which search results are unsatisfactory may appear only as outliers. Thus, where only a small number of queries are judged, a sufficient accumulation of such unsatisfactory queries may never be gathered.
A last problem is that the opinion of judges on user satisfaction may not be equivalent to the opinion of actual users on their satisfaction. The population of judges may be a different population than the target population of users. Thus, substituting the opinion of judges for the opinion of actual users may not result in a correct assessment of satisfaction.
In the prior art, the quality of search results has been evaluated by asking users to provide feedback about the appropriateness of one or more results in an interactive fashion, using so-called relevance feedback techniques (Gerard Salton and Chris Buckley, “Improving information retrieval performance by relevance feedback,” Journal of the American Society for Information Science, 1990, 288-297). Relevance feedback techniques require that users explicitly provide feedback, for example, by marking results as to their degree of relevance, by selecting keywords to add to the query, or by answering follow-up questions about their search intent. User feedback data is then typically used to automatically modify the user's query, thus initiating a new search and a new list of search results. Explicit feedback is typically collected on a limited scale. Users need to opt in to providing feedback, so the sample of users is biased. In addition, explicit feedback techniques require that users engage in activities beyond their intended searching behavior, and this may influence the search outcome. Finally, since the costs to the user are high, and the benefits not immediately obvious, it can be difficult to collect data in a reliable fashion from a large, representative sample of users.
In the prior art, the quality of individual web pages has been measured by obtaining explicit feedback from a user. At least one prior art web browser has attempted to obtain such explicit feedback from a user. This browser is described in a paper entitled “Inferring User Interest” by Mark Claypool, David Brown, Phong Le, and Makoto Waseda in IEEE Internet Computing 5(6): 32-39 (2001). In this browser, whenever the page being displayed is changed, a user evaluation of the page is requested from the user. User evaluations for a given page are collected to determine whether users find that page valuable. In this browser, some implicit feedback is also maintained regarding each page, including data regarding the time spent on the page, mouse movements, mouse clicks, and scrolling time.
While this technique does gather user feedback, it has limited utility in situations in which users may have different needs for a page. For example, a user who is looking for information about books written by Douglas Adams may evaluate a page on his book The Hitchhiker's Guide to the Galaxy and give a high score for utility. However, another user who is looking for information on books about traveling cheaply may evaluate the same page and give it a low score. Thus the technique described will have limited utility in the wide variety of situations in which different users may have different needs, or even where a single user may have different needs for information at different times. In other words, the usefulness of this technique is limited because evaluation of each page is completely independent of the context in which the user arrived at the page.
Thus, this technique is not useful for evaluating the quality of a search engine. In general, this technique is not useful for evaluations which are context-based, but only for evaluating the quality of individual data items, independent of the context in which a user arrived at the data items.
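The limitation of context-independent evaluation can be illustrated with a minimal sketch. The ratings below are hypothetical values modeled on the Douglas Adams example above, and the function names and the 1-to-5 scale are assumptions made only for illustration. Averaging per page blends the two users' opposing evaluations into one uninformative score, while keying the same ratings by the query that led to the page keeps them apart.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical explicit ratings: (query that led to the page, page, score from 1 to 5).
RATINGS = [
    ("books by douglas adams", "hitchhikers-guide-page", 5),
    ("books by douglas adams", "hitchhikers-guide-page", 5),
    ("books about traveling cheaply", "hitchhikers-guide-page", 1),
    ("books about traveling cheaply", "hitchhikers-guide-page", 1),
]


def average_by_page(rows):
    """Context-independent evaluation: one average score per page."""
    by_page = defaultdict(list)
    for _query, page, value in rows:
        by_page[page].append(value)
    return {page: mean(values) for page, values in by_page.items()}


def average_by_context(rows):
    """Context-based evaluation: one average score per (query, page) pair."""
    by_context = defaultdict(list)
    for query, page, value in rows:
        by_context[(query, page)].append(value)
    return {key: mean(values) for key, values in by_context.items()}


per_page = average_by_page(RATINGS)
per_context = average_by_context(RATINGS)
print(per_page)
print(per_context)
```

With these assumed ratings, the per-page average sits in the middle of the scale and describes neither user's experience, whereas the per-context averages separate the satisfied search from the unsatisfied one.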
The gathering of context-based user feedback has been accomplished for searches performed on a search mechanism. The search mechanism is monitored for user behavior data regarding an interaction of a user with the search mechanism. The response data provided by the search mechanism is also monitored. Context data (describing the search) and user feedback data (the user's feedback on the search—either explicit or implicit) are stored. However, while such data has been gathered, the raw data does not contain explicit user satisfaction data which can replace the judged user satisfaction data from a reviewer, which judged data suffers from the drawbacks described above.
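One possible shape for such stored data is sketched below. The record layout, the field names (`explicit_rating`, `clicked_urls`, `dwell_seconds`), and the example log entries are all hypothetical; the text above does not specify a storage format. The sketch is intended only to show how raw records can pair context data with feedback data and still lack an explicit satisfaction value for most searches.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContextData:
    """Describes the search: the query issued and the results returned."""
    query: str
    result_urls: list[str]


@dataclass
class FeedbackData:
    """Feedback on the search; the explicit rating is optional (assumed fields)."""
    explicit_rating: Optional[int] = None                   # explicit feedback, if given
    clicked_urls: list[str] = field(default_factory=list)   # implicit feedback
    dwell_seconds: float = 0.0                               # implicit feedback


@dataclass
class SearchRecord:
    context: ContextData
    feedback: FeedbackData

    def has_explicit_satisfaction(self) -> bool:
        return self.feedback.explicit_rating is not None


# Hypothetical log of monitored searches.
LOG = [
    SearchRecord(
        ContextData("cheap travel", ["http://example.com/a", "http://example.com/c"]),
        FeedbackData(clicked_urls=["http://example.com/a"], dwell_seconds=42.0),
    ),
    SearchRecord(
        ContextData("douglas adams books", ["http://example.com/b"]),
        FeedbackData(explicit_rating=4, clicked_urls=["http://example.com/b"]),
    ),
    SearchRecord(
        ContextData("budget hotels", ["http://example.com/c"]),
        FeedbackData(),  # the user issued the search and left without further action
    ),
]

unrated = [r for r in LOG if not r.has_explicit_satisfaction()]
print(f"{len(unrated)} of {len(LOG)} records carry no explicit satisfaction value")
```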
In view of the foregoing, there is a need for a system and method that overcomes the drawbacks of the prior art.