Information retrieval systems are known in the art. Such systems generally offer users a variety of means for expressing their intentions through queries, including text search, parametric search, structured queries, selection from alternatives (i.e., browsing or navigation), and range specification. In general, a system offers users a means of expressing queries in either a structured language (e.g., a language like SQL) or an informal input mechanism (e.g., an English keyword search). When the input mechanism is informal, problems of ambiguity may arise from the language itself. But even when the input mechanism is formal, the user may not always succeed in expressing his or her intention in the formal query language.
Information retrieval systems may use a variety of techniques to determine what information seems most relevant to a user's query. For some queries, the choice of technique is not particularly important: for example, if the user enters a query that is the exact title of a document, most techniques will retrieve that document as the most relevant result. For other queries, the choice of technique can be very significant, as different techniques may differ considerably in the results they return. Unfortunately, it is not always clear how to select the best technique for a particular query.
Given the challenges that information retrieval systems encounter in handling ambiguous queries, a variety of techniques have been proposed for estimating or measuring query ambiguity, that is, the likelihood that a particular query formulation or interpretation will fail to provide meaningful results. Recognizing and measuring query ambiguity is a first step toward mitigating these problems. The known techniques for estimating or measuring query ambiguity fall primarily into two general categories: query analysis and results analysis. Generally speaking, query analysis techniques focus on the query itself, considering factors such as query length, the informativeness of the query terms, and the tightness of the relationships among those terms. Results analysis techniques focus instead on the results returned for the query, considering factors such as the distinctiveness or coherence of the results and the robustness of the results in the face of perturbation of the retrieval model. One such technique is the “query clarity” approach of Cronen-Townsend and Croft, which aims to predict query performance by computing the relative entropy (Kullback-Leibler divergence) between a query language model and the corresponding collection language model.
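The query clarity computation described above can be illustrated with a simplified sketch. The sketch below is an assumption-laden toy version, not the published method's exact formulation: the query model is estimated from documents containing at least one query term (standing in for the retrieved set), the smoothing weight `mu` is an illustrative value, and all function and variable names are hypothetical. Higher scores indicate a query model that diverges more from the collection model, suggesting a less ambiguous query.

```python
from collections import Counter
from math import log2

def language_model(texts):
    """Unigram maximum-likelihood model over a list of token lists."""
    counts = Counter(tok for text in texts for tok in text)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def clarity_score(query, collection, mu=0.5):
    """Relative entropy between a query language model and the
    collection language model (a simplified query-clarity sketch).

    The query model is estimated from the documents matching the query
    and linearly smoothed with the collection model; mu is an assumed
    illustrative smoothing weight, not a published setting.
    """
    coll_model = language_model(collection)
    # Documents containing at least one query term stand in for the
    # retrieved set used in the original formulation.
    matched = [doc for doc in collection if any(t in doc for t in query)]
    if not matched:
        return 0.0
    query_model = language_model(matched)
    score = 0.0
    for w, p_q in query_model.items():
        # Smooth the query model with the collection model, then
        # accumulate the KL-divergence term for this word.
        p_smoothed = (1 - mu) * p_q + mu * coll_model[w]
        score += p_smoothed * log2(p_smoothed / coll_model[w])
    return score

# Toy collection: a query matching one focused document scores higher
# (clearer) than a query whose matching documents resemble the whole
# collection.
docs = [["apple", "pie", "recipe"],
        ["apple", "phone", "review"],
        ["banana", "bread", "recipe"]]
print(clarity_score(["banana"], docs))   # focused query
print(clarity_score(["recipe"], docs))   # broader query
```

In this sketch, the query “banana” matches a single document, so its query model is sharply peaked relative to the collection and yields a higher clarity score than the broader query “recipe,” which matches documents whose term distribution more closely resembles the collection as a whole.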