The purpose of information retrieval systems is to enable users to identify the documents in a given collection that best match their information needs. While existing search engines help users locate relevant information, finding precise information is becoming increasingly difficult. This is especially true for large collections and for interactive systems, where users tend to look only at the top k documents, with k small (e.g., 5-20 documents).
Automatic query refinement (AQR) techniques may be used to improve retrieval performance by refining the user's query, for example by adding terms that are related to the original query terms. The goal of the refinement is to cover additional aspects of the information need expressed by the query. The expansion terms may be selected from a thesaurus, a synonym table, or a semantic word network such as WordNet. Alternatively, AQR may be performed using relevance feedback techniques, which draw terms from the top-ranked documents for a given query on the assumption that those documents are likely to be relevant.
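The feedback-based variant of AQR (often called pseudo- or blind relevance feedback) can be illustrated with a minimal sketch. It assumes a toy scoring function (raw query-term counts), a tiny hypothetical stopword list, and plain term-frequency selection of expansion terms; the function names `expand_query`, `rank`, and `score` and the parameters `k` and `n_terms` are illustrative only and do not describe any particular system's method.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real systems use far larger ones.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def score(query_terms: list[str], doc: str) -> int:
    """Toy relevance score (an assumption): total occurrences of query terms in the document."""
    tokens = tokenize(doc)
    return sum(tokens.count(t) for t in query_terms)


def rank(query_terms: list[str], docs: list[str]) -> list[int]:
    """Document indices by descending score; sorted() is stable, so ties keep input order."""
    return sorted(range(len(docs)), key=lambda i: -score(query_terms, docs[i]))


def expand_query(query: str, docs: list[str], k: int = 2, n_terms: int = 2) -> list[str]:
    """Pseudo-relevance feedback: assume the top-k documents are relevant and
    append the n_terms most frequent non-query terms found in them."""
    query_terms = tokenize(query)
    top_k = rank(query_terms, docs)[:k]
    counts = Counter()
    for i in top_k:
        counts.update(t for t in tokenize(docs[i]) if t not in query_terms)
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return query_terms + expansion


docs = [
    "jaguar car engine speed",
    "the jaguar car engine",
    "jaguar cat habitat",
    "stock market news",
]
print(expand_query("jaguar car", docs))  # ['jaguar', 'car', 'engine', 'speed']
```

The expansion terms come entirely from the documents the original query already ranks highly, which is why the method depends on the assumption that those documents are relevant: if the top-k set were dominated by the "cat" sense of *jaguar*, the added terms would pull the query in that direction instead.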
Recent studies have shown, however, that while AQR improves recall, it often harms precision, particularly among the top-ranked documents and especially when applied to very large document collections. Recall generally improves under AQR because the expansion terms broaden the scope of the query: all documents retrieved for the original query are also retrieved for the expanded query, and new documents containing the expansion terms are added to the result set. Precision, however, may deteriorate if the expansion changes the order of the results returned by the search engine, causing non-relevant documents to be ranked above relevant ones.
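The recall/precision trade-off can be made concrete with a small illustrative calculation. The document IDs and rankings below are invented for the example (they are not experimental data); the expanded result list is a superset of the original one, but two newly retrieved non-relevant documents are ranked first.

```python
def precision_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k


def recall(ranking: list[str], relevant: set[str]) -> float:
    """Fraction of all relevant documents that appear anywhere in the result set."""
    return len(set(ranking) & relevant) / len(relevant)


# Hypothetical judgments and rankings, chosen only to illustrate the effect.
relevant = {"d1", "d2", "d3"}
original = ["d1", "d2", "d5"]
expanded = ["d7", "d8", "d1", "d3", "d5", "d2"]  # superset of `original`, reordered

for name, ranking in [("original", original), ("expanded", expanded)]:
    print(name, "recall:", round(recall(ranking, relevant), 2),
          "P@2:", precision_at_k(ranking, relevant, 2))
# original recall: 0.67 P@2: 1.0
# expanded recall: 1.0 P@2: 0.0
```

Recall can only stay the same or rise when the result set grows, whereas precision at a small cutoff k depends solely on which documents land in the first k positions, so a reordering alone is enough to lower it.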