Traditional models for information retrieval (IR), such as the Vector Space Model and Language Models for IR, tend to be based on term matching. More particularly, such models identify information in response to a query by calculating the relevance of a document with respect to the query based on the terms and/or words shared by the query and the document. However, since the user who submits the query and the author(s) of the document often use different terms and/or words to describe the same or similar concepts, various IR methods may suffer from term mismatch. That is, documents that are otherwise responsive to the query may not be identified due to differences in expression, including typographical errors, the use of acronyms and/or synonyms, etc. As a result, when a document is relevant to a query but does not share any terms with it, that document may not be returned to the user even though it is responsive to the submitted query.
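The term-mismatch problem can be illustrated with a minimal sketch of term-matching relevance scoring, assuming a simplified Vector Space Model: raw term-frequency vectors, lowercase whitespace tokenization, and cosine similarity. The function names `tf_vector` and `cosine` and the sample documents are hypothetical and chosen only for illustration. Document `d1` describes the same search intent as the query but shares none of its terms, so it receives a score of 0.0.

```python
import math
from collections import Counter


def tf_vector(text):
    """Hypothetical helper: bag-of-words term frequencies from lowercase whitespace tokens."""
    return Counter(text.lower().split())


def cosine(query, document):
    """Cosine similarity between the term-frequency vectors of a query and a document.

    Only terms shared by both contribute to the dot product, so a document
    with no overlapping terms always scores 0.0 (term mismatch).
    """
    q, d = tf_vector(query), tf_vector(document)
    dot = sum(q[t] * d[t] for t in q.keys() & d.keys())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical documents: both are relevant to the query's intent.
docs = {
    "d1": "things to do in new york",  # relevant, but shares no query terms
    "d2": "nyc sightseeing tours",     # shares "nyc" and "sightseeing"
}
query = "nyc sightseeing"

for doc_id, text in docs.items():
    print(doc_id, round(cosine(query, text), 3))
```

Because `"nyc"` and `"new york"` are different strings, the sketch ranks `d1` as entirely irrelevant; real systems add weighting such as TF-IDF, but any scheme that scores only shared terms behaves the same way for a document with zero overlap.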
Term mismatch may also occur in web search. For instance, there may be tens, if not hundreds, of different queries that represent a single search intent, such as “things to do in New York.” Accordingly, a particular query entered by the user may return some, but not all, of the documents relating to this search intent. Query expansion, which broadens the scope of a search by adding terms to a query, may be effective for conventional IR methods. Since various IR methods retrieve documents containing any one of the query terms, adding new terms to a query may result in additional documents being retrieved. However, in web search, adding terms to a search query may cause a search engine to return only documents or websites that contain each of the terms included in the query. Therefore, in the context of web search, query expansion may actually cause the search engine to retrieve fewer documents rather than more, which would not improve the relevance of the search results.
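The contrast between the two retrieval behaviors described above can be sketched as follows, assuming a simplified Boolean model: disjunctive (OR) matching for conventional IR methods and conjunctive (AND) matching for the web search case. The helpers `retrieve_any` and `retrieve_all`, the sample documents, and the expansion terms are all hypothetical. Expanding the query `{"nyc"}` with `"new"` and `"york"` grows the OR result set but empties the AND result set.

```python
def tokens(text):
    """Hypothetical helper: set of lowercase whitespace tokens in a document."""
    return set(text.lower().split())


def retrieve_any(query_terms, docs):
    """Disjunctive (OR) matching: a document matches if it contains any query term."""
    return {doc_id for doc_id, text in docs.items() if tokens(text) & query_terms}


def retrieve_all(query_terms, docs):
    """Conjunctive (AND) matching: a document matches only if it contains every query term."""
    return {doc_id for doc_id, text in docs.items() if query_terms <= tokens(text)}


# Hypothetical corpus sharing one search intent.
docs = {
    "d1": "nyc sightseeing tours",
    "d2": "things to do in new york",
    "d3": "nyc museums and parks",
}

query = {"nyc"}
expanded = query | {"new", "york"}  # hypothetical expansion terms

print("OR :", sorted(retrieve_any(query, docs)), "->", sorted(retrieve_any(expanded, docs)))
print("AND:", sorted(retrieve_all(query, docs)), "->", sorted(retrieve_all(expanded, docs)))
```

Under OR matching the expansion recovers `d2`, which the original query missed; under AND matching no document contains all three terms, so the expanded query retrieves nothing. Actual web search engines use more elaborate ranking, so this is only a simplified model of the behavior the text describes.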