This application relates generally to the field of information retrieval, and more specifically to the problem of retrieving answers to search queries and to assisting users in revising search queries.
The World Wide Web (web) contains a vast amount of freely available information. However, locating a relevant item of information on the web can be a challenging task, and this problem grows as the amount of information available on the web continues to increase.
Search engines can often help users to locate and retrieve a document of interest on the web. However, users often fail to select effective query terms during the searching process. For example, a user may enter the query [web hosting+fort Wayne] when the city of Fort Wayne is usually referred to as Ft. Wayne. Similarly, a user may enter [free loops for flash movie] when most relevant pages use the term “music” rather than “loops” and the term “animation” rather than “movie.” Thus, documents that satisfy a user's informational needs may use terms that differ from the specific query terms chosen by the user to express a concept of interest. This problem becomes more pronounced as the number of terms in a query increases. For queries longer than three or four words, there is a strong likelihood that at least one of the terms is not the best term to describe the user's informational need.
Hence, there is a need to modify and/or expand user queries to include synonyms for query terms, so that retrieved documents will better meet the user's informational needs.
Unfortunately, solving this problem has proven to be a difficult task. A simple approach is to use pre-constructed synonym information, for example from a thesaurus or a structured lexical database. However, thesaurus-based systems have various problems. For example, they are often expensive to construct and are generally restricted to one language.
A more significant issue is that the applicability of a synonym to a given phrase often depends strongly on the context in which the phrase is used. For example, the term “music” is not usually a good synonym for the term “loops,” but it is a good synonym in the context of the example above. However, that context is sufficiently uncommon that the term “music” is not listed as a synonym for the term “loops” in standard thesauruses. Many other examples of context-dependent, non-traditional synonyms can be identified. Hence, even if conventional synonyms can be identified for a term, it may be difficult to identify which specific synonyms to use in the context of a specific query.
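To make the context dependence concrete, the following is a minimal illustrative sketch in Python. It is not a method described in this application. The table CONTEXT_SYNONYMS, its hand-written entries, and the function expand_query are hypothetical, and they only show a substitution being permitted when a particular neighboring term is present in the query.

```python
# Hypothetical, hand-written table for illustration only:
# (query term, required context term) -> synonym allowed in that context.
CONTEXT_SYNONYMS = {
    ("loops", "flash"): "music",
    ("movie", "flash"): "animation",
}


def expand_query(query):
    """Return, for each query term, the term plus any synonyms
    that the other terms of the same query permit."""
    terms = query.lower().split()
    expanded = []
    for i, term in enumerate(terms):
        # The context of a term is every other term in the query.
        context = set(terms[:i] + terms[i + 1:])
        alternatives = [term]
        for (source, required), synonym in CONTEXT_SYNONYMS.items():
            if source == term and required in context:
                alternatives.append(synonym)
        expanded.append(alternatives)
    return expanded
```

Under these assumptions, expand_query("free loops for flash movie") offers “music” as an alternative for “loops,” whereas expand_query("for loops in python") leaves “loops” unchanged because the required context term is absent.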
Other conventional approaches cluster “related words.” Such approaches suffer from the drawback that related words are not necessarily synonyms. For example, the words “sail” and “wind” would likely be clustered because they co-occur in numerous documents; however, they are not synonymous. Hence, substituting one for the other is likely to lead to undesirable search results.
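The drawback of clustering by co-occurrence can be illustrated with a small sketch in Python. The toy corpus DOCUMENTS and the function cooccurrence_counts are hypothetical and stand in for whatever relatedness measure a conventional system might use; the sketch simply counts how many documents contain both words of a pair.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; each string stands in for one document.
DOCUMENTS = [
    "the sail caught the wind",
    "strong wind tore the sail",
    "raise the sail when the wind rises",
    "the boat has a canvas sail",
]


def cooccurrence_counts(documents):
    """Count, for each unordered word pair, the number of documents
    in which both words appear."""
    counts = Counter()
    for doc in documents:
        # Sort the distinct words so each pair has one canonical order.
        words = sorted(set(doc.split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


counts = cooccurrence_counts(DOCUMENTS)
```

In this toy corpus the pair ("sail", "wind") co-occurs in three of the four documents, so a co-occurrence measure would treat the two words as closely related even though neither can replace the other in a query.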
Accordingly, what is needed is a method and an apparatus that identify potential synonyms and also identify the contexts in which those synonyms are applicable.