Embodiments of the present invention generally relate to the field of information retrieval, and more specifically to the task of identifying valid synonyms for query terms to facilitate retrieving documents that relate to the query terms.
The relentless growth of the Internet makes locating relevant information on the World Wide Web (the Web) an increasingly challenging task, and the difficulty increases as the amount of information available on the Web continues to grow. While search engines can help users locate and retrieve a document of interest on the Web, users often fail to select effective query terms during the search.
For example, a user may enter the query [Web hosting+fort wayne] even though the city of Fort Wayne is usually referred to as Ft. Wayne. A user may also enter [free loops for flash movie] even though most relevant pages use the term “music” rather than “loops” and the term “animation” rather than “movie.” Thus, documents that satisfy a user's informational needs may use terms different from the specific query terms chosen by the user. This problem is further aggravated as the number of terms in a query increases. For queries longer than three or four terms, there is a strong likelihood that at least one of the terms is not the best term to describe the user's intended search. It is therefore desirable for a search engine to automatically modify and/or expand user queries to include synonyms for query terms, so that retrieved documents can better meet the user's informational needs.
This task has proven to be difficult. A simple approach is to use pre-constructed synonym information, for example, from a thesaurus or a structured lexical database. However, thesaurus-based systems have various problems, such as being costly to construct and being restricted to one language.
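By way of illustration only, the simple thesaurus-based approach described above might be sketched as follows. The table `THESAURUS` and the functions `expand_query` and `to_boolean_query` are hypothetical names introduced for this sketch, and the table entries are assumptions taken from the example queries above rather than from any actual lexical database.

```python
from typing import Dict, List

# Hypothetical, hand-built synonym table (an assumption for illustration).
# Every entry must be authored manually, and the table covers one language only.
THESAURUS: Dict[str, List[str]] = {
    "fort": ["ft."],
    "loops": ["music"],
    "movie": ["animation"],
}


def expand_query(query: str, thesaurus: Dict[str, List[str]] = THESAURUS) -> List[List[str]]:
    """For each query term, return the term followed by any listed synonyms."""
    expanded = []
    for term in query.lower().split():
        # Terms missing from the table are passed through unexpanded.
        expanded.append([term] + thesaurus.get(term, []))
    return expanded


def to_boolean_query(expanded: List[List[str]]) -> str:
    """Render the expanded terms as a query string with OR-groups."""
    parts = []
    for alternatives in expanded:
        if len(alternatives) == 1:
            parts.append(alternatives[0])
        else:
            parts.append("(" + " OR ".join(alternatives) + ")")
    return " ".join(parts)


if __name__ == "__main__":
    print(to_boolean_query(expand_query("free loops for flash movie")))
```

In such a sketch, any term absent from the pre-constructed table receives no expansion, so coverage depends entirely on how much synonym information has been compiled in advance.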
Accordingly, what is needed is a method and an apparatus for identifying potential synonyms without the above-described problems.