Query suggestion helps users of a search engine to better specify their information need by narrowing down or expanding the scope of the search with synonymous queries and relevant queries, or by suggesting related queries that have been frequently used by other users. Search engines such as Google, Yahoo!, MSN, and Ask Jeeves have all implemented query suggestion functionality as a valuable addition to their core search method. In addition, the same technology has been leveraged to recommend bidding terms to online advertisers in the pay-for-performance search market.
Typical methods for query suggestion perform monolingual query suggestion. These methods exploit query logs (of the original query language) and document collections, assuming that in the same period of time, many users share the same or similar interests, which can be expressed in different manners. By suggesting the related and frequently used formulations, it is hoped that the new query can cover more relevant documents.
The existing techniques for cross-lingual query suggestion are primitive and limited. These techniques approach the issue as a query translation problem. That is, these techniques suggest queries that are translations of the original query. When used as a means for cross-lingual information retrieval (CLIR), for example, the system may perform a query translation followed by a monolingual information retrieval (IR) using the translation of the original query as the search query. Typically, queries are translated using a bilingual dictionary, machine translation software, or a parallel corpus. In other query translation methods, translations of out-of-vocabulary (OOV) terms are mined from the Web using a search engine to alleviate the OOV problem, which is one of the major bottlenecks for CLIR. In others, bilingual knowledge is acquired based on anchor text analysis. In addition, word co-occurrence statistics in the target language have been leveraged for translation disambiguation.
Many of these translation techniques rely on static knowledge and data and therefore cannot effectively reflect the quickly shifting interests of Web users. Even those translation approaches that may help reduce the problem of static knowledge have other problems inherent in any cross-lingual query suggestion (CLQS) model that simply suggests straight translations of the queries. For instance, a translated term may be a reasonable translation, but it may not be popularly used in the target language. For example, the French query “aliment biologique” is translated into “biologic food” by the Google translation tool, yet the correct formulation nowadays should be “organic food”. Therefore, there exist many mismatches between the translated terms and the terms actually used in the target language. These mismatches make the suggested terms in the target language ineffective.
Furthermore, it is arguable that accurate query translation may not be necessary for CLQS. Indeed, in many cases, it is helpful to introduce words that are not direct translations of any query word but are closely related to the meaning of the query. This observation has led to the development of cross-lingual query expansion (CLQE) techniques, some of which reported improvements in CLIR from post-translation expansion, while others developed a cross-lingual relevancy model by leveraging cross-lingual co-occurrence statistics in parallel texts. However, query expansion cannot be used as a substitute for query suggestion. Although query expansion is related to query suggestion, there is an essential difference between them. While expansion aims to extend the original query with new search terms to narrow the scope of the search, query suggestion aims to suggest full queries that have been formulated by users, so that query integrity and coherence are preserved in the suggested queries.
In addition, there is a lack of a unified framework to combine the wide spectrum of resources and recent advances in mining techniques.