In conventional keyword-based search, relevance is typically determined by matching keywords in the query against candidate documents. Keyword matching alone, however, often fails to produce accurate results. Synonyms are one example of this drawback: if a query is about “car”, a document about cars whose author uses only the word “auto” would probably not be considered relevant to the query.
Another problem with conventional search arises in similar-document search. For example, in patent search, whether for prior art or for infringement, simple query strings are not an effective way to find potentially related or similar patents. What is needed for this purpose is a comparison of relevance between two or more patents, or between documents in general. However, even when a search engine allows document-based search, the conventional keyword-matching method still cannot produce optimal results, because it suffers from the synonym problem and similar problems. Using a thesaurus is one solution, but it is limited by the quality and scope of the thesaurus, which is itself word-based rather than concept-based.
Another example of the keyword-matching problem is found in so-called context-based advertising. Examples at the current time are the prevailing Internet advertising methods, such as Google AdWords and AdSense. In essence, both methods require the advertiser to pre-define target keywords as the context for an advertisement, and the search provider matches these target keywords against user queries or against the content of a website that is willing to display ads. While matching keywords provides a certain amount of context information, the effectiveness of the advertisement is still limited, because the relevance of an advertisement to a query or a page is often not fully captured by the keywords. Better results can be achieved if the context is conceptually based. For example, if the query contains such words as “San Francisco hotels”, ads from hotels in the San Francisco area may be displayed. However, if the query contains such words as “stay in San Francisco” or “stay near Golden Gate Bridge”, and the hotel advertiser has not pre-defined words such as “stay” as relevant, the advertiser's ads will not be displayed, even though they may be highly relevant to the context.
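The keyword-targeting limitation described above can be illustrated with a minimal sketch. It is not the matching logic of any actual advertising system; the ad name `sf_hotel_ad`, its target keywords, and the functions `tokenize` and `matching_ads` are all hypothetical and chosen only to mirror the hotel example.

```python
# Hypothetical ad inventory: ad name -> target keywords pre-defined by the advertiser.
# Assumption: the hotel advertiser registered only these two words.
ADS = {
    "sf_hotel_ad": {"hotel", "hotels"},
}


def tokenize(text):
    """Lowercase the text and split it on whitespace into a set of words."""
    return set(text.lower().split())


def matching_ads(query, ads=ADS):
    """Return the ads whose target keywords share at least one word with the query."""
    query_words = tokenize(query)
    return [name for name, keywords in ads.items() if keywords & query_words]


if __name__ == "__main__":
    for query in ("San Francisco hotels",
                  "stay in San Francisco",
                  "stay near Golden Gate Bridge"):
        print(query, "->", matching_ads(query))
```

Under these assumptions, only the first query selects the hotel ad. The other two queries share no word with the pre-defined keywords, so the ad is not selected even though it is conceptually relevant to them.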