Current search technology is generally based on keywords. A user enters keywords into a search engine, and the search engine returns web pages containing those keywords. For example, when the user inputs the Chinese word “ ” (which means “digital photo camera” in English), a current Chinese search engine first segments the input keyword, generally splitting “” into two Chinese terms, “” (which means “digital” in English) and “” (which means “photo camera” in English), and then returns result web pages containing the two Chinese terms “” and “.”
In fact, because users have different backgrounds and habits, users with the same intention are very likely to search with different keywords. For example, users who search for the Chinese word “” (which means “digital camera” in English) and users who search for the Chinese word “” have exactly the same intention. For “”, the result web pages returned by a current search engine contain the two terms “” and “”. Some very valuable result web pages, however, may not be returned, or may not be ranked near the top, because they instead contain the two terms “” and “” (which means “camera” in English). If the search engine could determine that “” and “ ” are synonyms, and could merge and return the result web pages for both words, search accuracy and user experience would effectively improve.
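The merging described above can be sketched as follows. This is only a minimal illustration, not the method of any particular search engine: the toy documents, the English stand-in tokens, the `SYNONYMS` table, and the function names are all assumptions made for the example.

```python
def build_index(docs):
    """Build an inverted index: term -> set of ids of documents containing it."""
    index = {}
    for doc_id, tokens in docs.items():
        for token in tokens:
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, terms):
    """Return ids of documents that contain every query term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def search_with_synonyms(index, terms, synonyms):
    """Merge results for the segmented query and each synonymous rewrite."""
    results = search(index, terms)
    for rewrite in synonyms.get(tuple(terms), []):
        results |= search(index, rewrite)
    return results


# Hypothetical data: English tokens stand in for the segmented Chinese terms.
DOCS = {
    1: ["digital", "photo_camera", "review"],
    2: ["digital", "camera", "review"],
    3: ["film", "camera"],
}
# Hypothetical synonym table: segmented query -> equivalent segmented queries.
SYNONYMS = {("digital", "photo_camera"): [("digital", "camera")]}

if __name__ == "__main__":
    index = build_index(DOCS)
    query = ["digital", "photo_camera"]
    print(sorted(search(index, query)))
    print(sorted(search_with_synonyms(index, query, SYNONYMS)))
```

In this toy data, plain keyword matching finds only document 1, while the synonym-aware search also returns document 2, which uses the other wording for the same product.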
Synonymy is a phenomenon particular to natural language, and synonym mining is a meaningful task in natural language processing. It is a great help in rewriting search queries and enriching search results so that users have a better search experience. However, synonym replacement in a search application must be applied appropriately; it cannot be resolved simply by consulting a synonym list. Because users are accustomed to keyword search, and to seeing the characters or words that match the search query highlighted in the results, not every user accepts replacement with different words or characters, even if they have exactly the same meaning as the search query. For example, the Chinese words “” and “” have exactly the same meaning (both mean “potato” in English). But when the user inputs “” and “” is unexpectedly highlighted in the results, he or she might think the search engine has a problem. If “” is not highlighted in the results, the user can easily overlook it. Therefore, “synonym” in this disclosure refers to a synonym suitable for a search application.
A current method for automatically identifying Chinese synonyms is as follows: represent each specific word as a web page, establish links between the specific word and the other words in the dictionary that are used to interpret it, and assign a score to each such word. The score represents the similarity between words. In other words, the method treats the interpreting and interpreted relationships between words as a type of hyperlink, treats the page rank score as an index of semantic similarity between words, and then identifies synonyms according to that similarity. This method mainly uses the page rank score as the index for determining synonyms. The page rank score, however, depends on the available resources, which are quite arbitrary and hard to control. Taking “potato” as an example, if the available resource emphasizes the characteristics of the vegetable and its outward appearance, it is very likely that “potato” will establish a synonym relationship with “tuber” or “ellipse.” Such a page rank score representing link relationships is therefore very unreliable. Moreover, this unreliability is difficult to detect automatically, so the method cannot accurately identify the required synonyms, and its effectiveness cannot be guaranteed.
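The weakness described above can be illustrated with a small sketch. The text does not say how the page rank score is turned into a word-to-word similarity, so this example assumes one possible reading: a personalized PageRank (random walk with restart) started from the query word over a graph whose links come from dictionary definitions. The toy dictionary, the word "spud", the undirected links, and all function names are assumptions for illustration only.

```python
from collections import defaultdict


def build_links(definitions):
    """Link each headword with every word used to define it (undirected)."""
    links = defaultdict(set)
    for head, gloss_words in definitions.items():
        for word in gloss_words:
            if word != head:
                links[head].add(word)
                links[word].add(head)
    return links


def personalized_pagerank(links, seed, damping=0.85, iterations=50):
    """Score every word by a random walk that keeps restarting at `seed`."""
    if seed not in links:
        raise KeyError(seed)
    nodes = list(links)
    score = {node: 0.0 for node in nodes}
    score[seed] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in nodes}
        for node in nodes:
            # Each word passes its damped score equally to its linked words.
            share = damping * score[node] / len(links[node])
            for neighbor in links[node]:
                nxt[neighbor] += share
        nxt[seed] += 1.0 - damping  # restart mass returns to the seed word
        score = nxt
    return score


def rank_candidates(score, seed):
    """Order all words except the seed by descending score."""
    others = [(word, s) for word, s in score.items() if word != seed]
    return sorted(others, key=lambda pair: pair[1], reverse=True)


# Hypothetical dictionary whose "potato" entry stresses botany and shape.
TOY_DEFINITIONS = {
    "potato": ["tuber", "vegetable", "oval"],
    "spud": ["potato"],
    "tuber": ["thick", "underground", "stem"],
    "carrot": ["root", "vegetable"],
}

if __name__ == "__main__":
    scores = personalized_pagerank(build_links(TOY_DEFINITIONS), "potato")
    for word, value in rank_candidates(scores, "potato"):
        print(f"{word}: {value:.3f}")
```

With this toy dictionary, "tuber" outranks the true synonym "spud", because "tuber" is heavily linked through its own definition. The ranking reflects how the dictionary happens to be written rather than actual interchangeability of the words.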