Search suggest ("Suggest") is a technique that offers suggested queries based on the search term a user is entering. On the Internet, the job of a search engine is to help the user find the desired information faster, with fewer operations, and more accurately.
When typing into a search box, the user often has to enter many keywords and may have to switch between input methods. The keywords entered may also contain mistakes, such as homophone errors. Finally, the user may not know which keywords best express what he or she is looking for. Suggest improves the experience of entering keywords into the input field: it can shorten the user's input and correct input errors and, more importantly, it can recommend keywords close to what the user has in mind.
Implementing Suggest usually involves two steps: a phoneticizing process that converts Chinese characters to Pinyin, and an index searching process. Phoneticizing translates a Chinese phrase into its corresponding Pinyin. This process has difficulty handling polyphones, so it is usually used only when searching the index directly by the Chinese keywords returns too few recommended words. The index search of Suggest is usually based on a hashmap (that is, on the hashmap's Map interface), and its performance must be good because the Suggest service is called many times while a user is typing a keyword.
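The hashmap-based index search described above can be sketched as follows. This is only a minimal illustration, not the implementation of any particular search engine: the function names `build_prefix_index` and `suggest`, the `max_results` cap, and the three sample entries are all assumptions made for the example. Every prefix of a Pinyin key is stored as a hash key, so each keystroke costs a single lookup.

```python
from collections import defaultdict


def build_prefix_index(entries, max_results=10):
    """Map every prefix of every Pinyin key to the words it can complete.

    `entries` is a list of (pinyin_key, word) pairs. The names and the
    `max_results` cap are hypothetical, chosen only for this sketch.
    """
    index = defaultdict(list)
    for key, word in entries:
        for end in range(1, len(key) + 1):
            bucket = index[key[:end]]
            # Keep each bucket small and free of duplicates.
            if word not in bucket and len(bucket) < max_results:
                bucket.append(word)
    return index


def suggest(index, typed):
    """Return the recommended words for what the user has typed so far."""
    # .get() avoids creating empty buckets for unknown prefixes.
    return index.get(typed, [])


# Illustrative data only.
entries = [("yinyue", "音乐"), ("yueshi", "月食"), ("yaoshi", "钥匙")]
index = build_prefix_index(entries)

print(suggest(index, "y"))    # ['音乐', '月食', '钥匙']
print(suggest(index, "yue"))  # ['月食']
```

The trade-off in this sketch is memory for speed: lookups are constant time on average, but the number of stored prefixes grows with the total length of all keys.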
During phoneticizing, the usual way to handle a polyphone is to enumerate all of its pronunciations. For example, the Chinese word “音乐” (“music” in English, “yin yue” in Pinyin) is translated into both “yinyue” and “yinle”, and the Chinese word “乐视网” (“letv network” in English, “le shi wang” in Pinyin) is translated into both “yueshiwang” and “leshiwang”. Such translation relies only on the pronunciations of each individual character and ignores the context in which the character is used. As a result, it produces an excessive number of Pinyin index entries and obscures the correct result. It also does little to help a user recognize his or her misspelling.
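The enumeration approach criticized above can be illustrated with a short sketch. The `READINGS` table and the function `enumerate_pinyin` are hypothetical and cover only the characters needed for the two examples in the text; a real reading table would be far larger. The sketch takes the Cartesian product of the per-character readings, which is why the number of index keys multiplies with every polyphone in a word.

```python
from itertools import product

# Hypothetical, tiny reading table for illustration only.
READINGS = {
    "音": ["yin"],
    "乐": ["yue", "le"],  # polyphone: two readings
    "视": ["shi"],
    "网": ["wang"],
}


def enumerate_pinyin(word):
    """Return every Pinyin string formed by combining per-character readings.

    No context is consulted, so wrong readings are emitted alongside
    the correct one.
    """
    per_char = [READINGS[ch] for ch in word]
    return ["".join(combo) for combo in product(*per_char)]


print(enumerate_pinyin("音乐"))    # ['yinyue', 'yinle']
print(enumerate_pinyin("乐视网"))  # ['yueshiwang', 'leshiwang']
```

Each word here yields two index keys, only one of which matches its actual pronunciation; a word containing several polyphones would yield the product of their reading counts.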
When a Chinese query string is obtained from Pinyin, improper handling of polyphones can introduce search noise. For example, if a user enters the Pinyin “yueshi” in the Baidu search box intending to search for information about a lunar eclipse, Suggest returns recommended words that include clearly irrelevant entries such as the Chinese word “乐视网” (“letv.com” in English, “le shi wang” in Pinyin) and the Chinese word “钥匙” (“key” in English, “yaoshi” in Pinyin), so that “lunar eclipse” is nearly submerged by the irrelevant information.
With a usual searching method, the larger the data set in the dictionary, the larger the sub-trees that must be traversed. Search time therefore grows with the size of the data set, which degrades the user experience.
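The growth in traversal cost can be made concrete with a small sketch. The text does not say which tree structure the usual method uses, so a plain character trie is assumed here, and the names `TrieNode`, `insert`, and `complete` as well as the sample keys are hypothetical. The function counts how many nodes it visits in the sub-tree below the typed prefix.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False


def insert(root, key):
    """Add one key to the trie, creating nodes as needed."""
    node = root
    for ch in key:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True


def complete(root, prefix):
    """Return (matches, visited) for the sub-tree under `prefix`.

    `visited` counts the nodes traversed, including the prefix node.
    """
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return [], 0
    matches, visited = [], 0
    stack = [(node, prefix)]
    while stack:
        current, text = stack.pop()
        visited += 1
        if current.is_word:
            matches.append(text)
        for ch, child in current.children.items():
            stack.append((child, text + ch))
    return sorted(matches), visited


small, large = TrieNode(), TrieNode()
for key in ["yueshi", "yueliang"]:
    insert(small, key)
for key in ["yueshi", "yueliang", "yueshiwang", "yuebing"]:
    insert(large, key)

print(complete(small, "yue"))  # (['yueliang', 'yueshi'], 9)
print(complete(large, "yue"))  # (['yuebing', 'yueliang', 'yueshi', 'yueshiwang'], 17)
```

With the same prefix, doubling the dictionary from two keys to four nearly doubles the number of nodes visited, which is the effect described above.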