As search technology advances and users' search requests become more sophisticated, state-of-the-art search engines are often challenged by complicated search queries that include many key words. When such search queries are used directly for search, the search engines achieve only a low success rate. Instead, a search engine may segment a search query into several key words and search with each of the segmented key words. The search engine then combines the search results for the segmented key words to obtain a list of search results for the search query that includes those key words.
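The combination step can be sketched as follows. This is a minimal illustration built on assumptions not stated above. A hypothetical in-memory inverted index (`index`) maps each key word to the set of IDs of the documents that contain it. The hypothetical function `search_segmented` ranks documents by how many of the segmented key words they match. Real search engines may merge per-keyword results in other ways.

```python
from collections import Counter


def search_segmented(keywords, index):
    """Combine per-keyword results into one ranked list for the whole query.

    Assumption: `index` is a hypothetical inverted index mapping each key word
    to the set of document IDs containing it. Documents are ranked by how many
    of the segmented key words they match, with ties broken by document ID.
    """
    hits = Counter()
    for kw in keywords:
        # Search with each segmented key word separately.
        for doc_id in index.get(kw, set()):
            hits[doc_id] += 1
    # Combine: sort by descending match count, then ascending document ID.
    return sorted(hits, key=lambda d: (-hits[d], d))


index = {
    "cheap": {1, 3},
    "flights": {1, 2, 3},
    "beijing": {2, 3},
}
print(search_segmented(["cheap", "flights", "beijing"], index))  # [3, 1, 2]
```

Document 3 matches all three key words, so it ranks first. Documents 1 and 2 each match two key words and are ordered by ID.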
For example, a search query entered in Chinese may be segmented using a statistics-based machine learning method. This method specifically includes the following steps: (1) collecting a set of texts from publicly available data sources, such as media sources; (2) manually selecting a subset of the collected texts and segmenting it; (3) obtaining segmentation rules by statistically analyzing the results of the manual text segmentation; and (4) segmenting the search query entered in Chinese into a set of key words according to the learned segmentation rules.
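Steps (2) to (4) can be sketched as follows, under stated assumptions. A tiny hand-made corpus stands in for the collected and manually segmented texts of steps (1) and (2). The "segmentation rules" of step (3) are modeled as unigram word frequencies, which is one simple statistical model among many. Step (4) picks the most probable split of the query by dynamic programming. The function names and the smoothing constant are hypothetical.

```python
import math
from collections import Counter


def learn_unigram_model(segmented_texts):
    """Step (3) sketch: derive word statistics from manually segmented texts.

    Assumption: the learned "segmentation rules" are modeled as unigram
    word frequencies.
    """
    counts = Counter(word for text in segmented_texts for word in text)
    return counts, sum(counts.values())


def segment(query, counts, total, max_word_len=4):
    """Step (4) sketch: choose the split of `query` with the highest unigram
    log-probability using dynamic programming over character positions."""
    def log_prob(word):
        if word in counts:
            return math.log(counts[word] / total)
        # Unseen text: only single characters are allowed, with a heavy
        # (assumed) penalty, so unknown words break into single characters.
        return math.log(0.1 / total) if len(word) == 1 else -math.inf

    n = len(query)
    best = [0.0] + [-math.inf] * n  # best[i]: best score for query[:i]
    back = [0] * (n + 1)            # back[i]: start index of the last word
    for i in range(1, n + 1):
        for j in range(max(0, i - max_word_len), i):
            score = best[j] + log_prob(query[j:i])
            if score > best[i]:
                best[i], back[i] = score, j
    # Walk the back-pointers to recover the chosen key words.
    words, i = [], n
    while i > 0:
        words.append(query[back[i]:i])
        i = back[i]
    return words[::-1]


# Steps (1)-(2) stand-in: a hypothetical, manually segmented corpus.
corpus = [["北京", "大学", "招生"], ["北京", "天气"], ["大学", "排名"]]
counts, total = learn_unigram_model(corpus)
print(segment("北京大学排名", counts, total))  # ['北京', '大学', '排名']
print(segment("北京新词", counts, total))      # ['北京', '新', '词']
```

The second call shows a weakness of this kind of model. The word 新词 never appears in the manually segmented corpus, so the sketch cannot recognize it and splits it into single characters.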
Although it segments search queries with acceptable performance, the above statistics-based machine learning method requires a large amount of computational resources and time. In addition, the accuracy of query segmentation depends heavily on the results of manual text segmentation: errors in the manual segmentation results propagate into the segmentation rules and, in turn, into subsequent search query segmentation. Moreover, the statistics-based machine learning method cannot recognize new key words that did not appear in the manually segmented texts. The error rate therefore increases for search queries that involve many specialized key words.