A search engine generally estimates the extent to which a search result (also referred to as a target string) matches a query string from the distance between the positions at which the words of the query string occur in the search result. A result in which those words lie closer together is typically treated as a better match and is therefore ranked higher. For a query string “disinfection machine”, for example, a search result including “disinfection machine” tends to be closer to the intention of the user than one including “disinfection equipment”, which in turn is closer to that intention than “industrial disinfection washing machine”, “dehydrator”, or “dryer”; these differences influence the ranking of the search results.
Standard techniques for ranking search results are typically based on the distance between the words of a query string within a target string, measured using the shortest sliding window (i.e., the shortest interval in the target string that includes the words of the query string), on the edit distance between the query string and the target string, on word context such as part of speech (POS), etc.
These simple techniques tend not to address the correlation between a query string and a target string, and the results often do not accurately reflect the extent to which the query string matches the target string. Take the query string “Nokia battery” as an example: three search results A, B, and C include the strings “Nokia battery”, “Nokia cell phone, complimentary battery”, and “Nokia n73 cell phone with original battery”, respectively. A simple distance calculation shows that the distance between “Nokia” and “battery” in string A is zero, and thus A has the highest degree of match. The distances between “Nokia” and “battery” in strings B and C are three and five words, respectively, indicating that C is a poorer match than B. However, given the user's intent to locate a Nokia battery, search result C is in fact a better match than B despite the greater word spacing.
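The simple distance calculation in this example can be reproduced with a short sketch. The helper names `tokenize` and `word_gap` are hypothetical, and the sketch assumes that distance means the number of words lying between the two query words, which is the convention that yields the values zero, three, and five stated above.

```python
import re


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (assumed tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_gap(first, second, target):
    """Return the smallest number of words lying between an occurrence of
    `first` and an occurrence of `second` in the target string, or None if
    either word is absent. The two words are assumed to be different."""
    tokens = tokenize(target)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    if not first_positions or not second_positions:
        return None
    return min(abs(i - j) for i in first_positions for j in second_positions) - 1


results = {
    "A": "Nokia battery",
    "B": "Nokia cell phone, complimentary battery",
    "C": "Nokia n73 cell phone with original battery",
}

gaps = {name: word_gap("Nokia", "battery", text) for name, text in results.items()}

# Ranking purely by word gap, smallest gap first.
ranking = sorted(gaps, key=gaps.get)
```

Ranking by this gap alone places B ahead of C, which is the ordering that the paragraph above identifies as inconsistent with the user's intent.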