Existing search engines generally provide a search keyword recommendation feature. For example, in some web-based engines, the user inputs a search keyword in the search field, and after he or she clicks on a “search” button, the page that the system jumps to not only contains search results, but also includes other search engine-recommended search keywords or search keyword combinations that are related to the search keywords input by the user. Or, as the user inputs search keywords in the search field, recommended search keywords related to the user input search keywords will pop up in the search field's pulldown menu so that a search can be conducted after the user selects a recommended search keyword.
The existing schemes for recommending search keywords are generally based on search logs. Search keywords in the search log that have higher correlations with search keywords input by the user serve as recommended search keywords. The basic principle is as follows:
First, a search log is established. The search keywords contained in a search log typically come from two sources: first, every search keyword entered in search fields by users; second, the search keywords recommended by search engines. The value of an importance parameter is also determined for each search keyword in the search log. This value is primarily decided by the factors below and can be determined by applying the weighted sum method to all of the factors:
1. Click factors, i.e., whether or not there is a record indicating that the search results for the search keyword were clicked by the user, the number of clicks, and click ranks. Specifically, if a user uses a search keyword only to conduct a search and does not click on a web page link in the search results, then the click factor parameter value for this search keyword will be relatively low, e.g., the preset parameter value for this factor is 0. If a user uses a search keyword to conduct a search and then clicks on a search result, the click factor parameter value for this search keyword will be relatively high, e.g., the preset parameter value for this factor is the number of times that the search results are clicked. If there is a click record for a search keyword, then the more clicks there are, the higher the click factor parameter value will be for that search keyword, with the result that the weighted sum value of the importance parameter will be higher.
2. Quality factors of the search keyword. Search keyword quality factors include search keyword length, number of semantic terms, and inclusion/non-inclusion of the search keyword in a predetermined search keyword set. Search keyword length is the number of characters contained in the search keyword; different numbers of characters correspond to different preset factor parameter values. For example, the preset factor parameter value corresponding to a search keyword of length 2 is 1, the preset factor parameter value corresponding to a search keyword of length 3 is 0.8, the preset factor parameter value corresponding to a search keyword of length 4 is 0.5, and so on. The “number of semantic terms” refers to the following: after the search keyword undergoes word segmentation processing, the quantity of semantic terms obtained thereby is compared to preset threshold values. The semantic term factor parameter value corresponding to the search keyword is determined on the basis of the comparison result. Examples of predetermined search keyword sets are prohibited word sets, product brand word sets, and special commercial intention word sets. The parameter value corresponding to the search keyword is determined for this factor on the basis of whether or not the search keyword contains sample words from these predetermined search keyword sets.
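The weighted sum over the click and quality factors described above can be sketched as follows. This is only an illustration under stated assumptions: the weights, the default length value, the term-count threshold, the word sets, and the use of whitespace splitting in place of real word segmentation are all hypothetical and are not specified by the scheme. Only the length values 1, 0.8, and 0.5 come from the example in the text.

```python
from dataclasses import dataclass

# Length values for 2, 3, and 4 characters come from the example in the text.
LENGTH_FACTOR = {2: 1.0, 3: 0.8, 4: 0.5}
DEFAULT_LENGTH_FACTOR = 0.3          # assumption: value for lengths not listed
MAX_TERMS = 3                        # assumption: semantic-term threshold
PROHIBITED_WORDS = {"forbidden"}     # hypothetical prohibited word set
BRAND_WORDS = {"s-brand"}            # hypothetical product brand word set
# Assumption: illustrative weights for the weighted sum.
WEIGHTS = {"click": 0.5, "length": 0.2, "terms": 0.1, "word_set": 0.2}


@dataclass
class LogEntry:
    keyword: str
    clicks: int  # number of times results for this keyword were clicked


def click_factor(entry: LogEntry) -> float:
    # 0 when no result was clicked, otherwise the number of clicks.
    return float(entry.clicks)


def length_factor(keyword: str) -> float:
    return LENGTH_FACTOR.get(len(keyword), DEFAULT_LENGTH_FACTOR)


def term_factor(keyword: str) -> float:
    # Whitespace splitting stands in for word segmentation (assumption).
    return 1.0 if len(keyword.split()) <= MAX_TERMS else 0.5


def word_set_factor(keyword: str) -> float:
    terms = set(keyword.lower().split())
    if terms & PROHIBITED_WORDS:
        return 0.0
    if terms & BRAND_WORDS:
        return 1.0
    return 0.5


def importance(entry: LogEntry) -> float:
    """Weighted sum of all factor parameter values for one log entry."""
    factors = {
        "click": click_factor(entry),
        "length": length_factor(entry.keyword),
        "terms": term_factor(entry.keyword),
        "word_set": word_set_factor(entry.keyword),
    }
    return sum(WEIGHTS[name] * value for name, value in factors.items())
```

With these assumed settings, a keyword whose results were clicked more often receives a higher importance value than an otherwise identical keyword with no clicks, which matches the behavior described for the click factor.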
When recommending search keywords, the search engine, after receiving search keywords input in the search field by a user, typically performs the following steps with respect to each search keyword contained in the search log:
1. Determine the similarity value between each search keyword contained in the search log and the input search keyword. There are many specific methods whereby the similarity between two search keywords can be determined. For example, a method based on the longest common substring of the two search keywords may be employed. The recommendation values for the search keywords contained in the search log can then be determined by using the weighted sum method on the determined similarity values and the importance parameter values of the search keywords contained in the search log.
2. Sort the search keywords contained in the search log from high to low according to their corresponding recommendation values, and select the first N search keywords as the search keywords recommended to the user.
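The two steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: normalizing the longest common substring length by the longer keyword, the weights `w_sim` and `w_imp`, the assumption that importance values are already scaled to the range 0 to 1, and the decision to skip a log keyword identical to the input are all assumptions introduced here.

```python
def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def similarity(a: str, b: str) -> float:
    # Assumption: normalize by the longer keyword so the value lies in [0, 1].
    if not a or not b:
        return 0.0
    return longest_common_substring_length(a, b) / max(len(a), len(b))


def recommend(query, log_importance, n=3, w_sim=0.7, w_imp=0.3):
    """Return the top-n log keywords ranked by recommendation value.

    log_importance maps each log keyword to its importance parameter value,
    assumed here to be pre-scaled to [0, 1].
    """
    scored = []
    for keyword, imp in log_importance.items():
        if keyword == query:
            continue  # assumption: do not recommend the input itself
        value = w_sim * similarity(query, keyword) + w_imp * imp
        scored.append((value, keyword))
    # Step 2: sort from high to low and keep the first n keywords.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [keyword for _, keyword in scored[:n]]


log = {
    "S brand mobile phone": 0.9,
    "mobile phone case": 0.5,
    "laptop": 0.9,
}
top = recommend("mobile phone", log, n=2)
```

In this hypothetical log, “laptop” has a high importance value but shares almost no substring with the input, so it falls below the two keywords that contain “mobile phone.”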
The advantage of the search log-based scheme for recommending search keywords described above lies in the ability to guide, in step-by-step fashion, users who have a clearly defined intention to complete or revise the search process. For example, the search keyword input by a user is “mobile phone.” The first search keyword recommended according to the above-described search log-based scheme for recommending search keywords is “S brand mobile phone.” If the user clicks on the recommended search keyword “S brand mobile phone” to conduct a further search, such an action is the equivalent of the currently input search keyword being “S brand mobile phone.” The second search keyword recommended according to the above-described search log-based scheme for recommending search keywords is “S brand smart mobile phone,” and so on.
However, in the case of users who do not have an obvious search intention, for example a user who inputs “distributor franchise” as the search keyword, the recommendation scheme involving continual refinement as described above cannot easily meet the requirements. The recommended search keywords are often, on the whole, semantically the same as the input search keywords, or they are words formed by adding other determiners to the input search keywords. Moreover, the recommended search keywords are often limited to a specific field. The search keyword recommendations are therefore often ineffective for such search terms; that is, the recommended search keywords are rarely clicked by the user. In addition, when the search engine server performs keyword-related recommendations, it needs to obtain a search log, perform similarity value calculations, and perform operations such as sequencing. System resources of the search engine server are consequently occupied, and yet the associated keywords that are recommended fail to meet the needs of the user. Therefore, this approach wastes the system resources of search engine servers and diminishes their processing efficiency.