By introducing a user's long-term interests and short-term intentions as factors for recalling and ranking search results, personalized search technology can predict the real intention of the user more accurately, such that the search results better meet the requirements of the user. Existing personalized search methods are mainly implemented by rearranging the top-n natural search results (that is, the search results obtained based on the search sequence submitted by the user) in a personalized fashion, an approach that has many limitations in practical applications.
The existing technology has the following problems:
Recall is significantly limited. The main purpose of rearranging natural search results based on a user's interests is to emphasize the results that conform to those interests while preserving relevance. This method is effective when the natural search results fully reflect the diversity of requirements. However, because the feedback of users as a group, such as click-through behavior, is taken into consideration, natural search results usually reflect only the requirements of mainstream user groups, and can hardly cover the long-tail requirements that account for a greater proportion. In addition, in order to ensure an acceptable search response time, generally only a small number of top-ranked results are taken for rearrangement. Therefore, the requirements of a considerable user population cannot be met, because the truncated result set lacks the resources those users need.
Auxiliary information for personalized arrangement needs to be added to natural search results. A major operation of the personalized arrangement is to calculate a degree of coincidence between each search result and the interests of a user, and to assign a rearrangement weight to each search result accordingly. To implement this operation, related characteristics, such as an interest subject characteristic, generally need to be extracted for each search result and each user. Extracting these characteristics requires, on one hand, relatively abundant data, for example, behavior data of users and content description data of search results, and on the other hand, expensive calculation and storage for large-scale data. In an application scenario such as picture search, where content description data of search results is not abundant and the number of search results is excessive, it is relatively difficult to meet both requirements at the same time.
A subject-classification-based user interest model cannot completely meet actual application requirements. In order to describe the personalized requirements of users, an existing system generally employs a manual or machine learning method to establish a subject model, and maps the long-term or short-term interests of the users and the search results to the same subject model, thus enabling calculation of the interest similarity between the users and the search results. Although of high quality, a manually established subject classification (such as the Open Directory Project) is costly to construct and update and transfers poorly across domains. Automatic text subject classification carried out using a machine learning algorithm (such as latent Dirichlet allocation, LDA) suffers from problems such as low accuracy and poor performance on short texts.
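The rearrangement approach discussed above can be illustrated with a minimal sketch. It is not the implementation of any particular existing system. It assumes that each search result and each user has already been mapped to a weight vector over the same small subject model, that cosine similarity stands in for the interest similarity, and that the rearrangement weight is a linear blend of the natural ranking score and that similarity. The names `Result`, `cosine`, `rerank_top_n`, and `alpha`, as well as all scores and vectors, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class Result:
    doc_id: str
    base_score: float    # natural (non-personalized) ranking score, made up here
    topics: List[float]  # hypothetical weights over a shared subject model


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank_top_n(results, user_topics, n, alpha=0.5):
    """Keep only the top-n natural results, then reorder them by a blended weight."""
    top_n = sorted(results, key=lambda r: r.base_score, reverse=True)[:n]

    def weight(r):
        # Assumed blend: (1 - alpha) * natural score + alpha * interest similarity.
        return (1 - alpha) * r.base_score + alpha * cosine(r.topics, user_topics)

    return sorted(top_n, key=weight, reverse=True)


# Illustrative data: three popular results and one long-tail result.
results = [
    Result("popular_a", 0.9, [1.0, 0.0, 0.0]),
    Result("popular_b", 0.8, [0.8, 0.2, 0.0]),
    Result("mixed_c", 0.7, [0.0, 1.0, 0.0]),
    Result("long_tail_d", 0.3, [0.0, 0.0, 1.0]),
]
user_topics = [0.0, 0.1, 0.9]  # this user is mostly interested in the third subject

ranked_ids = [r.doc_id for r in rerank_top_n(results, user_topics, n=3)]
all_ids = [r.doc_id for r in rerank_top_n(results, user_topics, n=4)]
```

With this illustrative data, the result that best matches the user's interests has a low natural score, so it is cut off when only the top three natural results are kept and no rearrangement weight can bring it back. When all four results are kept, the same weighting places it first. This mirrors the recall limitation described above, under the stated assumptions.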