In a search performed by a search engine, the search results may be re-ranked according to certain attributes (such as geography, source, or subject) so that the top n (n>=1) search results are diversely distributed with respect to those attributes. This is referred to as diversification of search results. In the context of e-commerce search, search results are often ranked according to relevance or time. A supplier may therefore repeatedly publish information about a given product so that this product occupies the top pages of the search results, thereby maliciously depriving one or more other suppliers of opportunities to display their products and inconveniencing general users who may be attempting to find other products.
To avoid this problem, the current technologies provide a search method that categorizes search results based on relevance and then extracts results from each category. The detailed implementation process is as follows: search results are pre-categorized based on relevance, such that search results with similar relevance scores are classified into the same category, and search results are then extracted from each category. The extraction includes selecting a field as the basis for diversity, for example uid (a unique identification of a supplier), so that the search results include products from a diversity of suppliers. In practice, the search results are classified into multiple sub-sets according to uid value. The search results having the same uid are classified into the same sub-set and are ranked within that sub-set by relevance score from high to low. The m (m>=1) most relevant search results in each sub-set are extracted and displayed on the top several pages of the search results. Therefore, the search results on the top several pages can include products from different uids, that is, from different suppliers.
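For illustration only, the prior-art process described above may be sketched as follows. This is a minimal sketch of one possible reading of that process, not an implementation taken from any actual search engine. The names Result, diversify_prior_art, and interval are hypothetical, and integer relevance scores and a fixed classification interval are assumed for simplicity.

```python
from collections import defaultdict
from typing import NamedTuple


class Result(NamedTuple):
    """Hypothetical search result record (names are assumptions)."""
    doc_id: str
    uid: str    # unique identification of the supplier
    score: int  # relevance score, assumed to be an integer


def diversify_prior_art(results, interval=10, m=1):
    """Sketch of the prior-art extraction: categorize by relevance,
    split each category into per-uid sub-sets, rank every sub-set,
    and promote the m most relevant results of each sub-set."""
    # Step 1: pre-categorize by relevance using a fixed interval.
    categories = defaultdict(list)
    for r in results:
        categories[r.score // interval].append(r)  # copies references

    ordered = []
    for key in sorted(categories, reverse=True):
        # Step 2: classify the category into sub-sets, one per uid.
        subsets = defaultdict(list)
        for r in categories[key]:
            subsets[r.uid].append(r)

        # Step 3: rank each sub-set and extract its top m results.
        head, tail = [], []
        for subset in subsets.values():
            subset.sort(key=lambda r: r.score, reverse=True)
            head.extend(subset[:m])
            tail.extend(subset[m:])

        # Extracted results come first, the remainder afterwards.
        head.sort(key=lambda r: r.score, reverse=True)
        tail.sort(key=lambda r: r.score, reverse=True)
        ordered.extend(head + tail)
    return ordered
```

In this sketch every result is placed into a category list, then into a sub-set list, and every sub-set is sorted in full even though only its first m entries are promoted.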
The above-described process based on the current technologies requires classifying the search results into sub-sets and ranking them according to uid. Although such a process can diversify the search results to a certain extent, the current technologies need to re-organize all of the search results during the extraction and classification process. This requires copying the search results in system memory and thus consumes a large volume of resources at the search engine server, such as time and hardware expenditure, which lowers the performance of the search engine. Further, the ranking within each sub-set is in fact not entirely necessary, so the current technologies also perform calculations that may be unnecessary and waste system resources on them. In addition, although the current technologies use classification based on relevance to balance the diversity and relevance of the search results to a certain extent, they cannot use a fixed classification interval to correctly classify all search results. As shown in FIG. 1, an interval classification that is proper for a query A may be improper for a query B. For the query A, the search results with similar relevance scores are classified into the same interval. For the query B, however, the search results with similar relevance scores are not consistently classified into the same interval. Thus the current technologies lack flexibility.
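The limitation of a fixed classification interval may be illustrated with a small sketch. The scores below are invented for illustration and do not reproduce FIG. 1. The function name classify_fixed and the interval width of 10 are likewise assumptions.

```python
def classify_fixed(scores, interval):
    """Map each integer relevance score to the index of the
    fixed-width interval that contains it."""
    return [s // interval for s in scores]


# Hypothetical query A: clusters of similar scores happen to
# fall inside the fixed interval boundaries.
query_a = [98, 95, 92, 78, 75, 71]

# Hypothetical query B: clusters of similar scores straddle
# the fixed interval boundaries.
query_b = [91, 89, 88, 72, 69, 68]

print(classify_fixed(query_a, 10))  # [9, 9, 9, 7, 7, 7]
print(classify_fixed(query_b, 10))  # [9, 8, 8, 7, 6, 6]
```

For the hypothetical query B, the scores 91 and 89 differ by only 2 yet are placed into different intervals, while the same interval width groups the results of the hypothetical query A as intended.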
In general, a pending challenge for a person of ordinary skill in the art is to provide a search method that resolves the problem of excessive consumption of server resources under the current technologies.