A search engine crawls web pages on the Internet. When a user submits a query, the search engine finds all web pages related to the user's keyword and ranks them in order of relevance, starting with the most relevant web pages (i.e. the web pages with the highest relevance degrees). Because the relevance degree of a web page is complex and is calculated from many parameters, various technical solutions exist for calculating the relevance degrees of web pages, and different search engine suppliers use different parameters and methods to do so.
For example, in 1997, Google proposed PageRank, a parameter for improving the determination of the relevance degrees of web pages. PageRank may be understood as follows: a target web page linked from an important web page obtains a share of that page's importance weight; the more important the web pages that link to a target web page, the higher the target web page's PageRank, and thus the more important the target web page is deemed to be.
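The idea described above can be sketched as a simple power iteration over a link graph. This is a minimal illustrative sketch, not Google's actual implementation; the graph, function name, and damping value are assumptions for demonstration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Illustrative PageRank sketch.

    links: dict mapping each page to the list of pages it links to.
    Each page distributes its current rank to its link targets; a page
    linked from important pages therefore accumulates a high rank.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with uniform importance
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                # dangling page: spread its rank evenly over all pages
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
# C is linked from both A and B, so it ends up with the highest rank.
```

Here page C receives links from two pages and so is deemed most important, matching the intuition that importance flows along links from linking pages to their targets.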
Existing search engines often encounter problems when ranking search results for content-type query words. They generally discount the influence of web pages with identical or nearly identical content by using de-duplication technology. For example, as a result of de-duplication, some web pages with duplicated (i.e. reproduced) content may not be stored, may not be displayed, or may simply be ranked low in the search results returned for a user's query. Under the PageRank method alone, if there is no appropriate link data, a search engine may ignore the original web page or rank it low while ranking a web page with reproduced content much higher. Therefore, for existing search engines, the presence of different web pages with the same content has little or no influence on the relevance-degree ranking of web pages.
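The de-duplication step described above can be sketched as keeping only the first page seen for each distinct content fingerprint. This is an assumed minimal illustration using exact hashing (real engines use near-duplicate techniques such as shingling or simhash); the URLs and function name are hypothetical. It also shows the problem the paragraph raises: whichever copy is encountered first is kept, which need not be the original page.

```python
import hashlib

def deduplicate(pages):
    """pages: list of (url, content) pairs.

    Keeps the first URL seen for each distinct content hash and drops
    later pages whose content is byte-identical, mimicking how a search
    engine may discount duplicated (reproduced) web pages.
    """
    seen = set()
    kept = []
    for url, content in pages:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(url)
    return kept

# Hypothetical crawl order: the mirror's copy duplicates the original's text.
kept = deduplicate([
    ("original.example/post", "the article text"),
    ("mirror.example/copy", "the article text"),   # duplicate content, dropped
    ("other.example/page", "different text"),
])
# → ["original.example/post", "other.example/page"]
```

Note that if the crawler had happened to fetch the mirror first, the original would be the page discarded, which is precisely why link data alone cannot reliably favor the original page.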