With the rapid development of Internet technology, the Internet has become an important channel for people to obtain information. Specifically, a user can enter a query term in a search engine, and the search engine can retrieve a number of webpages in response to the query term for the user to selectively view. It should be noted that, to facilitate the user's viewing, the search engine can rank the retrieved webpages in accordance with the relevance of each webpage to the query term.
The relevance can indicate a similarity between a topic sentence of a retrieved webpage and the query term. For example, for a query term of “symptoms of hepatitis B,” a topic sentence of a retrieved webpage 1 is “what are the symptoms of hepatitis B,” and a topic sentence of a retrieved webpage 2 is “hepatitis B virus transmissions.” Since the topic sentence of the retrieved webpage 1 is more similar to the query term, the retrieved webpage 1 is more relevant to the query term and is therefore placed in a front position of the search results. Accordingly, the webpage topic sentence can directly affect the ranking order of the retrieved webpages, thereby affecting user satisfaction with the search results.
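The ranking behavior described above can be illustrated with a minimal sketch. The disclosure does not specify how the similarity is computed, so the sketch assumes a simple word-overlap (Jaccard) measure purely for illustration; the function names `tokenize`, `jaccard_similarity`, and `rank_pages` are hypothetical and do not correspond to any particular search engine.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(query, topic_sentence):
    """Assumed similarity measure: shared words divided by all distinct words."""
    q, t = tokenize(query), tokenize(topic_sentence)
    if not q or not t:
        return 0.0
    return len(q & t) / len(q | t)


def rank_pages(query, topic_sentences):
    """Return (page_id, score) pairs sorted from most to least similar.

    topic_sentences maps a page identifier to that page's topic sentence.
    """
    scored = [
        (page_id, jaccard_similarity(query, sentence))
        for page_id, sentence in topic_sentences.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# The two webpages from the example in the text.
pages = {
    "webpage 1": "what are the symptoms of hepatitis B",
    "webpage 2": "hepatitis B virus transmissions",
}
ranking = rank_pages("symptoms of hepatitis B", pages)
for page_id, score in ranking:
    print(f"{page_id}: {score:.3f}")
```

Under this assumed measure, webpage 1 shares all four query words with its topic sentence and is ranked ahead of webpage 2, which shares only two. An inaccurately extracted topic sentence would change these scores and therefore the ranking order.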
Currently, topic sentences of webpages are extracted based on extraction rules that are manually summarized from arbitrary webpages, and these extraction rules are then used to determine the topic sentence of a specific webpage. However, the accuracy of the topic sentences extracted from webpages by such extraction methods is relatively low.
The disclosed method and apparatus are directed to solving one or more of the problems set forth above and other problems.