With the rapid development of Internet search engine technologies and Internet search enterprises, information search systems (or search engines) have become a necessary tool for a growing number of Internet users.
When a person uses a search engine, a common scenario is that the person inputs an inquiry word, or search term, and obtains a search result through a backend operation of the search engine. A typical search result consists of three elements: a title, an abstract, and a URL (commonly referred to as TAU, an acronym formed from the first letters of title, abstract, and URL). Of the three elements, the abstract generally carries the largest volume of information and occupies the largest display area on the webpage. From the perspective of end-user experience, it also largely determines whether the user judges the search result to be correct, because the user relies on the information contained in the abstract to decide whether the result is what the user seeks. Therefore, an abstract generation system that is high-performance, flexible, customizable, and has an excellent human-machine interface is an indispensable component of a search engine (or information search system).
A traditional abstract generation method searches the full-text data in real time based on the user's inquiry word and, based on the result of the full-text search, extracts the paragraph that best matches the inquiry word as the abstract. The matching is usually performed by text matching and weighting algorithms that compute word frequency, word distance, and other parameters. Finally, the traditional method presents the search result, including the title, abstract, and URL, to the user for display.
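To illustrate the kind of weighted paragraph matching described above, the following is a minimal sketch and not the method of any particular search engine. It assumes that paragraphs are separated by blank lines, that the word-frequency score is the number of query-word occurrences divided by the paragraph length, and that the word-distance score is the reciprocal of the smallest gap between two different query words. The names `score_paragraph` and `generate_abstract` and the weights `w_freq` and `w_dist` are hypothetical.

```python
import re
from itertools import combinations


def tokenize(text):
    # Lowercased word tokens; a production engine would use a proper segmenter.
    return re.findall(r"\w+", text.lower())


def score_paragraph(paragraph, query_terms, w_freq=1.0, w_dist=1.0):
    """Hypothetical weighted score combining word frequency and word distance."""
    tokens = tokenize(paragraph)
    if not tokens:
        return 0.0
    # Record the positions of every query word that occurs in the paragraph.
    positions = {}
    for i, tok in enumerate(tokens):
        if tok in query_terms:
            positions.setdefault(tok, []).append(i)
    # Word frequency: query-word hits normalized by paragraph length.
    hits = sum(len(p) for p in positions.values())
    freq = hits / len(tokens)
    # Word distance: smallest gap between occurrences of two different query words.
    min_gap = None
    for a, b in combinations(positions, 2):
        gap = min(abs(i - j) for i in positions[a] for j in positions[b])
        if min_gap is None or gap < min_gap:
            min_gap = gap
    # Closer query words give a higher score; no pair found gives zero.
    closeness = 1.0 / min_gap if min_gap else 0.0
    return w_freq * freq + w_dist * closeness


def generate_abstract(full_text, query, **weights):
    """Scan the full text at query time and return the best-matching paragraph."""
    query_terms = set(tokenize(query))
    paragraphs = [p.strip() for p in full_text.split("\n\n") if p.strip()]
    if not paragraphs:
        return ""
    return max(paragraphs, key=lambda p: score_paragraph(p, query_terms, **weights))


doc = (
    "Cats are pets.\n\n"
    "Search engines build an abstract from the full text.\n\n"
    "An abstract is short."
)
print(generate_abstract(doc, "search abstract"))
```

With the default weights, the second paragraph is selected because it contains both query words four tokens apart. Setting `w_dist=0.0` ranks by frequency alone and selects the shorter third paragraph instead. Note that every call scans the entire full text, which is the cost discussed next.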
For information search, a traditional search engine needs to conduct match searches over the full-text data, and abstract generation is likewise based on the full-text data. Because the full-text data contains a huge volume of information, search times tend to be long and search efficiency low.