1. Technical Field
The disclosure relates to a method for processing document abstracts, and in particular, to an automatic abstract determination method for document clustering.
2. Related Art
The rapid growth of computers and the Internet has caused the amount of information on the Internet to increase rapidly. Generally, most users receive information through a specific portal website. This information includes articles, news, and the like published on websites, and may also be called digital documents. Because of the widespread use of digital technologies, documents have been produced very quickly and in very large amounts in recent years. Moreover, to refresh pages containing all kinds of documents in real time, most document providers (such as portal websites) process and display abstracts of the document contents, allowing the user to browse more documents at the same time.
Conventional document abstract processing excerpts part of the body content of the document. As described above, the number of documents received by a portal website every day is very large. If the abstract processing were done by hand, it would be a heavy burden for the providers. Thus, document providers usually use automatic abstract processing, which directly takes the title or the first few words of the body content as the abstract of the document and shows it on the homepage. That is, although conventional abstract processing may increase the number of documents displayed on the same page, the abstracts are simply generated by capturing part of the text of the body content. Because it is not determined whether the abstracts closely relate to the key points of the corresponding documents, users may not easily understand the true content of the documents from such abstracts. Therefore, the conventional approaches do not meet users' need to browse abstracts that reflect the key content of the documents.