1. Field of the Invention
The present invention relates to electronic document processing.
2. Description of Related Art
Conventionally, the World Wide Web (WWW) has been used over the Internet as an application service to serve electronic information in the form of a window.
As is well known, the WWW provides a new style of electronic document processing for generating, publishing and sharing electronic documents. However, there has been a demand for electronic documents to be classified and summarized or abstracted based on their content; that is, a higher level of electronic document processing than the WWW offers has been demanded. For this higher level of electronic document processing, it is indispensable to process the content of an electronic document mechanically.
However, such mechanical processing of the content of an electronic document is difficult for the following reasons. Firstly, HTML (Hyper Text Markup Language) prescribes the presentation of an electronic document, but says little about its content. Secondly, not all of the networks of links formed between electronic documents are easily usable by every reader to understand the contents of the documents. Thirdly, an author usually writes an electronic document without bearing in mind the convenience of its readers, so the convenience of the readers is not coordinated with that of the author.
As mentioned above, the WWW is a new electronic document system, but since it does not mechanically process the content of an electronic document, it cannot attain a high level of electronic document processing. In other words, mechanical processing of electronic documents is necessary in order to carry out high-level electronic document processing.
To support such mechanical document processing, systems have been developed on the basis of the results of natural language studies. One approach to document processing that has emerged from these studies makes use of tags, that is, attribute information added to a document to describe the internal structure of the document as stated by its author.
Incidentally, a user utilizes an information retrieval system such as a search engine to search for desired information among the vast amounts of information served over the Internet. Such an information retrieval system searches for information based on a designated keyword and serves the retrieved information to the user. The user then selects the desired information from the information thus served.
Information can thus be retrieved easily, but the user has to read through the information served by the information retrieval system, grasp its gist and judge whether it is the desired information, which is very burdensome to the user, especially when the amount of information served is large. To lessen this burden on the user, a system for automatically abstracting or summarizing document information, that is, the content of a document (a so-called automatic abstract generation system), has recently been attracting much attention in this field of art.
The automatic abstract generation system generates an abstract from an electronic document by reducing the length and complexity of the document while retaining the essential information included in it, and thus helps the user to grasp the gist of the document by reading through the abstract.
Normally, the automatic abstract generation system weights each of the sentences and words in a text based on certain information and ranks them in order of their weighted significance. The sentences and words ranked highest are collected together to generate an abstract.
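The weighting and ranking just described can be illustrated with a minimal sketch. The text does not say what information the weights are based on, so the sketch assumes, purely for illustration, that a word's weight is its frequency in the text and that a sentence's weight is the sum of the weights of its words. The function names (split_sentences, tokenize, summarize) and the max_sentences parameter are hypothetical and do not come from any particular system.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Return the lowercase alphanumeric word tokens of a sentence."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def summarize(text, max_sentences=1):
    """Return the highest-weighted sentences, kept in their original order.

    Assumption: word weight = frequency of the word in the whole text;
    sentence weight = sum of the weights of the words it contains.
    """
    sentences = split_sentences(text)
    word_weights = Counter(w for s in sentences for w in tokenize(s))

    # Pair each sentence's weight with its position in the text.
    scored = [
        (sum(word_weights[w] for w in tokenize(s)), index)
        for index, s in enumerate(sentences)
    ]

    # Rank by descending weight; ties go to the earlier sentence.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))

    # Collect the top-ranked sentences and restore document order.
    chosen = sorted(index for _, index in ranked[:max_sentences])
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    sample = "Cats chase mice. Dogs bark. Cats and mice and dogs play."
    print(summarize(sample, max_sentences=2))
```

Raising max_sentences yields a longer, more detailed abstract from the same ranking, while a different weighting scheme changes which sentences are ranked highest.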
Although the automatic abstract generation system can easily generate an abstract from a document in this way, the amount of information included in the generated abstract depends upon the amount of information in the document, the method of weighting the sentences and words, and so on. For example, when an abstract is too brief for the user to grasp the gist of the document, the user has no means of referring to a more detailed abstract.