1. Field of the Invention
The present invention relates to a document data generating apparatus for generating/processing electronic document data.
2. Description of the Related Art
The WWW (World Wide Web) is widely used to supply hypertext information via the Internet.
The WWW is a system that allows electronic documents to be treated in a new manner, that is, generated, processed, disclosed, and used in common. However, from the point of view of practical document use, the WWW is limited in its capability to process documents. Thus, there is a need for higher-level document processing techniques such as categorization or summarization of documents. In order to realize such high-level document processing, it is necessary to automatically process the contents of documents.
However, such automatic processing of the contents of documents has difficulties as described below.
Firstly, HTML (Hypertext Markup Language) prescribes the manner of representing documents, but does not prescribe the contents of the documents. Secondly, it is not necessarily easy for users to understand the contents of documents that are linked to one another via a hypertext network. Thirdly, authors usually write documents without bearing in mind the convenience of readers, and no adjustment is made for the difference in convenience between authors and readers.
Although the WWW is a new electronic documentation system having various advantages, the WWW is not capable of performing high-level document processing which needs additional automatic processing. In other words, in order to realize the high-level document processing, it is required to automatically process documents.
To the above end, systems for assisting in automatically processing a document have been developed on the basis of natural language processing technology. One such method is to automatically process a document according to tags which have been attached to the document, by the author of the document or another person, so as to represent attribute information about the internal structure of the document.
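By way of illustration only, the tag-based approach described above may be sketched as follows. The tag names (document, section, sentence), the role attribute, and the functions summarize and section_titles are hypothetical and are assumed solely for this sketch; they do not represent any particular tagging scheme or existing system.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged document. The tag and attribute names are
# illustrative assumptions, not a format defined in this text.
TAGGED_DOC = """
<document>
  <section title="Background">
    <sentence role="topic">The WWW supplies hypertext information.</sentence>
    <sentence role="detail">HTML prescribes how documents are represented.</sentence>
  </section>
  <section title="Need">
    <sentence role="topic">Automatic processing needs content information.</sentence>
  </section>
</document>
"""


def summarize(tagged_text):
    """Return the text of every sentence tagged with role="topic"."""
    root = ET.fromstring(tagged_text.strip())
    return [
        s.text.strip()
        for s in root.iter("sentence")
        if s.get("role") == "topic" and s.text
    ]


def section_titles(tagged_text):
    """Return the title attribute of each section, in document order."""
    root = ET.fromstring(tagged_text.strip())
    return [sec.get("title") for sec in root.iter("section")]


if __name__ == "__main__":
    for line in summarize(TAGGED_DOC):
        print(line)
```

In this sketch the summary is obtained purely from the attached tags, without analyzing the sentences themselves, which is why the quality of such processing depends on the attribute information supplied at authoring time.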
In recent years, computers have become increasingly popular, and many computers are connected to one another via networks. As a result, there is a need for a higher-level document processing technique to perform generation, labeling, and modification of a text document in accordance with an index depending upon the content of the document. More specifically, there is a need for a technique to summarize or categorize a document in response to a request issued by a user.
To the above end, document data or a document file supplied to a user should include information required to process the document data. Thus, there is a need for an authoring technique for generating document data including such information.
The authoring technique should be easy to use not only for users having high-level knowledge but also for general users who do not have such knowledge.