To date, the WWW (World Wide Web) has been offered on the Internet as an application service that furnishes hypertext information in window form.
The WWW is a system that performs document processing for creating, publishing, and sharing documents, and in doing so shows what a new style of document should be. However, from the standpoint of actual document use, document processing more advanced than the WWW provides, such as classifying or summarizing documents based on their contents, is considered desirable. Such advanced document processing requires mechanical processing of the document contents.
However, mechanical processing of the document contents is still difficult, for the following reasons. First, HTML (Hyper Text Markup Language), the language used to describe hypertext, prescribes how a document is presented but scarcely prescribes its contents. Second, the hypertext network formed between documents is not necessarily readily usable by a reader who wishes to understand the document contents. Third, an author writes a document without taking the reader's convenience into account, and the reader's convenience is never reconciled with the author's.
In short, although the WWW shows what a new style of document should be, it cannot perform advanced document processing because it cannot process documents mechanically. Put differently, mechanical document processing is a prerequisite for highly advanced document processing.
In view of this, systems supporting mechanical document processing have been developed on the basis of research into natural language. Mechanical document processing has also been proposed that exploits attribute information, or tags, affixed by the authors of a document to describe its internal structure.
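As a rough illustration of why author-affixed tags help, the sketch below parses a small tagged document and lets a program enumerate its structure without understanding the prose. The tag set (`<doc>`, `<p>` for paragraphs, `<s>` for sentences, and a `topic` attribute) and the helper names are hypothetical and are not taken from any particular proposal; only Python's standard `xml.etree.ElementTree` is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag set: <doc> root, <p> paragraph, <s> sentence with an optional "topic" attribute.
TAGGED = """<doc>
  <p><s topic="web">The WWW furnishes hypertext.</s><s>HTML prescribes presentation.</s></p>
  <p><s topic="summary">Summaries save reading time.</s></p>
</doc>"""


def sentences_by_paragraph(xml_text: str) -> list[list[str]]:
    """Return the sentence texts grouped by the paragraph tags the author affixed."""
    root = ET.fromstring(xml_text)
    return [[s.text for s in p.findall("s")] for p in root.findall("p")]


def sentences_with_topic(xml_text: str, topic: str) -> list[str]:
    """Return every sentence whose (hypothetical) topic attribute matches."""
    root = ET.fromstring(xml_text)
    return [s.text for s in root.iter("s") if s.get("topic") == topic]


if __name__ == "__main__":
    print(sentences_by_paragraph(TAGGED))
    print(sentences_with_topic(TAGGED, "summary"))
```

The point of the design is that the program relies only on the markup: paragraph and sentence boundaries, and the topic labels, are read directly from the tags rather than inferred from the text.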
Meanwhile, users rely on information retrieval systems, such as so-called search engines, to find the information they want among the voluminous information provided over the Internet. Such an information retrieval system retrieves information based on a specified keyword and presents the retrieved information to the user, who then selects the desired information from what is presented.
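Keyword-based retrieval of this kind is commonly built on an inverted index. The following is a minimal sketch under stated assumptions: documents are held in memory, tokens are lowercase alphanumeric runs, and a document is returned only if it contains every query keyword. The helper names `tokenize`, `build_index`, and `retrieve` are hypothetical; real search engines use far richer text analysis and ranking.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document IDs containing it (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def retrieve(index: dict[str, set[str]], query: str) -> list[str]:
    """Return IDs of documents containing every query keyword, sorted for stable output."""
    keywords = tokenize(query)
    if not keywords:
        return []
    # .get avoids inserting empty entries into the defaultdict for unknown keywords.
    hits = set.intersection(*(index.get(k, set()) for k in keywords))
    return sorted(hits)


if __name__ == "__main__":
    docs = {
        "d1": "Speech synthesis reads documents aloud",
        "d2": "Automatic summary of documents",
        "d3": "Hypertext on the WWW",
    }
    idx = build_index(docs)
    print(retrieve(idx, "documents"))          # ['d1', 'd2']
    print(retrieve(idx, "summary documents"))  # ['d2']
```

Every matching document is handed back to the user unranked and unsummarized, which is exactly why a large result set becomes a burden to inspect.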
With an information retrieval system, information can thus be retrieved extremely easily. However, the user has to glance over each piece of retrieved information to grasp its outline and check whether it is what he or she wants. If a large amount of information is returned, this imposes a significant burden on the user. Attention has therefore recently turned to so-called automatic summary formulating systems, which automatically summarize the contents of text information, that is, document contents.
An automatic summary formulating system formulates a summary by reducing the length or complexity of text information, that is, a document, while retaining the gist of the original. By glancing through the summary prepared by such a system, the user can grasp the outline of the document.
Usually, an automatic summary formulating system assigns to each sentence or word in the text, taken as a unit, a degree of importance derived from some information, and arranges the units in order of importance. It then gathers the highest-ranked sentences or words to formulate the summary.
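The rank-and-select procedure above can be sketched as follows. The source does not say what information the degree of importance is derived from, so this sketch assumes a simple frequency heuristic: a word's importance is how often it occurs in the whole text (ignoring a small hypothetical stop-word list), and a sentence's importance is the mean importance of its words. The function names are illustrative only.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the importance heuristic below is an assumption, not the source's method.
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "is", "are", "in", "on", "for", "it", "this", "that"}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def content_words(sentence: str) -> list[str]:
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text: str, k: int = 2) -> list[str]:
    sentences = split_sentences(text)
    # Degree of importance of each word: its frequency across the whole text.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence: str) -> float:
        # Degree of importance of a sentence: mean importance of its content words.
        ws = content_words(sentence)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    # Arrange sentence indices in order of importance; sorted() is stable, so ties keep text order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Gather the top-k sentences and restore document order so the summary reads naturally.
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    text = ("Summaries help users. Summaries of documents help users judge documents. "
            "The weather is nice.")
    print(summarize(text, k=1))  # ['Summaries help users.']
```

Restoring the selected sentences to their original order is a design choice: ranking decides *what* goes into the summary, while document order keeps it readable.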
Recently, with the widespread use of computers and networks, there has been growing demand for more advanced document processing functions, in particular the function of reading documents aloud by speech synthesis.
Speech synthesis inherently generates speech mechanically, based on the results of speech analysis and on a simulation of the human speech production mechanism, by assembling the elements or phonemes of the individual language under digital control.
However, speech synthesis cannot read a given document aloud while taking the breaks in the document into account, so natural reading cannot be achieved. Moreover, the user has to select a speech synthesis engine according to the language in use. Also, how accurately words prone to misreading, such as technical terms or Chinese-character (kanji) words that are difficult to read in Japanese, are pronounced depends on the dictionary used. In addition, when a summary text is prepared, the critical portions of the text can be identified visually, but it is difficult to draw the user's attention to those portions when speech synthesis is used.