1. Field of the Invention
The present invention relates to a computerized document processing system for abstracting a computerized document obtained by sending a keyword or the like via a network.
2. Description of Related Art
With the spread of the Internet, users can now readily access information on the Internet by using the WWW. As a result, many individuals and corporations have come to publish hypertext files called Web pages.
However, it has become difficult for an individual user to know where a required Web page is located and what URL should be specified to obtain it.
To address this, systems for retrieving accessible Web pages according to their contents have been developed and offered as a service. That is, such a Web retrieve server allows a Web page containing a keyword to be retrieved by specifying the keyword. Users have come to retrieve necessary Web pages by using such a Web retrieve server.
Hitherto, it has been general practice to list, as a retrieve result, a certain number of document titles, headers or keywords of the highly ranked documents and pages. Some retrieve servers have also registered manually prepared outlines or introductions of pages to present as a retrieved result. Looking at the result, the user determines whether or not to refer directly to the retrieved page.
Hereinafter, what is presented for each document when the retrieved result is thus presented will be referred to as an "abstract" of the document, and a page in which the "abstracts" of the documents are collected will be referred to as an abstract page or an abstract document.
As a method for presenting the retrieved result, it is conceivable to show the portion where a retrieve word occurs within a retrieved document by KWIC (Key Word In Context). The KWIC representation is generally well suited to discriminating among retrieved pages. However, the KWIC representation has not actually been realized for abstracts of results retrieved by a retrieve server. The reason is described below.
The presentation of the above-mentioned retrieved result is carried out by a single retrieve server. The retrieve server cannot spend much processing time on presenting the retrieved result because it has to respond to retrieve requests from a large number of unspecified users. Accordingly, the retrieve server presents as the retrieved result something that can be generated by a very simple process. Alternatively, the retrieve server normally creates in advance, for each document, a text to be presented as a retrieved result, and presents that text when the document is retrieved.
Because the KWIC representation requires a larger amount of processing, and because the retrieval character string differs each time a retrieval is made, the representation cannot be created in advance. Accordingly, it has seldom been realized.
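As an illustration of the KWIC representation discussed above, the following is a minimal sketch in Python. It is not taken from the invention; the function name `kwic`, the `width` parameter and the bracket marking of the keyword are assumptions made for this example only. It shows why the result depends on the retrieval character string and therefore has to be computed at retrieval time.

```python
import re


def kwic(text, keyword, width=20):
    """Return one context line per occurrence of `keyword` in `text`.

    Hypothetical helper for illustration: each line holds up to `width`
    characters on either side of the hit, with the hit itself in brackets.
    Matching is case-insensitive.
    """
    if not keyword:
        return []
    lines = []
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left}[{match.group(0)}]{right}")
    return lines


if __name__ == "__main__":
    sample = "The retrieve server lists Web pages. Each Web page has a title."
    for line in kwic(sample, "web", width=6):
        print(line)
```

Because the whole document is scanned for each keyword, the cost grows with document length and with the number of retrieved documents, which is consistent with the processing burden described above.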
It is also conceivable to provide a link from the portion where the KWIC representation is made so that the corresponding portion of the original page can be referred to directly. However, the original page must be modified in order to provide the link. It is then conceivable to deal with this by holding the documents to be retrieved as they are on a local disk of the retrieve server and modifying a copy of the document when representing by KWIC. However, it is difficult in terms of capacity to hold all the Internet documents to be retrieved. Modifying the copy is also problematic in terms of copyright.
It is also conceivable to obtain a highly ranked document of the retrieved result from the site where the document exists and to modify the document for use in the KWIC representation. However, because this takes several minutes or more, it cannot be realized by a retrieve server responding to retrieve requests from many users.
Accordingly, it is an object of the present invention to provide a computerized document processing system which allows the processes of creating an abstract to be distributed, and which allows an original document to be readily modified so as to relate the abstract to the original document (e.g. by a link, highlighting of an extracted character string and the like). This is achieved not by generating the abstract document, i.e. the retrieved result, in the retrieve server, but either by incorporating a module for presenting the retrieved result on the client side, or by holding copies of all Web pages in a retrieve server within a local network called an intranet, in which there is no problem in terms of copyright, and by realizing the KWIC representation by modifying the documents appropriately.
According to the present invention, there is provided a computerized document processing system which allows processes in creating an abstract to be distributed and an original document to be readily modified to relate the abstract with the original document.
The computerized document processing system comprises a keyword holding section for holding keywords, a document storage section for holding a computerized document transferred via a network, an abstract creating section for creating an abstract by extracting at least a character string containing a keyword held in the keyword holding section from the computerized document held in the document storage section, a document modifying section for modifying the computerized document such that it can be presented in relation to the abstract created by the abstract creating section, and a modified document storage section for storing the computerized document modified by the document modifying section. The computerized document processing system presents the abstract created by the abstract creating section and, in response to a specification made by a user, reads from the modified document storage section and presents the modified computerized document linked to a predetermined portion of the presented abstract.
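The relationship between the sections described above can be sketched as follows. This is only one possible arrangement under stated assumptions, not the implementation of the invention: the class name `DocumentProcessor`, the anchor naming scheme `kw0`, `kw1`, ..., the use of plain-text documents, and the HTML markup used for highlighting are all hypothetical choices made for this example.

```python
import html
import re
from dataclasses import dataclass, field


@dataclass
class DocumentProcessor:
    # Keyword holding section (hypothetical representation: a list of strings).
    keywords: list
    # Document storage section: document id -> plain text received via a network.
    documents: dict = field(default_factory=dict)
    # Modified document storage section: document id -> modified HTML text.
    modified: dict = field(default_factory=dict)

    def _matches(self, text):
        """Return all keyword hits in `text`, in order of occurrence."""
        words = [re.escape(k) for k in self.keywords if k]
        if not words:
            return []
        return list(re.finditer("|".join(words), text, flags=re.IGNORECASE))

    def store(self, doc_id, text):
        self.documents[doc_id] = text

    def create_abstract(self, doc_id, width=20):
        """Abstract creating section: (anchor, character string) per hit."""
        text = self.documents[doc_id]
        entries = []
        for n, m in enumerate(self._matches(text)):
            snippet = text[max(0, m.start() - width):m.end() + width]
            entries.append((f"kw{n}", snippet))
        return entries

    def modify_document(self, doc_id):
        """Document modifying section: add an anchor and highlighting per hit."""
        text = self.documents[doc_id]
        parts, last = [], 0
        for n, m in enumerate(self._matches(text)):
            parts.append(html.escape(text[last:m.start()]))
            parts.append(f'<a id="kw{n}"><b>{html.escape(m.group(0))}</b></a>')
            last = m.end()
        parts.append(html.escape(text[last:]))
        self.modified[doc_id] = "".join(parts)
        return self.modified[doc_id]


if __name__ == "__main__":
    processor = DocumentProcessor(keywords=["abstract"])
    processor.store("d1", "An abstract is shown. The Abstract links back.")
    print(processor.create_abstract("d1", width=3))
    print(processor.modify_document("d1"))
```

In this sketch, each abstract entry carries the same anchor name that is embedded in the modified document, so an abstract page could link a presented character string to the corresponding portion of the modified document, for example through a fragment such as `#kw0`.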