1. Field of the Invention
The present invention relates to a scheme for constructing a database for a user system which handles structured documents such as WWW (World Wide Web) documents and e-mails, by automatically storing information extracted from the structured documents.
2. Description of the Background Art
A structured document is a document capable of expressing a document structure by using partitions called tags which are embedded in the document. Some structured document formats also provide a mechanism, called a DTD (Document Type Definition), for defining the document structure. Such structured documents include SGML documents, XML documents, HTML documents, and documents transmitted as e-mails or e-news. Because the structure is marked explicitly by the tags, the document structure of such a structured document can be easily analyzed.
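As an illustration of how easily a tagged structure can be analyzed, the following is a minimal sketch that lists each tag of a document together with its nesting depth. It uses the standard Python `html.parser` module; the names `TagOutline` and `outline_of` are hypothetical, and the sketch assumes a well-nested document in which every start tag has a matching end tag.

```python
from html.parser import HTMLParser


class TagOutline(HTMLParser):
    """Records (depth, tag) for every start tag encountered."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        # Record the tag at the current nesting depth, then descend.
        self.outline.append((self.depth, tag))
        self.depth += 1

    def handle_endtag(self, tag):
        # Assumes well-nested input: each end tag closes the innermost element.
        self.depth -= 1


def outline_of(document: str):
    """Return the tag outline of a structured document (hypothetical helper)."""
    parser = TagOutline()
    parser.feed(document)
    parser.close()
    return parser.outline
```

For example, `outline_of("<html><body><p>x</p></body></html>")` yields the tags `html`, `body` and `p` at depths 0, 1 and 2 respectively.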
In the WWW, e-mails, e-news, etc., the structured documents are exchanged through a communication path such as the Internet or Intranet.
First, the WWW system will be described. In the WWW, structured documents are exchanged by connecting a WWW server which is a structured document delivery server and a browser which is a structured document display device, through a communication path. In the case of accessing a structured document, the browser requests the WWW server to transmit a desired structured document, and the WWW server transmits the structured document in response. The structured document exchanged in the WWW is also sometimes referred to as a WWW page.
In the usual WWW, in order to suppress the increase of an amount of data to be transmitted through the communication path and the concentration of processing loads on the WWW server, a data relay device called a proxy server is often provided between the WWW server and the browser. This proxy server is located between the browser and the WWW server, and has a function for relaying a request from the browser to the WWW server.
The proxy server receives a structured document transmission request from the browser, and transmits the structured document transmission request to the WWW server on behalf of the browser. Then, the structured document transmitted from the WWW server is transferred to the browser. It is possible to provide just one proxy server or a plurality of proxy servers between the WWW server and the browser.
Among such proxy servers, there can be a proxy server which has a function for temporarily storing a plurality of structured documents to be transferred in the proxy server. Such a data transfer device having a function for temporarily storing a plurality of structured documents is sometimes also referred to as a cache server.
In the proxy server which temporarily stores a plurality of structured documents, a structured document whose transmission was requested from the browser will be temporarily stored. Then, when there is a transmission request from the browser with respect to the same structured document, the structured document that has been temporarily stored will be transmitted to the browser instead of transferring the transmission request to the WWW server again.
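The temporary storing behavior described above can be sketched as follows. This is only an illustrative model, not an actual proxy implementation: `CachingProxy` and `fetch_from_origin` are hypothetical names, and the WWW server is represented by a plain callable that returns a document for a given URL.

```python
from typing import Callable, Dict


class CachingProxy:
    """Relays requests and keeps a temporary copy of each transferred document."""

    def __init__(self, fetch_from_origin: Callable[[str], str]):
        # fetch_from_origin stands in for a request to the WWW server (assumption).
        self._fetch = fetch_from_origin
        self._store: Dict[str, str] = {}

    def get(self, url: str) -> str:
        # Serve the temporarily stored copy when the same document is requested again.
        if url in self._store:
            return self._store[url]
        # Otherwise relay the request to the WWW server and keep the response.
        document = self._fetch(url)
        self._store[url] = document
        return document
```

With this sketch, a second `get` for the same URL is answered from the stored copy and the WWW server is contacted only once.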
There are also some proxy servers which have a function for checking with the WWW server as to whether the temporarily stored structured document is the latest one or not at a time of transferring the temporarily stored structured document.
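Such a check can be sketched as below, under the assumption that the WWW server can report some version identifier for each document (for example a modification time). The names `RevalidatingProxy`, `fetch` and `current_version` are hypothetical and are introduced only for illustration.

```python
from typing import Callable, Dict, Tuple


class RevalidatingProxy:
    """Serves a stored document only after confirming it is still the latest one."""

    def __init__(
        self,
        fetch: Callable[[str], Tuple[str, str]],
        current_version: Callable[[str], str],
    ):
        # fetch(url) returns (version, document); current_version(url) returns
        # only the version. Both stand in for requests to the WWW server.
        self._fetch = fetch
        self._current_version = current_version
        self._store: Dict[str, Tuple[str, str]] = {}

    def get(self, url: str) -> str:
        cached = self._store.get(url)
        # Use the stored copy only if the server reports the same version.
        if cached is not None and cached[0] == self._current_version(url):
            return cached[1]
        # Stored copy is missing or outdated: transfer the document again.
        version, document = self._fetch(url)
        self._store[url] = (version, document)
        return document
```

The design choice here is that the check transfers only the small version identifier, so the full document is transmitted again only when it has actually changed.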
Usually, in the proxy server, the structured documents are stored up to some prescribed amount. At a time of temporarily storing the structured document, the structured document is not discriminated according to a browser which had requested that structured document. In the case of storing the structured document in excess of the prescribed amount, the other structured documents which have been temporarily stored will be deleted according to some rule. As a rule to be used here, a simple rule such as that for sequentially deleting a structured document with the oldest last access time among the stored structured documents is often employed. As a memory device for temporarily storing the structured document, a device such as a magnetic disk device is used, for example.
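The deletion rule described above, in which the document with the oldest last access time is deleted first, can be sketched as follows. This is a minimal illustration under the assumption that the prescribed amount is expressed as a total number of bytes; `LastAccessCache` and `max_bytes` are hypothetical names.

```python
from collections import OrderedDict
from typing import Optional


class LastAccessCache:
    """Stores documents up to max_bytes, deleting the least recently accessed first."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        # Ordered from oldest last access (front) to newest (back).
        self._docs = OrderedDict()

    def get(self, url: str) -> Optional[bytes]:
        doc = self._docs.get(url)
        if doc is not None:
            # An access makes this document the most recently used one.
            self._docs.move_to_end(url)
        return doc

    def put(self, url: str, doc: bytes) -> None:
        if url in self._docs:
            self.used_bytes -= len(self._docs.pop(url))
        self._docs[url] = doc
        self.used_bytes += len(doc)
        # Delete documents with the oldest last access time until within the limit.
        # Note that the contents of a document play no part in this decision.
        while self.used_bytes > self.max_bytes and self._docs:
            _, evicted = self._docs.popitem(last=False)
            self.used_bytes -= len(evicted)
```

Note that the rule considers only access times, so a document is deleted regardless of what it contains.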
There are also some display devices such as browsers which have a function for temporarily storing a certain amount of structured documents. In such a browser, when a structured document requested by a user is one that is temporarily stored in the browser, the structured document that has been temporarily stored will be displayed instead of requesting the WWW server to transmit the structured document again.
There are also some browsers which have a function for checking with the WWW server or the proxy server as to whether the temporarily stored structured document is the latest one or not at a time of displaying the temporarily stored structured document.
Usually, in such a browser, the structured documents are stored up to some prescribed amount. In the case of storing the structured document in excess of the prescribed amount, the other structured documents which have been temporarily stored will be deleted according to some rule. As a rule to be used here, a simple rule such as that for sequentially deleting a structured document with the oldest last access time among the stored structured documents is often employed. As a memory device for temporarily storing the structured document, a device such as a magnetic disk device is used, for example.
Conventionally, the WWW has been operated in one of the following three forms.
(1) A form in which the WWW server which is a structured document delivery server and the browser which is a structured document display device are connected through a communication path and structured documents are exchanged between them.
(2) A form in which one or a plurality of proxy servers for relaying structured document requests and responses are provided between the WWW server which is a structured document delivery server and the browser which is a structured document display device, and the WWW server and the browser are connected and structured documents are exchanged between them through these proxy servers.
(3) A form in which one or a plurality of proxy servers for temporarily storing structured documents are provided between the WWW server which is a structured document delivery server and the browser which is a structured document display device, and the WWW server and the browser are connected and structured documents are exchanged between them through these proxy servers.
However, in the case where there is no proxy server between the WWW server and the browser as in the above (1), the structured document is transmitted directly to the browser, so that there has been a problem that the user is required to judge, at a time of receiving the document, whether it contains important items, and to store that document as needed.
Also, in the case where there is a proxy server between the WWW server and the browser as in the above (2), if the proxy server only has a function for relaying requests and responses, the structured document is likewise passed to the browser without being stored, so that there has been the same problem that the user is required to judge, at a time of receiving the document, whether it contains important items, and to store that document as needed.
On the other hand, in the case where there is a proxy server having a function for temporarily storing structured documents between the WWW server and the browser as in the above (3) or in the case where the browser itself has a function for temporarily storing received structured documents, the structured documents that are temporarily stored in the proxy server or the browser will be deleted according to some rule. As a rule to be used here, a simple rule such as that for sequentially deleting a structured document with the oldest last access time among the stored structured documents is usually employed. For this reason, even when there is a structured document that contains important contents, that structured document will be deleted without judging whether that document is an important one or not, so that there has been a problem that it can become impossible to access the structured document that contains important contents.
In order to prevent the structured document that contains important contents from being deleted soon, it is possible to store many structured documents, but in this case, there has been a problem that a memory device having a large capacity is required. Also, in this case, because many structured documents are stored, there has been a problem that a considerable amount of time is required for accessing or searching for the structured document that contains important contents.
Next, the e-mail system will be described. The e-mail system comprises an e-mail server and an e-mail transmission and reception device, which are connected through a communication path such as the Internet or Intranet.
A sender of an e-mail creates the e-mail by using the e-mail transmission and reception device, and transmits the e-mail by specifying an address of a receiver. The e-mail server which received the e-mail then transfers the e-mail according to the address of the receiver specified by the sender, either to a destination e-mail server or to an e-mail server for relaying this e-mail when a direct transmission is not possible.
The destination e-mail server stores the transferred e-mails by classifying them according to destination users. The receiver of the e-mail retrieves the e-mail held by the e-mail server by using the e-mail transmission and reception device, and reads the e-mail. Finally, the e-mail is stored in an e-mail storage server or the e-mail transmission and reception device, or both. There are cases where the structured documents are used as documents of such e-mails.
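The transfer and classification described above can be sketched as follows. This is a simplified model rather than an actual mail protocol: `MailServer`, `relay` and `accept` are hypothetical names, an address is assumed to have the form user@domain, and each server is assumed to know at most one next server for relaying.

```python
from collections import defaultdict


class MailServer:
    """Stores e-mails for its own domain per user and relays all others."""

    def __init__(self, domain, relay=None):
        self.domain = domain
        self.relay = relay  # next e-mail server to relay through (assumption)
        self.mailboxes = defaultdict(list)  # destination user -> list of e-mails

    def accept(self, recipient, message):
        """Deliver or relay a message; return the domain that stored it."""
        user, _, domain = recipient.rpartition("@")
        if domain == self.domain:
            # Destination server: classify the e-mail by destination user.
            self.mailboxes[user].append(message)
            return self.domain
        if self.relay is None:
            raise ValueError("no route to " + domain)
        # Not the destination: transfer the e-mail toward the receiver.
        return self.relay.accept(recipient, message)
```

For instance, a server for one domain constructed with `relay` set to the destination server passes an e-mail addressed to the other domain along, and the destination server files it under the receiver's name.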
Usually, the e-mails can be stored up to a capacity of an e-mail storage device associated with the e-mail server, and the e-mail server cannot receive any e-mails beyond that capacity. If that happens, usually a user explicitly deletes e-mails addressed to that user.
In this case of the e-mail system, similarly as in the case of the WWW, structured documents contained in e-mails can be stored up to a capacity of a memory device connected to the e-mail server or the e-mail transmission and reception device, but e-mails cannot be received beyond that capacity. For this reason, it is necessary for the user to explicitly delete e-mails, but at that point, there has been a problem that the user has to carry out the deletion while checking the importance of each structured document to be deleted.
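The capacity limit can be sketched as below, assuming for illustration that the capacity is counted in bytes per user. `Mailbox`, `deliver` and `delete` are hypothetical names, and the refusal of a message is modeled simply by returning `None`.

```python
class Mailbox:
    """Holds e-mails up to a fixed capacity; refuses new ones when full."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.messages = {}  # message id -> message bytes
        self._next_id = 0

    def used_bytes(self):
        return sum(len(m) for m in self.messages.values())

    def deliver(self, message: bytes):
        """Store the message and return its id, or None if it does not fit."""
        if self.used_bytes() + len(message) > self.capacity_bytes:
            return None  # the server cannot receive e-mails beyond the capacity
        message_id = self._next_id
        self._next_id += 1
        self.messages[message_id] = message
        return message_id

    def delete(self, message_id):
        # Explicit deletion by the user frees space for later e-mails.
        self.messages.pop(message_id, None)
```

In this model nothing is deleted automatically, so reception resumes only after the user has chosen which e-mails to delete.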
Again, in order to prevent the structured document that contains important contents from being deleted soon, it is possible to store many structured documents, but in this case, there has been a problem that a memory device having a large capacity is required. Also, in this case, because many structured documents are stored, there has been a problem that a considerable amount of time is required for accessing or searching for the structured document that contains important contents.
Next, the e-news system will be described. The e-news system comprises an e-news server and an e-news transmission and reception device, which are connected through a communication path such as the Internet or Intranet.
A contributor of an e-news article creates an e-news article by using the e-news transmission and reception device, and transmits it to the e-news server. The e-news server that received the e-news article then sequentially transmits the e-news article to the other e-news servers that are connected to it through communication paths. Here, each e-news server has a list of e-news articles that have already been received, so that it will not receive an already received e-news article once again.
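The propagation of an article among e-news servers, together with the list that prevents an article from being received twice, can be sketched as follows. This is an illustrative model only: `NewsServer`, `peers` and `receive` are hypothetical names, and each article is assumed to carry a unique identifier.

```python
class NewsServer:
    """Forwards each new article to its peers and rejects duplicates."""

    def __init__(self, name):
        self.name = name
        self.peers = []        # other e-news servers connected to this one
        self.seen_ids = set()  # list of articles that have been received
        self.articles = {}     # article id -> article body

    def receive(self, article_id, body):
        """Accept a new article and pass it on; return False for a duplicate."""
        if article_id in self.seen_ids:
            return False  # already received, so it is not received again
        self.seen_ids.add(article_id)
        self.articles[article_id] = body
        # Transmit the article to every connected server in turn.
        for peer in self.peers:
            peer.receive(article_id, body)
        return True
```

Because every server records what it has received, the forwarding terminates even when the servers are connected in a loop.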
A subscriber of the e-news article receives the e-news article from the e-news server by using the e-news transmission and reception device.
Usually, the e-news articles can be stored up to a capacity of an e-news article storage device associated with the e-news server, and the e-news server cannot receive any e-news articles beyond that capacity. If that happens, the e-news articles are deleted either automatically by the e-news system that is operating on the e-news server, or by a manager of the e-news server explicitly.
In this case of the e-news system, similarly as in the case of the WWW, at a time of deleting the e-news articles, even when there is a structured document contained in the e-news article that contains important contents, that structured document will be deleted without judging whether that document is an important one or not, so that there has been a problem that it can become impossible to access the structured document that contains important contents.
Again, in order to prevent the structured document that contains important contents from being deleted soon, it is possible to store many structured documents, but in this case, there has been a problem that a memory device having a large capacity is required. Also, in this case, because many structured documents are stored, there has been a problem that a considerable amount of time is required for accessing or searching for the structured document that contains important contents.