The present invention relates to a method for collecting document information through a communication network, and more particularly to a document information collection method and a document information collection apparatus which allow a database to be formed automatically from document information acquired from a plurality of information sources.
Many document management servers for electronically referring to documents, such as on-line news services, electronic bulletin board systems and document databases, are known. Some of those systems allow retrieval of a stored document, but the retrieval is conducted over the entire data in the system. Accordingly, when a user wants to refer again to a document which the user has referred to in the past, the user must either find it in the result of a retrieval over the entire system or explicitly register each referred document in a database which the user manages himself. When the user utilizes a plurality of document management servers, the user must access the respective systems sequentially to conduct the retrieval, because there is no means for retrieving in one pass from a plurality of systems having different document acquisition protocols or communication protocols.
As a method for determining, among a number of document management servers, the particular server in which a document of interest resides, it is known to provide a server directory, which is a database for determining which documents each of the document management servers contains, as described in "Outlook of Next-generation Information Distribution System", Richard Maron Stein, NIKKEI BYTE, November 1991, pp. 320-331. However, the server directory disclosed therein can manage only information on document management servers which communicate under a particular protocol. Further, the information on each document management server managed by the server directory is a document in which the manager of that server has described the features of the server; it is not always written from the viewpoint of a retrieving user, and it does not always properly represent the documents managed by the server.
Some document management servers have a function which allows a retrieval condition to be registered in advance and which notifies the user when a new document matching the condition is registered in the document management server. However, the user is notified only of newly arrived information on the document management server in which the retrieval condition has been registered; even if newly arrived information on another document management server matches the retrieval condition, the user is not notified. Further, in order to detect newly arrived documents, it is necessary to periodically access the document management server and execute the same retrieval, or to individually write a program which periodically acquires the documents.
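By way of illustration only, the periodic access described above might be sketched as follows. The names Document, Poller and the per-server fetcher callables are hypothetical and are not taken from any of the systems referred to; each fetcher merely stands in for whatever acquisition protocol a given document management server requires, and the retrieval condition is assumed to be a simple predicate on a document.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Document:
    """Hypothetical record for one document on one server."""
    server: str
    doc_id: str
    text: str


# Assumption: each server is reached through a callable that returns
# the documents currently held by that server.
Fetcher = Callable[[], Iterable[Document]]


@dataclass
class Poller:
    """Applies one registered retrieval condition to several servers."""
    condition: Callable[[Document], bool]
    fetchers: Dict[str, Fetcher]
    seen: Set[Tuple[str, str]] = field(default_factory=set)

    def poll_once(self) -> List[Document]:
        """Return documents not seen in earlier polls that match the condition."""
        new_hits: List[Document] = []
        for fetch in self.fetchers.values():
            for doc in fetch():
                key = (doc.server, doc.doc_id)
                if key in self.seen:
                    continue  # already examined in an earlier poll
                self.seen.add(key)
                if self.condition(doc):
                    new_hits.append(doc)
        return new_hits
```

Calling poll_once at a fixed interval corresponds to the periodic access mentioned above. Because the same condition is applied to every server, a newly arrived document is reported regardless of which server it arrives on.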
Recently, both the number of documents which are electronically accessible and the number of document management servers which provide them have become huge.
Even if a user wants to read again a document which the user has read before, it is difficult to remember the location of the document. Even if the user wants to keep a copy of the document at hand, the user does not know which documents will be required later, and it is troublesome to save every document that the user reads. Some document servers do not have a retrieval function, and in this case it is difficult to locate a desired document. Even with a server having a retrieval function, the retrieval result may be huge depending on the retrieval condition, and it is difficult to find the desired document in the retrieval result. Further, since the retrieval function differs from server to server, the user is required either to remember the server in which a particular document resides or to access many document management servers.
It is therefore a first object of the present invention to provide a document information collection method and a document information collection apparatus which automatically and collectively store documents referred to on various document management servers in a document database which is managed by the user, so that the documents can be retrieved.
When a user wishes to retrieve a document which has not been referred to before but which newly matches a condition, and it is not known from which document management server the document is to be retrieved, it is necessary to access many document management servers. As described above, the method of providing a server directory, which is a database describing the document management servers, has its own problems.
It is a second object of the present invention to provide a document information collection method and a document information collection apparatus which allow the detection of the particular document management server in which a document matching the user's intent resides.
The document management servers and the documents in each server are increasing day by day, and even if a document management server or a new document which may be a new information source of interest appears, it is difficult to find it. It is particularly difficult to learn of the existence of a document management server which is a new information source. However, it is highly probable that a document or a document management server referred to by another user who has interests similar to those of the user includes document information of interest to the user, and such information is effective in detecting a new document or information source.
It is a third object of the present invention to provide a document information collection method and a document information collection apparatus which allow the automatic detection of information of interest to the user from among documents or document management servers newly found by other users.