The present invention relates to a system for classifying documents distributed across a network environment, and more particularly to a distributed document classifying system in which classification items are prepared in advance, a service provider judges the contents of typical documents and classifies them, and every other document is classified automatically by calculating its similarity to the classified document group.
In a network environment such as the World Wide Web (hereinafter referred to as WWW), in which a plurality of documents are distributed, searching for a desired document becomes harder as the number of documents grows. As a countermeasure, directory services have become widespread on the WWW; such a service classifies the documents distributed on the network in advance and stores their sites or bibliographic items in a database in order to provide a retrieval service to clients. The present invention relates to a distributed document classifying system required for realizing such a directory service.
FIG. 14 shows a structure of this type of conventional distributed document classifying system. As shown in the drawing, the prior art distributed document classifying system comprises: a database section 92 including a classification information storage section 921 and a document information storage section 922; and a document manual registration section 91.
In the database section 92, the document information storage section 922 stores the document identifiers of documents distributed on the network together with a list of their bibliographic items, and the classification information storage section 921 stores a list of classification items together with the document identifiers of the documents classified into each classification item. When registering a new document, a service manager first confirms the content of that document and judges which items are to be used as its bibliographic items. The service manager then adds the judged bibliographic items, together with a document identifier determined according to a given method, to the document information storage section 922 through the document manual registration section 91. Finally, the service manager judges from the confirmed content the classification item into which the document is to be classified, and additionally registers the document identifier under the corresponding classification item in the classification information storage section 921.
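The registration flow of FIG. 14 described above can be sketched as follows. This is only an illustrative sketch: the names DatabaseSection and register_manually, and the use of in-memory dictionaries, are assumptions made for the example and do not appear in the prior art system itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatabaseSection:
    """Hypothetical stand-in for the database section 92."""
    # Document information storage section 922:
    # document identifier -> bibliographic items.
    document_info: Dict[str, Dict[str, str]] = field(default_factory=dict)
    # Classification information storage section 921:
    # classification item -> identifiers of documents classified into it.
    classification_info: Dict[str, List[str]] = field(default_factory=dict)


def register_manually(db: DatabaseSection,
                      doc_id: str,
                      bibliographic_items: Dict[str, str],
                      classification_item: str) -> None:
    """Hypothetical stand-in for the document manual registration section 91.

    Every argument is a judgment already made by the service manager;
    nothing is derived automatically from the document itself.
    """
    if classification_item not in db.classification_info:
        raise KeyError("unknown classification item: " + classification_item)
    # Add the identifier and bibliographic items to section 922.
    db.document_info[doc_id] = dict(bibliographic_items)
    # Additionally register the identifier under the item in section 921.
    if doc_id not in db.classification_info[classification_item]:
        db.classification_info[classification_item].append(doc_id)


db = DatabaseSection(classification_info={"news": [], "sports": []})
register_manually(db, "doc-001", {"title": "Example page"}, "news")
```

In this sketch every call requires a human judgment for each argument, which is the manual cost the conventional system incurs for every document.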
Although the document classification operation is carried out entirely by hand in the above distributed document classifying system, a system for performing such an operation automatically has been proposed. For example, Japanese patent laid-open publication No. Hei 7-49875 discloses a system that automatically classifies documents by calculating the conformity between each document and a word list prepared in advance for each classification as a retrieval condition. Further, according to this system, the update state of the documents on the network is monitored, and updated documents are collected for the classification process.
In the conventional distributed document classifying system shown in FIG. 14, however, the registration of the document identifier and the bibliographic items and the operation of classifying each document must all be performed by the service provider using the document manual registration section, which leads to an increase in cost.
On the other hand, the system disclosed in Japanese patent laid-open publication No. Hei 7-49875 is capable of automatically classifying the documents. However, it presupposes that every document is classified automatically, so a retrieval condition must be set in advance for each classification. A word list is one example of such a retrieval condition, but setting an appropriate retrieval condition for each classification requires considerable skill when no document has yet been classified. In addition, if several documents to be classified into a given classification item are actually examined in order to determine the retrieval condition, the documents used for this work must still be processed as targets of the automatic classification even though they have effectively already been classified, which is a wasteful processing step.
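As a rough illustration of classification by conformity with a word list, consider the sketch below. The passage above does not state how the cited publication computes conformity, so the measure used here (the fraction of word-list entries that occur in the document) is an assumption, as are the function names, the threshold, and the sample word lists.

```python
import re
from typing import Dict, List


def conformity(document_text: str, word_list: List[str]) -> float:
    """Assumed measure: share of word-list entries found in the document."""
    if not word_list:
        return 0.0
    tokens = set(re.findall(r"[a-z0-9]+", document_text.lower()))
    hits = sum(1 for word in word_list if word.lower() in tokens)
    return hits / len(word_list)


def classify_by_word_lists(document_text: str,
                           retrieval_conditions: Dict[str, List[str]],
                           threshold: float = 0.5) -> List[str]:
    """Return every classification whose word list the document conforms to."""
    return [item
            for item, words in retrieval_conditions.items()
            if conformity(document_text, words) >= threshold]


# Hypothetical retrieval conditions, one word list per classification.
conditions = {
    "sports": ["soccer", "goal", "league"],
    "cooking": ["recipe", "oven", "flour"],
}
result = classify_by_word_lists(
    "The league leaders scored a late goal in the soccer final.", conditions)
```

Note that the word lists in conditions must exist before any document can be classified, which is the presetting burden attributed to this approach.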
It is therefore a first object of the present invention to provide a distributed document classifying system in which manual classification and automatic classification are used together: a service provider manually classifies some of the documents distributed in a network environment, while every other document is automatically classified by calculating its conformity with the classified document group.
Further, whether documents are classified manually by the service provider or automatically by calculating the conformity of other documents with the manually classified document group, the result of the classification often depends on the judgment of the service provider who carries out the classification and does not always accord with the intention of the document creator. According to the art disclosed in Japanese patent laid-open publication No. Hei 7-49875, all the documents are automatically classified based on the retrieval condition, and hence they are disadvantageously classified irrespective of the intention of each document creator. Since the document creator has thorough knowledge of his/her document, the cooperation of the document creator will enable more appropriate classification.
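The combined approach stated in the first object can be sketched as follows. The passage does not specify how conformity with the classified document group is calculated, so this sketch assumes a bag-of-words cosine similarity against the merged term counts of each manually classified group; the function names and sample documents are likewise hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional, Tuple


def term_vector(text: str) -> Counter:
    """Count lowercase alphanumeric tokens in a document."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_by_similarity(
        text: str,
        classified_groups: Dict[str, List[str]]) -> Tuple[Optional[str], float]:
    """Pick the classification whose manually classified group is most similar.

    classified_groups maps each classification item to the texts that the
    service provider has already classified into it by hand.
    """
    doc = term_vector(text)
    best_item, best_score = None, 0.0
    for item, texts in classified_groups.items():
        group = Counter()
        for classified_text in texts:
            group.update(term_vector(classified_text))
        score = cosine(doc, group)
        if score > best_score:
            best_item, best_score = item, score
    return best_item, best_score


# Documents already classified manually by the service provider (sample data).
groups = {
    "sports": ["soccer league goal", "tennis match score"],
    "cooking": ["oven recipe flour", "bake bread recipe"],
}
item, score = classify_by_similarity("a new bread recipe for the oven", groups)
```

Under these assumptions the manually classified documents themselves serve as the reference for each classification, so no separate retrieval condition has to be written, and those documents do not need to be classified a second time.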
It is therefore a second object of the present invention to provide a distributed document classifying system by which a document creator can explicitly specify a classification to which his/her document should belong.