1) Field of the Invention
The present invention relates to an apparatus or system, and a method, for handling a large amount of document data, and particularly relates to a document processing and management apparatus which manages stored document groups. More particularly, this invention relates to a document processing and management apparatus which extracts documents related to a document being viewed from a large number of stored document groups.
2) Description of the Related Art
With the progress of communications technology in recent years, anyone can now acquire a wide variety of information. At the same time, the amount of information to be dealt with has greatly increased. It is therefore essential to be able to efficiently find any desired information or document within a huge amount of information.
Generally, in a computer filing system, documents are managed using hierarchical directories. The documents are thoroughly classified and ordered within the hierarchical directories, which facilitates searching for a related document. In addition, various search tools are known which search for a document containing a specific word or phrase by means of a keyword. Such a keyword search makes it possible to find related documents which include the specific word or phrase.
In the World Wide Web (WWW) on the Internet, related documents can be connected to one another by links to facilitate reference to the related documents. Further, a technique is known, for WWW browsers or the like, for displaying a document viewing history covering a certain period or a certain number of documents, and for sequencing the documents according to viewing frequency.
In relation to such information and document search techniques, Japanese Patent Application Laid-Open No. 2000-20202 describes an information reference supporting device which analyzes a user's information viewing history based on criteria, designated by the user, for judging the degree of importance, and which, according to the analysis result, displays information of a high degree of importance using symbols that the user has associated with the information.
However, when viewing a document, a user often wants to refer only to documents related to that document. If documents are managed by hierarchical directories, it is easy to find related documents. This, however, requires an operation for classifying the documents in advance when the documents are stored. Further, as the number of documents increases, it is often necessary to change the classification system, and making this change takes a lot of labor and time.
Using the keyword search technique, it is possible to find related documents each of which includes a specific word or phrase, but keyword entry is essential. Further, it is not always easy to select an appropriate keyword for finding the related documents.
If a WWW browser or the like is used, it is necessary to study the relations among documents and to establish links among related documents in advance so as to facilitate reference to the related documents.
With a method for displaying a document viewing history covering a certain period or a certain number of documents, with a method of sequencing documents according to viewing frequency, or with the information reference supporting device described in Japanese Patent Application Laid-Open No. 2000-20202, it is not always possible to efficiently find documents related to a certain document. In addition, the information reference supporting device requires an operation, carried out in advance, for defining the relationship between the symbols and the judgment criteria for the degree of importance.
In general, observation of document utilization shows that not all documents are viewed uniformly and that the documents viewed vary depending on the individual user. There is a statistical property that a group of documents which attracts strong interest and is viewed highly frequently is repeatedly referred to together, that is, as co-occurrence information. The co-occurrence information mentioned here indicates documents which are referred to during a series of coherent operations. FIG. 9 is a schematic view which explains information for a document viewing history. In FIG. 9, viewed documents are arranged as “A”, “B”, “C”, . . . in time series and encircled in units of a series of operations as co-occurrence information. Observing the time series in the example of FIG. 9, there is a marked tendency that when “B” is viewed, “C” is viewed together with it, followed by “A” and “D”, whereas “E” is not viewed together with “B”.
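The notion of co-occurrence information described above can be illustrated with a minimal sketch, which is not the claimed implementation. It assumes that a viewing history has already been divided into series of operations (here called sessions), each being a list of document identifiers. The function names `cooccurrence_counts` and `related_documents` and the sample sessions are hypothetical; the sample data is not taken from FIG. 9 and is merely chosen to show a similar tendency.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sessions):
    """Count, for every pair of documents, the number of sessions
    (series of operations) in which both documents were viewed."""
    counts = Counter()
    for session in sessions:
        # Each document counts once per session; sorting gives every
        # pair a single canonical key such as ("B", "C").
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts


def related_documents(counts, target):
    """Return (document, count) pairs for documents that co-occur
    with `target`, most frequent first."""
    related = Counter()
    for (a, b), n in counts.items():
        if a == target:
            related[b] += n
        elif b == target:
            related[a] += n
    return related.most_common()


# Hypothetical viewing history (assumption, not the data of FIG. 9).
sessions = [
    ["A", "B", "C"],
    ["B", "C", "D"],
    ["B", "C"],
    ["A", "E"],
    ["C", "B", "A"],
    ["D", "B", "C"],
]

ranking = related_documents(cooccurrence_counts(sessions), "B")
print(ranking)
```

In this hypothetical history, “C” co-occurs with “B” most often, “A” and “D” less often, and “E” never appears in the same session as “B”, so “E” is absent from the ranking.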