The present invention generally relates to document retrieval systems, and more particularly to a document retrieval system which can generate, at high speed and with high flexibility, a group of keywords which are close to the vocabulary or image of the user.
Conventional document retrieval systems can generally be categorized into two kinds according to their registration and retrieval methods: the thesaurus type and the free keyword type.
In the thesaurus type document retrieval system, the operator selects, at the time of document registration, the keywords which are assumed to be appropriate and registers the selected keywords together with the bibliographical matters. At the time of document retrieval, the user designates the keywords which are assumed to be appropriate from the thesaurus (the group of keywords) for use in retrieving the document.
On the other hand, in the free keyword type document retrieval system, the operator registers only the bibliographical matters and the document content at the time of document registration. At the time of document retrieval, the user designates free keywords to retrieve the document.
The thesaurus type document retrieval system may have an inverted file, which makes high-speed document retrieval possible. However, a large memory capacity is required to store the keywords. In addition, the keywords selected by the operator at the time of document registration are not necessarily appropriate, and the appropriateness of the selected keywords determines the performance of the system. Furthermore, the indexing (classification) and renewal of the keywords using the thesaurus are both complex and not necessarily appropriate.
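As an illustration only, the role of the inverted file can be sketched as follows. The names build_inverted_file and retrieve, the sample document numbers, and the sample keywords are hypothetical and are not taken from any conventional system; the sketch further assumes that a document is retrieved when it is registered under every designated keyword.

```python
from collections import defaultdict


def build_inverted_file(registered):
    """Map each operator-assigned keyword to the set of document IDs registered under it."""
    inverted = defaultdict(set)
    for doc_id, keywords in registered.items():
        for keyword in keywords:
            inverted[keyword].add(doc_id)
    return inverted


def retrieve(inverted, keywords):
    """Return the IDs of documents registered under every designated keyword."""
    postings = [inverted.get(keyword, set()) for keyword in keywords]
    if not postings:
        return set()
    # Only the keyword entries are consulted; no document text is read.
    return set.intersection(*postings)


# Hypothetical registrations: document ID -> keywords chosen by the operator.
registered = {
    1: ["retrieval", "thesaurus"],
    2: ["retrieval", "memory"],
    3: ["image", "memory"],
}
inverted = build_inverted_file(registered)
result = retrieve(inverted, ["retrieval", "memory"])
```

In this sketch each retrieval consults only the keyword entries, which is why it can be fast, while the inverted file itself grows with the number of registered keywords, and a document registered under poorly chosen keywords cannot be found.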
On the other hand, the free keyword type document retrieval system needs only a small memory capacity to store the keywords. Moreover, whether or not a document including the designated keyword exists is clear, and the classification (indexing) of the keywords is unnecessary. However, the retrieval time is long because the entire document is searched for the designated keyword, and the system is not suited to fuzzy retrieval such as the processing of synonyms.
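By way of a minimal sketch, free keyword retrieval can be pictured as a scan over the full text of every registered document. The function name free_keyword_search and the sample documents are hypothetical, and case-insensitive substring matching is assumed here purely for illustration.

```python
def free_keyword_search(documents, keyword):
    """Scan the full text of every document for the designated keyword."""
    needle = keyword.lower()
    # Every document is read in full, so the cost grows with the total text size.
    return [doc_id for doc_id, text in documents.items() if needle in text.lower()]


# Hypothetical document contents: document ID -> registered text.
documents = {
    1: "A car engine converts fuel into motion.",
    2: "The automobile industry adopted assembly lines.",
}
hits_car = free_keyword_search(documents, "car")
hits_automobile = free_keyword_search(documents, "automobile")
```

Under these assumptions the keyword "car" finds only document 1 and the keyword "automobile" finds only document 2, although both documents concern the same subject, which illustrates why this type of system is poorly suited to processing synonyms.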