1. Field of the Invention
The present invention relates to a medium on which is recorded a document retrieval program for retrieving classified documents, and to a document retrieval apparatus and a document retrieval method.
2. Description of the Related Art
Documents including patent documents and treatises contain document data that are hierarchically classified. When such documents are retrieved by means of keywords, two types of problematic cases can occur, as listed below.
    (Case 1) The same word has a plurality of different meanings.
    (Case 2) The same concept is expressed by different words.
Case 1 can give rise to retrieval noises (unwanted documents are included in the retrieved documents), whereas Case 2 can produce missing documents (one or more wanted documents are not included in the retrieved documents).
A technique for avoiding these problems (of retrieval noises and missing documents) is the use of classification codes. Each patent document may be provided with one or more hierarchical classification codes such as an IPC (International Patent Classification) code, a Japanese FI (File Index) or F (File Forming) term and/or a U.S. Patent Classification code. In the case of FIs and F terms in particular, classification codes are assigned to more than one hundred thousand items. Therefore, once the user finds the classification code that matches the objective of retrieval, wanted documents can be retrieved highly accurately and reliably, with minimal retrieval noises and minimal missing documents, by using the classification code as a key for the retrieval of documents. However, another problem then arises: because the classification codes are so deeply hierarchical and so finely divided, the user finds it difficult to identify the classification code that exactly matches the objective of retrieval.
The first conceivable technique for retrieving a classification code is to search the sentences that define or explain the classification codes. This technique is actually used in the patent map guidance of the Japanese IPDL (Industrial Property Digital Library). However, since the document containing the sentence of the definition or the explanation has to be retrieved by means of a keyword, this first technique is itself not free from the problems of retrieval noises and missing documents.
Patent Document 1 (Jpn. Pat. Appln. Laid-Open Publication No. 11-328192) discloses a method of utilizing a separately provided concept dictionary for the purpose of lessening the problems of retrieval noises and missing documents. However, it is not easy to maintain the concept dictionary so that it accommodates revisions made to the classification and technological developments.
The second conceivable technique for retrieving a classification code is the utilization of the co-occurrence relations of classification codes and keywords in documents (information indicating that classification codes and keywords appear together in the same document). With this technique, documents are retrieved by means of keywords (or classification codes), the classification codes assigned to the obtained group of documents are summed up, and the classification codes that are strongly related to a specified keyword are rated and displayed.
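The co-occurrence technique described above can be sketched as follows. This is a minimal illustration only, not the method of any cited patent document: the toy corpus, the classification code strings (`X1/10` and so on) and the function name `rank_codes_by_keyword` are all hypothetical, and the rating is assumed to be a simple count of retrieved documents per classification code.

```python
from collections import Counter

# Hypothetical toy corpus: each document has free text and assigned
# classification codes. The code strings are placeholders, not real codes.
DOCUMENTS = [
    {"text": "image retrieval using keyword index", "codes": ["X1/10", "X2/00"]},
    {"text": "document retrieval with classification codes", "codes": ["X1/10"]},
    {"text": "engine valve timing control", "codes": ["Y5/34"]},
]


def rank_codes_by_keyword(documents, keyword):
    """Retrieve documents containing the keyword, sum up their
    classification codes, and return (code, count) pairs, highest first."""
    counts = Counter()
    needle = keyword.lower()
    for doc in documents:
        # Assumption: a document matches if the keyword is one of its words.
        if needle in doc["text"].lower().split():
            # Count each code once per document.
            counts.update(set(doc["codes"]))
    # Sort by descending count, then by code so the order is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = rank_codes_by_keyword(DOCUMENTS, "retrieval")
for code, count in ranking:
    print(code, count)
```

In this sketch, the code that is assigned to the most retrieved documents is listed first, which corresponds to rating the classification codes by the strength of their relation to the specified keyword.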
Patent Document 2 (Jpn. Pat. Appln. Laid-Open Publication No. 2002-351896) discloses a technique of retrieving a group of documents by utilizing keywords and patent classifications and extracting and displaying classification codes from the retrieved group of documents. Patent Document 3 (Jpn. Pat. Appln. Laid-Open Publication No. 2003-044493) discloses a technique of facilitating the detection of adjacent (related) classification codes by sorting the retrieved and summed-up classification codes into two hierarchies and displaying combinations of classification codes and definitions of classifications. With either of these techniques, it is possible to exploit the variations in how keywords are expressed in actual documents and hence reduce missing documents.
While the technique of utilizing co-occurrence relations of classification codes and keywords in documents can reduce missing documents as described above, it is accompanied by the following problems.
    (Problem 1) The problem of retrieval noises (classification codes that do not match the objective and the intention of the retrieval are displayed) remains when a large number of classification codes are assigned to each patent document, as is frequently the case with F terms.
    (Problem 2) Since the classification codes assigned to document data are normally hierarchical, when classification codes of a single type are totaled at a plurality of hierarchical levels in the process of summing up the co-occurrence relations of keywords and classification codes, the keywords appear to be strongly related to classification codes of upper hierarchies (it is not possible to find a classification code of an appropriate granularity).
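Problem 2 can be illustrated with a small sketch. It assumes, purely for illustration, that a hierarchical classification code is written as a slash-separated path (`A/1/a` lies under `A/1`, which lies under `A`) and that each retrieved document is counted once for its assigned code and once for every ancestor of that code. The code strings and the function names are hypothetical.

```python
from collections import Counter


def ancestors_and_self(code):
    """Return the code and all of its ancestors, e.g. 'A/1/a' ->
    ['A', 'A/1', 'A/1/a'] (assumed slash-separated hierarchy)."""
    parts = code.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def rolled_up_counts(assigned_codes):
    """Total the codes of the retrieved documents at every hierarchical
    level, counting each level at most once per document."""
    counts = Counter()
    for codes in assigned_codes:
        levels = set()
        for code in codes:
            levels.update(ancestors_and_self(code))
        counts.update(levels)
    return counts


# Hypothetical codes assigned to four documents retrieved by one keyword.
hits = [["A/1/a"], ["A/1/b"], ["A/2/c"], ["A/1/a"]]
totals = rolled_up_counts(hits)
for code, count in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0])):
    print(code, count)
```

Because every document under `A` also counts toward `A` itself, the count of an upper-hierarchy code can never be smaller than that of any of its descendants. A ranking by raw totals therefore favors the coarse codes, and the more specific code `A/1/a` is pushed down the list.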