FIG. 11 is a diagram showing the structure of a conventional keyword extracting device described in Japanese Unexamined Patent Publication No. 334102/1998, for example. In FIG. 11, 1 denotes a database, 2 denotes a primary keyword extractor, 3 denotes a character information section, 4 denotes a primary keyword storage section, 5 denotes an unnecessary word removing section, and 6 denotes a keyword storage section.
Next, the operation will be described. Based on the information in the character information section 3, which determines the types of characters that can form keywords, the primary keyword extractor 2 extracts character strings to serve as primary keywords from the database 1 and stores them in the primary keyword storage section 4. The unnecessary word removing section 5 removes, as an unnecessary word, any primary keyword that can be written as a concatenation of other primary keywords (that is, a synthetic word) and stores the remaining keywords in the keyword storage section 6.
Moreover, it has also been described that the unnecessary word removing section 5 removes a primary keyword consisting of a single character, removes prestored prefixes and suffixes as part of the unnecessary word removing processing, and does not remove a synthetic word that has been registered in advance or that appears frequently.
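The conventional operation described above can be illustrated with a minimal Python sketch. It is not taken from the cited publication: the character-type rule (runs of ASCII letters), the function names, and the sample text are all assumptions chosen for illustration, and the frequency-based exemption for synthetic words is omitted for brevity.

```python
import re

# Assumption: the character information section 3 is modeled as a regular
# expression that accepts runs of ASCII letters as keyword candidates.
CHAR_TYPE_PATTERN = re.compile(r"[A-Za-z]+")


def extract_primary_keywords(documents):
    """Model of the primary keyword extractor 2: collect unique runs of the
    permitted character type, in order of first appearance."""
    primary = []
    for text in documents:
        for match in CHAR_TYPE_PATTERN.findall(text):
            word = match.lower()
            if word not in primary:
                primary.append(word)
    return primary


def is_concatenation(word, parts):
    """Return True if `word` can be written as other words from `parts`
    joined end to end (a synthetic word)."""
    candidates = [p for p in parts if p != word]
    # reachable[i] is True when word[:i] can be built from the candidates.
    reachable = [True] + [False] * len(word)
    for i in range(1, len(word) + 1):
        for p in candidates:
            start = i - len(p)
            if start >= 0 and reachable[start] and word[start:i] == p:
                reachable[i] = True
                break
    return reachable[len(word)]


def remove_unnecessary_words(primary, protected=()):
    """Model of the unnecessary word removing section 5: drop single-character
    keywords and synthetic words, except synthetic words in `protected`."""
    kept = []
    for word in primary:
        if len(word) == 1:
            continue
        if word not in protected and is_concatenation(word, primary):
            continue
        kept.append(word)
    return kept


# Hypothetical sample standing in for the database 1.
documents = ["data base and database", "x keyword"]
primary = extract_primary_keywords(documents)
print(remove_unnecessary_words(primary))
# ['data', 'base', 'and', 'keyword']
print(remove_unnecessary_words(primary, protected=("database",)))
# ['data', 'base', 'and', 'database', 'keyword']
```

In this sketch "database" is dropped because it is the concatenation of "data" and "base", unless it is listed in `protected`, which plays the role of the previously registered synthetic words.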
The conventional keyword extracting device is based on character information. Therefore, it has been difficult to extract a keyword that spans a plurality of character types. Moreover, keyword extraction processing is carried out even on portions that can be identified as not containing a keyword. This unnecessary processing creates the possibility that a keyword is extracted erroneously. Furthermore, information about synthetic words to be treated as keywords, unnecessary primary keywords, prefixes, and suffixes is stored or defined only as a character string or a simple number of characters. Accordingly, there have also been problems in that flexibility and simplicity of description cannot be obtained, a portion which is not a prefix or suffix is deleted by mistake, and a single-character string that should be a keyword cannot be extracted. Moreover, it is impossible to modularize the information according to field, document type and the like and to combine the modules as needed, so the reusability of the information is poor.
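Two of these problems can be made concrete with a short Python sketch. The prefix and suffix lists, the function names, and the sample words are hypothetical and serve only to show how rules defined as plain character strings or character counts can misfire.

```python
# Hypothetical prestored affixes, defined only as literal character strings.
PREFIXES = ("re",)
SUFFIXES = ("s",)


def strip_affixes(word, prefixes=PREFIXES, suffixes=SUFFIXES):
    """Remove one matching prefix and one matching suffix by literal string
    comparison, with no check that the match really is an affix."""
    for p in prefixes:
        if word.startswith(p) and len(word) > len(p):
            word = word[len(p):]
            break
    for s in suffixes:
        if word.endswith(s) and len(word) > len(s):
            word = word[:-len(s)]
            break
    return word


def drop_single_characters(words):
    """Remove every keyword that is one character long (a rule expressed as a
    simple character count)."""
    return [w for w in words if len(w) > 1]


print(strip_affixes("rewrite"))   # write   (intended behavior)
print(strip_affixes("reading"))   # ading   ("re" is not a prefix here)
print(strip_affixes("bus"))       # bu      ("s" is not a suffix here)
print(drop_single_characters(["C", "compiler"]))  # ['compiler']
```

The last call shows a single-character string that could legitimately be a keyword (here the hypothetical keyword "C") being discarded by the length rule.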
The present invention has been made to solve the above-mentioned problems, and has an object of obtaining a keyword extracting device that extracts keywords efficiently and with high precision while making information about keyword extraction easier to describe and to reuse.