1. Field of the Invention
The present invention relates to a registration apparatus for a compound-word dictionary. The compound-word dictionary of the present invention is employed in a machine translation apparatus, a Japanese kana-kanji converter (i.e., a converter from Japanese syllables to Chinese characters), or the like. In general, compound words can be added to this kind of compound-word dictionary as needed. More specifically, the present invention relates to an apparatus that automatically judges whether or not to register a new compound word in a compound-word dictionary, and registers each compound word judged to be registrable as an entry word in the compound-word dictionary.
2. Description of the Related Art
In a compound-word dictionary, the precision of natural language analysis improves as the number of entry words increases. However, the compound-word dictionary also grows in size, and compound words in excess of the dictionary's capacity cannot be registered. The following criteria have previously been proposed for automatically judging whether or not to register a new compound word in a compound-word dictionary:
(a) When a compound word is an undefined word but is composed of defined individual words, the compound word is not registered as an entry word in the compound-word dictionary;
(b) When a compound word is an undefined word, the undefined compound word is registered as an entry word in the compound-word dictionary; or
(c) Literature is inspected, and compound words that appear in the literature with high frequency are registered in the compound-word dictionary.
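Purely as an illustration, the three prior-art criteria can be sketched as simple decision functions. This is a minimal sketch under stated assumptions: dictionaries are modeled as plain sets of strings, the constituent words of each compound are assumed to be known in advance, and the function names, the `threshold` parameter for criterion (c), and the choice to register undefined compounds that contain an undefined constituent under criterion (a) are all hypothetical, since the text does not specify them.

```python
from collections import Counter


def register_by_a(compound, parts, word_dict, compound_dict):
    """Criterion (a): an undefined compound whose constituent words are all
    defined is NOT registered. Registering the remaining undefined compounds
    is an assumption made for this sketch only."""
    if compound in compound_dict:
        return False  # already an entry word; nothing to register
    # Register only if at least one constituent word is itself undefined.
    return not all(part in word_dict for part in parts)


def register_by_b(compound, compound_dict):
    """Criterion (b): every undefined compound is registered."""
    return compound not in compound_dict


def register_by_c(observed_compounds, threshold):
    """Criterion (c): compounds appearing in the inspected literature at
    least `threshold` times (a hypothetical cutoff) are registered."""
    counts = Counter(observed_compounds)
    return sorted(c for c, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Hypothetical toy data for illustration.
    word_dict = {"machine", "translation", "kana", "kanji"}
    compound_dict = set()
    print(register_by_a("machine translation", ["machine", "translation"],
                        word_dict, compound_dict))  # False
    print(register_by_b("machine translation", compound_dict))  # True
    corpus = ["machine translation", "kana-kanji conversion", "machine translation"]
    print(register_by_c(corpus, 2))  # ['machine translation']
```

The toy run makes the drawbacks discussed next concrete: (a) rejects "machine translation" even if it should be one entry, (b) accepts every undefined compound regardless of dictionary size, and (c) depends entirely on which literature feeds the frequency count.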
However, under criterion (a), a compound word that should be registered as a single word is not registered as an entry word. When such a compound word is processed by a machine translator or a kana-kanji converter using the resulting inadequate compound-word dictionary, a suitable expression is not chosen in machine translation, and a kana (Japanese syllable) string is converted into unsuitable kanji (Chinese characters) during kana-kanji conversion.
Under criterion (b), every undefined compound word is registered as an entry word in the compound-word dictionary, so the compound-word dictionary becomes too large.
Under criterion (c), literature covering diverse fields must be inspected. If the range of fields inspected is narrow, only compound words used in those specific fields are registered in the compound-word dictionary.