1. Field of the Invention
The present invention relates to the field of natural language analysis using a computer and, more particularly, to a technology for decomposing a sentence into words in a morphological analysis.
2. Description of the Related Art
In natural language analysis using a computer, a sentence is first decomposed into words. In a language written without separation between words, such as Japanese, morphological analysis is performed to extract the words composing the sentence.
In such a process of decomposing the sentence into words, it is important to appropriately decompose a complex word, that is, a single word formed from two or more words. Various techniques have conventionally been proposed for this purpose (e.g., refer to Published Unexamined Patent Application No. 2002-251402).
FIG. 11 is a block diagram showing the functional blocks of the conventional morphological analysis means implemented on the computer, and FIG. 12 is a flowchart schematically explaining a method of the conventional morphological analysis.
As shown in FIGS. 11 and 12, in the morphological analysis, first of all, a token list generating unit 111 cuts out character strings of various lengths from a sentence to be processed and obtains all possible tokens (step 1201). A token list on which each token and its attribute (part of speech) are registered is then generated by searching a master dictionary 112 (step 1202). Herein, a token is the minimum element composing a sentence or word. For example, the word “morphology” has the tokens “mor”, “morpho”, “morphology”, “pho” and “logy”.
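Steps 1201 and 1202 can be illustrated by the following minimal sketch, which is not taken from the conventional means itself. The function name `generate_token_list`, the in-memory `MASTER_DICTIONARY`, and the attribute labels assigned to each token are hypothetical and serve only to make the example runnable.

```python
# Hypothetical master dictionary: token -> attribute (part of speech).
# The attribute labels are placeholders chosen for illustration only.
MASTER_DICTIONARY = {
    "mor": "prefix",
    "morpho": "prefix",
    "morphology": "noun",
    "pho": "stem",
    "logy": "suffix",
}


def generate_token_list(sentence, dictionary):
    """Cut out every substring of the sentence and keep those found in the dictionary.

    Returns a list of (start, end, token, attribute) tuples, ordered by
    start position and then by length.
    """
    token_list = []
    for start in range(len(sentence)):
        for end in range(start + 1, len(sentence) + 1):
            candidate = sentence[start:end]
            if candidate in dictionary:
                token_list.append((start, end, candidate, dictionary[candidate]))
    return token_list


if __name__ == "__main__":
    for entry in generate_token_list("morphology", MASTER_DICTIONARY):
        print(entry)
```

With the assumed dictionary, the sketch yields the five tokens named above for the input “morphology”, each paired with its character span and attribute.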
Then, a token string selecting unit 113 references a grammar dictionary 114, and selects an optimum token string from among the combinations of all possible tokens detected at step 1201 (step 1203).
Thereafter, a complex word decomposition processing unit 115 matches the token string selected at step 1203 with a complex word dictionary 116, and decomposes decomposable tokens into smaller tokens (step 1204).
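The order of steps 1203 and 1204 can be sketched as follows, under stated assumptions. The selection rule used here (the fewest tokens covering the sentence) is only a stand-in for the grammar dictionary 114, whose actual criteria are not described above. The complex word dictionary entry that splits “morphology” into “morpho” and “logy” is likewise hypothetical.

```python
# Tokens assumed to have been found at steps 1201 and 1202.
TOKENS = ["mor", "morpho", "morphology", "pho", "logy"]

# Hypothetical complex word dictionary: complex word -> component tokens.
COMPLEX_WORD_DICTIONARY = {"morphology": ["morpho", "logy"]}


def select_token_string(sentence, tokens):
    """Step 1203 (stand-in): choose the covering token string with the fewest tokens.

    A real implementation would consult a grammar dictionary; the
    fewest-tokens rule is an assumption made for this sketch.
    """
    n = len(sentence)
    best = [None] * (n + 1)  # best[i]: shortest token string covering sentence[:i]
    best[0] = []
    for pos in range(n):
        if best[pos] is None:
            continue  # no token string reaches this position
        for token in tokens:
            if sentence.startswith(token, pos):
                end = pos + len(token)
                candidate = best[pos] + [token]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


def decompose_complex_words(token_string, complex_dictionary):
    """Step 1204: match each selected token against the complex word dictionary."""
    result = []
    for token in token_string:
        result.extend(complex_dictionary.get(token, [token]))
    return result


if __name__ == "__main__":
    selected = select_token_string("morphology", TOKENS)
    print(selected)
    print(decompose_complex_words(selected, COMPLEX_WORD_DICTIONARY))
```

In this sketch the decomposition is applied only after the token string has been fixed, so every selected token requires a further dictionary lookup, and the decomposed result is never re-evaluated by the selection step.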
Problems To Be Solved By The Invention
As described above, since the conventional morphological analysis involved selecting a token string and then decomposing a complex word, a separate matching process was required for the complex word, and the time taken by this process increased as more complex words were contained in the sentence.
Also, since the conventional morphological analysis involved selecting a suitable token string and then decomposing the complex word, there was a drawback that the token string made up of the decomposed words (tokens) was not assured to be optimum.
Moreover, since the complex word dictionary referenced in decomposing the complex word comprises the part of speech information and the delimiter position information for both the complex word and the words composing it, it took a lot of time to generate and maintain the dictionary.
Thus, it is an object of this invention to decompose a complex word efficiently when a sentence is decomposed into words in the morphological analysis, thereby enhancing the execution efficiency of the overall processing.
Another object of the present invention is to decompose a complex word efficiently when a sentence is decomposed into words in the morphological analysis, while assuring that the token string obtained as an analysis result is optimum when the complex word is decomposed.
Moreover, it is a further object of the invention to reduce the time needed to generate and maintain the complex word dictionary.