In natural language processing, as typified by machine translation, text mining, and the like, syntactic analysis processing for analyzing an input sentence is important.
In the syntactic analysis processing, a sequence of processing steps is performed on an input sentence, such as (1) dividing the sentence into words, (2) assigning a word class to each word, (3) determining the interrelationships among the words, and (4) assigning semantic information to the words.
However, a grammatical element of a natural language, such as a word or a phrase, can have a plurality of grammatical functions, such as a plurality of meanings or a plurality of word classes. The grammatical element per se is therefore ambiguous and cannot be identified as having a single meaning.
For this reason, in the syntactic analysis processing, an analysis is made in consideration of the ambiguity of grammatical elements.
Specifically, when a language processing device analyzes a sentence that includes a grammatical element, such as a word or a phrase, having a plurality of grammatical functions, such as a plurality of meanings or a plurality of word classes (hereinafter called the “polysemic word”), the language processing device performs the analysis as follows.
First, the language processing device creates a plurality of candidates in accordance with the plurality of grammatical functions (hereinafter called “a plurality of meanings”) possessed by a polysemic word. Subsequently, the language processing device analyzes these candidates and outputs a single analysis result.
Accordingly, when the language processing device analyzes a sentence that includes a polysemic word, it must process every candidate, and the syntactic analysis processing takes an immense amount of time.
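The growth in the number of candidates can be illustrated with a minimal sketch. The lexicon, the word classes, and the function name `enumerate_candidates` below are hypothetical and chosen only for illustration; they are not taken from any particular device. The sketch assumes that one candidate is created for every combination of word classes, so the candidate count is the product of the per-word ambiguity.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to the word classes it may take.
LEXICON = {
    "time": ["noun", "verb"],
    "flies": ["noun", "verb"],
    "like": ["verb", "preposition"],
    "an": ["determiner"],
    "arrow": ["noun"],
}


def enumerate_candidates(words, lexicon):
    """Return every word-class sequence that the lexicon allows for `words`."""
    options = [lexicon[w] for w in words]
    # One candidate per combination of choices across all words.
    return [tuple(seq) for seq in product(*options)]


sentence = ["time", "flies", "like", "an", "arrow"]
candidates = enumerate_candidates(sentence, LEXICON)
print(len(candidates))  # 2 * 2 * 2 * 1 * 1 combinations
```

Each additional polysemic word multiplies the number of candidates that the later analysis stages must examine.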
Many methods have conventionally been proposed for performing syntactic analysis at higher speed. For example, there is a method of speeding up processing by deleting, at an early stage, unnecessary candidates which can be deleted without changing the syntactic analysis result.
As a method of creating rules for identifying such unnecessary candidates, manually enumerating the rules in advance has conventionally been proposed, but such manual creation is not realistic because it is costly.
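As a rough sketch of such early deletion, the following assumes that the manually enumerated rules take the form of forbidden pairs of adjacent word classes. The rule set `FORBIDDEN_BIGRAMS` and the function `prune` are hypothetical; an actual rule set would be far larger, which is the source of the cost noted above.

```python
# Hypothetical hand-written rules: pairs of adjacent word classes that are
# assumed never to appear in the final syntactic analysis result.
FORBIDDEN_BIGRAMS = {("determiner", "verb"), ("preposition", "verb")}


def prune(candidates, forbidden=FORBIDDEN_BIGRAMS):
    """Drop candidates that contain any forbidden adjacent word-class pair."""
    return [
        seq
        for seq in candidates
        if not any(pair in forbidden for pair in zip(seq, seq[1:]))
    ]


# Candidate word-class sequences for a three-word sentence.
candidates = [
    ("determiner", "noun", "verb"),
    ("determiner", "verb", "verb"),
    ("determiner", "noun", "noun"),
    ("determiner", "verb", "noun"),
]
kept = prune(candidates)
print(kept)
```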
On the other hand, Patent Document 1 (JP-2-114377-A) describes a natural language processing device which learns ambiguity elimination models (rules) in accordance with instances in analysis results of syntactic analysis processing.
Specifically, Patent Document 1 describes a natural language processing device which learns a model for eliminating an ambiguity of a word class from an analysis result of syntactic analysis processing.
This conventional natural language processing device comprises a morphological analysis unit, a syntactic analysis unit, a learning device, and a learning result holding unit. The conventional natural language processing device having such a configuration operates in the following manner.
The morphological analysis unit performs morphological analysis on an input sentence. The syntactic analysis unit performs syntactic analysis based on the result of the morphological analysis. The learning device receives a word class sequence having an ambiguity, which is outputted by the morphological analysis unit, and a word class sequence which is determined on the basis of the result of the analysis in the syntactic analysis unit, and learns a statistical model for estimating a word class. The learning result holding unit holds the result learned in the learning device. In the next analysis processing, the syntactic analysis unit estimates a word class by making use of the learned result held in the learning result holding unit, thereby eliminating an ambiguity of the word class sequence at an early stage.
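The learning flow described above can be sketched as follows. This is only an illustrative frequency-count model under stated assumptions, not the statistical model of Patent Document 1, whose details are not given here. The class name `WordClassModel` and its methods are hypothetical. The sketch assumes that the ambiguous sequence is given as a set of possible word classes per word and that the resolved sequence gives the single word class chosen by the syntactic analysis.

```python
from collections import Counter, defaultdict


class WordClassModel:
    """Counts how each set of ambiguous word classes was resolved."""

    def __init__(self):
        # Plays the role of the learning result holding unit.
        self.counts = defaultdict(Counter)

    def learn(self, ambiguous_seq, resolved_seq):
        """Record one analyzed sentence (role of the learning device)."""
        for options, chosen in zip(ambiguous_seq, resolved_seq):
            if len(options) > 1:  # only ambiguous positions are informative
                self.counts[frozenset(options)][chosen] += 1

    def estimate(self, options):
        """Return the most frequently chosen word class, or None if unseen."""
        seen = self.counts.get(frozenset(options))
        if not seen:
            return None
        return seen.most_common(1)[0][0]


model = WordClassModel()
model.learn([{"noun", "verb"}, {"verb"}], ["noun", "verb"])
model.learn([{"noun", "verb"}, {"noun"}], ["noun", "noun"])
model.learn([{"noun", "verb"}], ["verb"])

print(model.estimate({"noun", "verb"}))         # resolved to "noun" twice, "verb" once
print(model.estimate({"verb", "preposition"}))  # never observed during learning
```

In this sketch an unseen ambiguity returns `None`, so the caller would fall back to analyzing all candidates for that word.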
Patent Document 1: JP-2-114377-A