First, as examples of natural language processing tasks that perform structure prediction, a mapping from a word sequence to a Part-of-Speech (POS) tag sequence, a mapping from the word sequence to a phrase sequence, and a mapping from the word sequence to a proper-noun sequence will be explained in this order.
FIG. 1 illustrates a state of the mapping from the word sequence to the POS tag sequence. A word sequence 11 in this example includes the words “Taro”, “Yamada” and “signs” in this order. FIG. 1 also illustrates a state in which a POS tag such as noun (denoted by “N” in the figure) or verb (denoted by “V” in the figure) is correlated with each word. In this example, within the POS tag sequence 13, “noun” is correlated with “Taro”, “noun” is correlated with “Yamada”, and “verb” is correlated with “signs” in the word sequence 11.
FIG. 2 illustrates a state of the mapping from the word sequence to the phrase sequence. The word sequence 11 is the same as that in FIG. 1. FIG. 2 illustrates a state in which phrases are extracted from this word sequence, and a noun phrase (denoted by “NP” in the figure) or a verb phrase (denoted by “VP” in the figure) is correlated with each of the phrases. In this example, a first phrase “Taro Yamada” and a second phrase “signs” are extracted as the phrase sequence 21, “noun phrase” is correlated with the first phrase, and “verb phrase” is correlated with the second phrase.
FIG. 3 illustrates a state of the mapping from the word sequence to the proper-noun sequence. The word sequence 11 is the same as that in FIG. 1. FIG. 3 illustrates a state in which a person's name (denoted by “P” in the figure) or other (denoted by “O” in the figure) is correlated with a word or phrase included in this word sequence. In the figure, as depicted in the proper-noun sequence 31, the phrase “Taro Yamada” is determined to be a person's name, and “signs” is determined to be a word other than a person's name.
As an implementation method for these tasks, a supervised learning method has been applied. In this supervised learning method, the aforementioned word sequence and its correct structure (e.g. a label sequence) are given as training data, and the learning is performed by using this training data so that the word sequence is correctly mapped to the structure. For example, in a method in which a classifier is combined, the final output can be determined by assigning a label to each word.
FIG. 4 illustrates a state of a mapping by the classifier. The word sequence 11 is the same as that in FIG. 1. In this example, in the learning, the classifier for assigning a label to a word is used to correlate a label with each word included in the word sequence to be processed.
In this example, four labels are used, namely “the forefront of the noun phrase” (denoted by “B-NP” in the figure), “a word other than the forefront of the noun phrase” (denoted by “I-NP” in the figure), “the forefront of the verb phrase” (denoted by “B-VP” in the figure) and “a word other than the forefront of the verb phrase” (denoted by “I-VP” in the figure).
When “the forefront of the noun phrase” is followed by “the forefront of the noun phrase” or “the forefront of the verb phrase”, it means that a word corresponding to the foregoing “forefront of the noun phrase” solely corresponds to the noun phrase.
When “the forefront of the noun phrase” is followed by one or more “words other than the forefront of the noun phrase”, it means that the phrase spanning from the word corresponding to the foregoing “forefront of the noun phrase” to the word corresponding to the last of the “words other than the forefront of the noun phrase” corresponds to the noun phrase.
When “the forefront of the verb phrase” is followed by “the forefront of the noun phrase” or “the forefront of the verb phrase”, it means that a word corresponding to the foregoing “forefront of the verb phrase” solely corresponds to the verb phrase.
When “the forefront of the verb phrase” is followed by one or more “words other than the forefront of the verb phrase”, it means that the phrase spanning from the word corresponding to the foregoing “forefront of the verb phrase” to the word corresponding to the last of the “words other than the forefront of the verb phrase” corresponds to the verb phrase.
In this example, as depicted by the label sequence 41, “the forefront of the noun phrase” is assigned to “Taro”, “the word other than the forefront of the noun phrase” is assigned to “Yamada”, and “the forefront of the verb phrase” is assigned to “signs”. As a result, as depicted by the phrase sequence 43, “Taro Yamada” is determined to be the noun phrase, and “signs” is determined to be the verb phrase.
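The interpretation of the four labels described above can be sketched in code as follows. This is only an illustrative sketch and not the implementation of any cited document; the function name `decode_chunks` is hypothetical, and treating an “I-” label whose phrase type differs from the current phrase as the start of a new phrase is an assumption made here for robustness.

```python
from typing import List, Tuple


def decode_chunks(words: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group words into phrases according to B-/I- labels (hypothetical helper)."""
    chunks: List[Tuple[str, str]] = []  # (phrase text, phrase type)
    current_words: List[str] = []
    current_type = None
    for word, label in zip(words, labels):
        prefix, phrase_type = label.split("-", 1)  # e.g. "B-NP" -> ("B", "NP")
        # A "B-" label always opens a new phrase. An "I-" label of a different
        # type is also treated as opening a new phrase (assumption).
        if prefix == "B" or phrase_type != current_type:
            if current_words:
                chunks.append((" ".join(current_words), current_type))
            current_words = [word]
            current_type = phrase_type
        else:
            current_words.append(word)
    if current_words:
        chunks.append((" ".join(current_words), current_type))
    return chunks


words = ["Taro", "Yamada", "signs"]
labels = ["B-NP", "I-NP", "B-VP"]  # the label sequence 41 of FIG. 4
print(decode_chunks(words, labels))  # [('Taro Yamada', 'NP'), ('signs', 'VP')]
```

With the label sequence “B-NP”, “B-NP”, “B-VP” instead, the same function would return “Taro” and “Yamada” as two separate noun phrases, which corresponds to the case in which a forefront label is immediately followed by another forefront label.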
Moreover, a structured learning method for directly predicting the structure has also come into use recently. FIG. 5 illustrates a state of the mapping by the structured learning method. In this example, a mechanism is learned to directly select a correct label sequence from among candidates 51 of the label sequences obtained by selectively combining 4 kinds of labels for the respective words. The selection of the label sequence corresponds to selecting a correct path from among paths that connect the labels for the respective words as illustrated in FIG. 5.
In this example, a label sequence including “the forefront of the noun phrase” for “Taro”, “the word other than the forefront of the noun phrase” for “Yamada” and “the forefront of the verb phrase” for “signs” is selected, and as a result, as depicted by the phrase sequence 53, “Taro Yamada” is determined to be the noun phrase, and “signs” is determined to be the verb phrase.
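Selecting one path from among the candidate label sequences can be sketched, for example, as a dynamic-programming (Viterbi-style) search. The sketch below is an illustration under stated assumptions, not the method of any cited document: the scoring functions `emit`, `trans` and `start`, and all of their numerical values, are hypothetical stand-ins for scores that a learned model would provide.

```python
LABELS = ["B-NP", "I-NP", "B-VP", "I-VP"]


def emit(word, label):
    # Hypothetical word/label score: capitalized words favor NP labels,
    # other words favor VP labels.
    wants_np = word[0].isupper()
    return 1.0 if label.endswith("NP") == wants_np else 0.0


def trans(prev, cur):
    # Hypothetical transition score: "I-X" is rewarded only after "B-X" or "I-X".
    if cur.startswith("I-"):
        return 0.5 if prev.endswith(cur[2:]) else -10.0
    return 0.0


def start(label):
    # Hypothetical start score: a sequence should not begin with an "I-" label.
    return -10.0 if label.startswith("I-") else 0.0


def best_label_sequence(words, labels=LABELS):
    """Return the highest-scoring label path through the lattice."""
    # best[i][y]: score of the best path that ends with label y at word i
    best = [{y: start(y) + emit(words[0], y) for y in labels}]
    back = []  # back[i-1][y]: best previous label when word i has label y
    for i in range(1, len(words)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + trans(p, y))
            scores[y] = best[-1][prev] + trans(prev, y) + emit(words[i], y)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


print(best_label_sequence(["Taro", "Yamada", "signs"]))
# ['B-NP', 'I-NP', 'B-VP']
```

The search scores whole paths rather than individual words, which is the point of difference from assigning a label to each word independently.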
FIG. 6 illustrates a state of the mapping by another structured learning method. The word sequence 11 is the same as that in FIG. 1. This example uses, as a unit, a chunk that is a collection of words. In this method, a mechanism is learned to directly select a correct label sequence from among candidates 61 of the label sequences obtained by selectively combining 4 kinds of labels for the chunks included in the word sequence. In other words, the selection of the label sequence corresponds to selecting a correct path from among paths that connect the labels for the respective chunks as illustrated in FIG. 6. Moreover, on the assumption that the word sequence forms one chunk, the label of that chunk may be selected.
In this example, as depicted by the phrase sequence 63, a label sequence including the noun phrase for “Taro Yamada” and the verb phrase for “signs” is selected.
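A chunk-based selection of this kind can be sketched as a search over segmentations of the word sequence, in the manner of a semi-Markov model. The following is a simplified illustration only: the function `best_segmentation`, the score table `TOY_SCORES`, the default score of -1.0, and the reduction of the label set to “NP” and “VP” are all assumptions introduced for this sketch.

```python
# Hypothetical chunk scores; a learned model would supply these in practice.
TOY_SCORES = {
    (("Taro", "Yamada"), "NP"): 3.0,
    (("Taro",), "NP"): 1.0,
    (("Yamada",), "NP"): 1.0,
    (("signs",), "VP"): 2.0,
}


def score(chunk, label):
    # Unknown (chunk, label) pairs receive a small penalty (assumption).
    return TOY_SCORES.get((chunk, label), -1.0)


def best_segmentation(words, labels=("NP", "VP"), max_len=3):
    """Return the best-scoring list of (chunk text, label) covering all words."""
    n = len(words)
    best = [0.0] + [float("-inf")] * n  # best[j]: best score for words[:j]
    back = [None] * (n + 1)             # back[j]: (chunk start, label) ending at j
    for end in range(1, n + 1):
        # Consider every chunk of up to max_len words that ends at `end`.
        for begin in range(max(0, end - max_len), end):
            chunk = tuple(words[begin:end])
            for label in labels:
                candidate = best[begin] + score(chunk, label)
                if candidate > best[end]:
                    best[end] = candidate
                    back[end] = (begin, label)
    # Recover the chunks by following the back pointers from the end.
    segments = []
    end = n
    while end > 0:
        begin, label = back[end]
        segments.append((" ".join(words[begin:end]), label))
        end = begin
    return segments[::-1]


print(best_segmentation(["Taro", "Yamada", "signs"]))
# [('Taro Yamada', 'NP'), ('signs', 'VP')]
```

Because the score is attached to a whole chunk, information about the chunk as a unit (here, the two-word name “Taro Yamada”) can influence the selection directly.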
In addition to these learning methods, there is an example in which an ensemble learning method is employed in order to further improve the determination accuracy. In the boosting method, which is one kind of ensemble learning method, plural models (also called “rules”) are learned, and a combined model (or learning model) whose accuracy is high is generated by combining those models. A learner for learning the plural models is called “a weak learner”, and a model that is learned by that weak learner is called “a weak hypothesis”.
In the boosting method, a weight is set for each training sample included in the training data. Then, adjustment is performed so as to set a lighter weight for an easy training sample for which a correct prediction result is obtained by the weak hypothesis, and so as to set a heavier weight for a difficult training sample for which a correct prediction result is not obtained by the weak hypothesis. By adjusting the weight of each training sample as described above, a combined model (or learning model) that fits various kinds of training samples is expected to be obtained.
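The weight adjustment described above can be illustrated with an AdaBoost-style update. The text does not specify a particular update rule, so the exponential update, the confidence value `alpha`, and the function names below are assumptions used only to make the idea concrete.

```python
import math


def confidence(weights, correct):
    """AdaBoost-style confidence of a weak hypothesis (assumed formula)."""
    # Weighted error: total weight of the samples predicted incorrectly.
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must be strictly between 0 and 1")
    return 0.5 * math.log((1.0 - err) / err)


def update_weights(weights, correct, alpha):
    """Lighten correctly predicted samples, make the others heavier, normalize."""
    raw = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(raw)
    return [w / total for w in raw]


weights = [0.25, 0.25, 0.25, 0.25]    # uniform initial weights
correct = [True, True, True, False]   # the weak hypothesis fails on the last sample
alpha = confidence(weights, correct)
new_weights = update_weights(weights, correct, alpha)
print([round(w, 4) for w in new_weights])  # [0.1667, 0.1667, 0.1667, 0.5]
```

In this toy run the three easy samples become lighter and the single difficult sample becomes heavier, so that the next weak hypothesis is pushed toward predicting the difficult sample correctly.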
According to a certain example that adopts the boosting method for the structure prediction, the classifier for assigning the label as described above is used as the weak learner.
Patent Document 1: Japanese Laid-open Patent Publication No. 2010-33213
Non-Patent Document 1: Schapire, R. E. and Singer, Y.: “BoosTexter: A boosting-based system for text categorization”, Machine Learning, Vol. 39(2/3), pp. 135-168 (2000)
Non-Patent Document 2: Nagata, M.: “A Stochastic Japanese Morphological Analyzer Using a Forward-DP Backward-A* N-Best Search Algorithm”, COLING, pp. 201-207 (1994)
Non-Patent Document 3: Schapire, R. E. and Singer, Y.: “Improved Boosting Algorithms Using Confidence-rated Predictions”, Machine Learning, Vol. 37, No. 3, pp. 297-336 (1999)
Non-Patent Document 4: Cohen, W. W. and Sarawagi, S.: “Exploiting dictionaries in named entity extraction: combining semi-Markov extraction processes and data integration methods”, Proc. of KDD'04, pp. 89-98 (2004)
Non-Patent Document 5: Sarawagi, S. and Cohen, W. W.: “Semi-Markov Conditional Random Fields for Information Extraction”, Proc. of NIPS'04 (2004)
However, according to the conventional art, it is difficult to improve the accuracy of the structured learning.