In sentence processing, it is known to acquire the expressions of words by using vectors of the words that co-occur (appear at the same time) in one sentence. According to one conventional technique for learning the dispersion expression (distributed representation) of words using such vectors, a polyseme, that is, a word having a plurality of meanings under the same surface layer, is clustered by meaning, so that each meaning is learned as a different word even though the surface layer is the same. Hereinafter, the notation of a word considered without regard to the meaning of the word may be referred to as a “surface layer”.
For example, in one known technique, each word in an inputted predetermined sentence is extracted, and one of the words is selected as a core word. A core-word co-occurrence vector is then extracted, which is represented by the core-word co-occurrence words, that is, the words having a co-occurrence relation with the core word, and their numbers of co-occurrences. According to the technique, a co-occurrence word concept is estimated from general concepts for each core-word co-occurrence word of the core-word co-occurrence vector, and the core-word co-occurrence words of the selected core word are clustered based on the similarity between the estimated co-occurrence word concepts. Further, according to the technique, when a plurality of clusters are present, a candidate for a polyseme is extracted.
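As a rough illustration only, the extraction of a core-word co-occurrence vector may be sketched in Python as follows. The function name, the whitespace tokenization, and the choice to count each co-occurring word once per sentence are assumptions made for this sketch and are not taken from the cited publication. The concept estimation and clustering steps are not shown.

```python
from collections import Counter


def core_word_cooccurrence_vector(sentences, core_word):
    """Count, per sentence, the words that appear together with core_word.

    Hypothetical helper: tokenization is a plain whitespace split, and a
    word is counted at most once for each sentence containing core_word.
    """
    vector = Counter()
    for sentence in sentences:
        words = set(sentence.lower().split())
        if core_word not in words:
            continue  # no co-occurrence relation in this sentence
        for word in words:
            if word != core_word:
                vector[word] += 1
    return vector


# Illustrative input sentences (made up for this sketch).
sentences = [
    "the bank approved the loan",
    "we walked along the river bank",
    "the bank raised the loan rate",
]
vector = core_word_cooccurrence_vector(sentences, "bank")
print(vector.most_common())
```

In this sketch, words such as "loan" and "river" both appear in the vector for the core word "bank". Grouping such co-occurrence words by concept is the step by which the technique would detect more than one cluster.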
In another known technique, a sentence is inputted, statistics on words and on the co-occurrence frequency of the words in a specific context are gathered, and the classification of the words is treated as a probability model estimation problem to output a word classification. According to the technique, the automatic word classification problem is regarded as the problem of estimating a probability model defined on a direct product of two word sets. Using an information criterion, a probability model is selected from among probability models that define the occurrence probability of each word pair as the product of the occurrence probability of the corresponding cluster pair and the conditional probability of each word given its cluster, and the two word sets are alternately clustered in a bottom-up manner.
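The form of such a probability model may be illustrated by the following minimal Python sketch, in which the probability of a word pair is the probability of the cluster pair multiplied by the conditional probability of each word within its cluster. All names, clusters, and probability values are hypothetical and chosen only for illustration. The model selection by the information criterion and the bottom-up clustering are not shown.

```python
# Hypothetical hard clustering of two word sets (nouns and verbs).
noun_cluster = {"wine": "LIQUID", "beer": "LIQUID", "bread": "SOLID"}
verb_cluster = {"drink": "V_DRINK", "eat": "V_EAT"}

# Illustrative occurrence probability of each cluster pair (sums to 1).
cluster_pair_prob = {
    ("LIQUID", "V_DRINK"): 0.5,
    ("LIQUID", "V_EAT"): 0.0,
    ("SOLID", "V_DRINK"): 0.1,
    ("SOLID", "V_EAT"): 0.4,
}

# Illustrative conditional probability of each word given its cluster.
noun_given_cluster = {"wine": 0.6, "beer": 0.4, "bread": 1.0}
verb_given_cluster = {"drink": 1.0, "eat": 1.0}


def word_pair_probability(noun, verb):
    """P(noun, verb) = P(C_noun, C_verb) * P(noun | C_noun) * P(verb | C_verb)."""
    pair = (noun_cluster[noun], verb_cluster[verb])
    return (cluster_pair_prob[pair]
            * noun_given_cluster[noun]
            * verb_given_cluster[verb])


p = word_pair_probability("wine", "drink")
total = sum(word_pair_probability(n, v)
            for n in noun_cluster for v in verb_cluster)
print(p, total)
```

Because the conditional probabilities within each cluster sum to one, the word pair probabilities defined in this way also sum to one over the direct product of the two word sets.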
Related techniques are disclosed in, for example, Japanese Laid-open Patent Publication Nos. 2013-020431 and 11-143875; Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, “Efficient Estimation of Word Representations in Vector Space”, In Proceedings of Workshop at ICLR, 2013; Xu Chang et al., “Rc-net: A general framework for incorporating knowledge into word representations”, Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, ACM, 2014; Bengio, Yoshua, et al., “A neural probabilistic language model”, Journal of Machine Learning Research 3, Feb. (2003): 1137-1155; and Guo, Jiang, et al., “Learning Sense-specific Word Embeddings By Exploiting Bilingual Resources”, COLING, 2014.