One method of obtaining a distributed representation of a document is the Word2Vec technology, which generates vectors based on each of the morphemes constituting the document to be analyzed. For example, the Word2Vec technology calculates the vector value of each word based on the relation between a certain word (morpheme) and another word adjacent to the certain word.
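The relation between a certain word and its adjacent words may be illustrated by the following minimal sketch, which only enumerates (target, context) pairs and does not train any vectors. The function name `adjacent_pairs`, the `window` parameter, and the whitespace tokenization are assumptions made for illustration and are not part of the Word2Vec technology itself.

```python
def adjacent_pairs(tokens, window=1):
    """Return (target, context) pairs for words within `window` positions.

    Hypothetical helper: it only lists which neighbouring words would be
    related to each target word; no vector values are computed here.
    """
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


# Whitespace splitting stands in for morphological analysis (an assumption).
tokens = "He takes care of his daughter".split()
pairs = adjacent_pairs(tokens)
```

With `window=1`, each word is paired only with its immediate left and right neighbours; a larger window would relate each word to more distant words as well.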
Here, when the Word2Vec technology or the like generates the distributed representation of a document using vectors, highly frequent words included in the document to be analyzed have an excessively great effect. Such highly frequent words include articles such as “the” and “a” and prepositions such as “on” and “of.” Therefore, the Word2Vec technology excludes the highly frequent words from the document as stop words and then generates the distributed representation using vectors.
For example, the Word2Vec technology excludes “of” as a stop word from a document to be analyzed, “He takes care of his daughter,” and then vectorizes each of the words included in “He takes care his daughter.”
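The exclusion of stop words described above may be sketched as follows. This is a minimal illustration under stated assumptions: the `STOP_WORDS` set contains only the four words named in this description, and `remove_stop_words` is a hypothetical helper that tokenizes on whitespace rather than by morphological analysis.

```python
# Assumed stop word list: only the articles and prepositions named in the text.
STOP_WORDS = {"the", "a", "on", "of"}


def remove_stop_words(sentence, stop_words=STOP_WORDS):
    """Drop stop words from a sentence before vectorization (hypothetical helper)."""
    tokens = sentence.rstrip(".").split()  # naive whitespace tokenization
    return [t for t in tokens if t.lower() not in stop_words]


filtered = remove_stop_words("He takes care of his daughter.")
print(" ".join(filtered))  # prints: He takes care his daughter
```

Only the words remaining in `filtered` would then be vectorized, so “of” contributes nothing to the resulting distributed representation.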
Related technologies are disclosed in, for example, Japanese Laid-open Patent Publication No. 2006-48685, Japanese Laid-open Patent Publication No. 2009-151757, and “Distributed Representations of Words and Phrases and their Compositionality,” Tomas Mikolov et al., pp. 3111-3119, Advances in Neural Information Processing Systems 26, 2013, Curran Associates, Inc.