The disclosure relates to an information processing apparatus, an information processing method, and a program, and in particular to an information processing apparatus, an information processing method, and a program that enable extraction, from a large number of documents, of profound text stating knowledge about attention targets such as persons, content, and thoughts.
In the related art, there have been many attempts to obtain knowledge by statistically analyzing a large number of documents, that is, by performing statistical natural language processing on them. For example, in a specialized field for which no thesaurus has yet been built, a thesaurus of the field is built automatically by performing statistical natural language processing on documents in that field.
In statistical natural language processing, a feature quantity of context information is frequently used, where context information denotes a word group consisting of a word attracting attention in a document and a predetermined number of words present before and after that word. By calculating the degree of similarity between feature quantities of context information, analyses such as synonym analysis of the word attracting attention, polysemy analysis, analysis of the relationship between two nouns, and analysis of the modality of a word are performed. For example, in the document “Discovering Relations among Named Entities from Large Corpora” by Takaaki Hasegawa, Satoshi Sekine and Ralph Grishman, in Proceedings of the Conference of the Association for Computational Linguistics 2004, a feature quantity of context information is used in synonym analysis of relationships between proper nouns.
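As a minimal illustrative sketch of the general idea, and not of the method in the cited document, the following Python code builds a feature quantity of context information and compares two such feature quantities. It makes several assumptions: the feature quantity is a simple bag-of-words count, the degree of similarity is the cosine similarity, and the word attracting attention is itself left out of the counts. The function names `context_features` and `cosine_similarity`, the `window` parameter, and the sample sentence are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def context_features(tokens, target, window=2):
    """Count the words found within `window` positions before and after
    each occurrence of `target` (assumed bag-of-words feature quantity)."""
    features = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start = max(0, i - window)
        # Words before the target plus words after it; the target is excluded.
        context = tokens[start:i] + tokens[i + 1:i + 1 + window]
        features.update(context)
    return features


def cosine_similarity(a, b):
    """Cosine similarity between two count vectors (assumed similarity measure)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty context has no meaningful similarity.
    return dot / (norm_a * norm_b)


# Hypothetical sample text, tokenized by whitespace for simplicity.
tokens = (
    "the singer released a new album and "
    "the vocalist released a new single"
).split()

singer = context_features(tokens, "singer")
vocalist = context_features(tokens, "vocalist")
similarity = cosine_similarity(singer, vocalist)
print(f"similarity(singer, vocalist) = {similarity:.3f}")
```

In this sketch, two words that appear in similar surroundings receive a similarity close to 1, which is the property that synonym analysis relies on. A practical system would typically use a much larger corpus and a weighted feature quantity rather than raw counts.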