The present invention relates to an expression extraction device, an expression extracting method, a program, and a recording medium. More specifically, the present invention relates to an expression extraction device, an expression extracting method, a program, and a recording medium for extracting evaluation expressions from text having descriptions on evaluations of a specific evaluation target, in which each evaluation expression is an expression indicating the evaluation of the evaluation target.
Along with the spread of the Internet in recent years, customers and others have started disclosing evaluations of commodities, services, companies themselves, and the like by means of various message boards, evaluation sites, and the like on a network. As a consequence, such information on the network has become highly influential on sentiments toward evaluation targets.
Under the circumstances, a sentiment analysis technique is drawing attention. Here, sentiments are analyzed by acquiring pieces of text describing evaluations of a specific evaluation target such as a commodity, service, or company from enormous amounts of information on the network, and then by analyzing those pieces of text (see, for example, Tetsuya Nasukawa, et al., “Sentiment Analysis: Capturing Favorability Using Natural Language Processing”, The Second International Conference on Knowledge Capture (K-CAP 2003), October 2003, and Jeonghee Yi, et al., “Sentiment Analyzer: Extracting of Sentiments towards a Given Topic using NLP Techniques”, The Third IEEE International Conference on Data Mining (ICDM '03), November 2003).
The sentiment analysis technique extracts evaluation expressions, that is, expressions indicating affirmative evaluations and/or expressions indicating negative evaluations, from the text, and then analyzes the sentiments based on the extraction results. Conventionally, a dictionary of the evaluation expressions to be extracted has been produced manually. However, evaluation expressions vary widely and differ depending on the field of the evaluation target. Therefore, it has been difficult to manually produce a dictionary that covers the various evaluation expressions in various fields.
Accordingly, techniques have also been disclosed that extract evaluation expressions from text and register them in a dictionary after judging whether each evaluation expression is an affirmative expression or a negative expression.
Bo Pang, et al., “Thumbs up? Sentiment classification using Machine Learning Techniques.”, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), p. 79-86, 2002 discloses a method of learning words strongly correlated with an evaluation value by using data which clearly show whether the entire text is affirmative or negative to the evaluation target, such as movie reviews rated on a five-point scale.
Peter Turney, “Thumbs up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews.”, In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), p. 417-424, 2002 discloses a method of measuring the negative degree or the affirmative degree of an evaluation expression from how frequently a word such as “poor” or “excellent” appears in the vicinity of the evaluation expression in documents on the Internet, as determined by use of an Internet search engine.
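By way of illustration only, the kind of measurement described above may be sketched as follows. The sketch assumes that the number of documents in which the evaluation expression appears near “excellent”, the number in which it appears near “poor”, and the total numbers of documents containing each of the two words have already been obtained. The function name, the parameter names, the smoothing constant, and the sample counts are hypothetical and are not taken from the cited document. The score is a log ratio in the style of pointwise mutual information; a positive value suggests an affirmative expression and a negative value suggests a negative expression.

```python
import math


def semantic_orientation(near_excellent, near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """Return a log-ratio score for one evaluation expression.

    near_excellent / near_poor: hypothetical counts of documents in which
        the expression appears in the vicinity of "excellent" / "poor".
    hits_excellent / hits_poor: hypothetical counts of documents that
        contain "excellent" / "poor" at all (must be greater than zero).
    smoothing: small constant (an assumption of this sketch) that keeps
        the ratio defined when a vicinity count is zero.
    """
    numerator = (near_excellent + smoothing) * hits_poor
    denominator = (near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


# Illustrative, made-up counts: the expression appears near "excellent"
# four times as often as near "poor", so the score is positive.
score = semantic_orientation(80, 20, 1000, 1000)
print("affirmative" if score > 0 else "negative")
```

Because every count must be obtained by a separate search, the cost of such a measurement grows with the number of evaluation expressions to be scored.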
Vasileios Hatzivassiloglou, et al., “Predicting the semantic orientation of adjectives.”, In Proceedings of the 35th Annual Meeting of the ACL and the 8th Conference of the European Chapter of the ACL, p. 174-181, 1997 discloses a method of learning polarities of words cooccurring in parallel phrases conjoined by a conjunction such as “and”, “or”, or “but”, while defining an affirmative evaluation as a positive polarity and a negative evaluation as a negative polarity. In other words, the method learns the polarities of words such that words conjoined by “and” or “or” share the same polarity and words conjoined by “but” have mutually opposite polarities.
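The conjunction rule described above may be illustrated by the following simplified sketch. It is not the learning procedure of the cited document. It merely propagates a known polarity from seed words to other words through observed conjunctions, keeping the polarity across “and” and “or” and reversing it across “but”. The function name, the seed words, and the sample word pairs are hypothetical.

```python
from collections import deque

SAME_POLARITY = {"and", "or"}   # conjoined words share a polarity
OPPOSITE_POLARITY = {"but"}     # conjoined words have opposite polarities


def propagate_polarity(seeds, conjunctions):
    """Spread polarity (+1 or -1) from seed words along conjunctions.

    seeds: dict mapping a word to +1 (affirmative) or -1 (negative).
    conjunctions: iterable of (left_word, conjunction, right_word).
    Returns a dict holding a polarity for every word reachable from a seed.
    """
    # Build an undirected graph whose edges carry +1 (same) or -1 (reverse).
    graph = {}
    for left, conjunction, right in conjunctions:
        if conjunction in SAME_POLARITY:
            sign = 1
        elif conjunction in OPPOSITE_POLARITY:
            sign = -1
        else:
            continue  # conjunctions outside the rule are ignored
        graph.setdefault(left, []).append((right, sign))
        graph.setdefault(right, []).append((left, sign))

    polarity = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbor, sign in graph.get(word, []):
            if neighbor not in polarity:  # first assignment wins
                polarity[neighbor] = polarity[word] * sign
                queue.append(neighbor)
    return polarity


# Hypothetical observations such as "good and reliable" or "reliable but slow".
observed = [("good", "and", "reliable"),
            ("reliable", "but", "slow"),
            ("slow", "or", "noisy")]
print(propagate_polarity({"good": 1}, observed))
```

A word that never appears in a parallel phrase connected to a seed word receives no polarity in this sketch.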
With regard to Bo Pang, et al., “Thumbs up? Sentiment classification using Machine Learning Techniques.”, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), p. 79-86, 2002, it is necessary that the entire document be evidently affirmative or negative to the evaluation target. Accordingly, the documents to which this technique is applicable are limited.
With regard to Peter Turney, “Thumbs up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews.”, In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), p. 417-424, 2002, it is necessary to search for each of the evaluation expressions included in the text by use of a search engine. Accordingly, this technique is low in processing efficiency, and it is difficult to obtain an absolute evaluation result because the result depends on the contents of the documents subject to searching.
With regard to Vasileios Hatzivassiloglou, et al., “Predicting the semantic orientation of adjectives.”, In Proceedings of the 35th Annual Meeting of the ACL and the 8th Conference of the European Chapter of the ACL, p. 174-181, 1997, it is necessary that an evaluation expression as a target for judging a polarity be written in a parallel phrase. Therefore, applications of this technique are limited.