1. Field of the Invention
This invention relates to computer text documents and, more particularly, to analyzing affect and emotion in the documents.
2. Description of the Prior Art
There are methods and apparatuses that model emotion and personality, synthesize emotional speech, and monitor physical manifestations of emotion (including changes in brain signals, facial expression, and motion). However, there is no prior art that analyzes and measures emotion and affect in text documents.
G. Collier has analyzed emotional expression. Collier, G., Emotional Expression, Lawrence Erlbaum and Associates, Inc., 1985. Collier focuses on the use of grammatical categories, such as the ratio of verbs to adjectives, the use of past tense and negation, and changes in grammatical complexity, to assess a speaker's emotional state. While Collier briefly discusses verbal immediacy, most of the work concerns the use of adjectives to describe an emotional state. Almost all work at the intersection of emotion and text focuses on defining emotion words like “fear” and “surprise”, not on analyzing the emotional attitudes expressed subtly through text.
Text classification methods, like naïve Bayes, measure the probability of a word given that a document belongs to a class (positive, negative, or neutral). These methods do not consider the probability of a word's absence. Also, these methods cannot correctly assign affect to documents that contain a mixture of affect terms (i.e., contain positive and negative affect terms). Moreover, there have been no attempts in the text classification literature to analyze affect.
The naïve Bayes method computes the probability that a document merits a particular class label based on a simple combination of the independent probabilities for each of the words in the document. However, this method will not work well for affect analysis because the expression of emotion in text is more complex. The assumption of independent probabilities required by this method fails to properly account for the way in which positive and negative affects combine, and so will not be effective in classifying text documents according to affect.
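To make the independence assumption concrete, the following is a minimal sketch of a generic multinomial naïve Bayes classifier with add-one smoothing. It is a standard textbook formulation of the prior art, not part of this invention, and the function names are hypothetical. Each word present in the document adds its own log-probability term independently of the others, and words absent from the document contribute nothing.

```python
import math
from collections import Counter


def train_naive_bayes(docs):
    """docs: list of (tokens, label). Returns class counts, per-class word counts, vocabulary."""
    priors = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in priors}
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return priors, word_counts, vocab


def classify(tokens, priors, word_counts, vocab):
    total_docs = sum(priors.values())
    best_label, best_score = None, -math.inf
    for label, count in priors.items():
        # log P(class) + sum of log P(word | class): words are treated as independent,
        # and words that do not occur in the document add nothing to the score.
        score = math.log(count / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for w in tokens:
            if w in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes([(["great", "fun"], "positive"), (["awful", "boring"], "negative")])
print(classify(["great"], *model))
```

Because the score is a plain sum of per-word terms, a document mixing “great” and “awful” simply accumulates both contributions; nothing in the model captures the interaction between them.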
Some prior art text classification methods count the frequency and rarity of affect terms. However, the likelihood that a document is positive is not well correlated with the presence of positive affect terms alone; it also depends on the absence of negative affect terms.
For example, applying known text classification methods to the task of finding positive web pages about Barney the purple dinosaur is ineffective. Most such pages were written by Barney-bashers in strong, negative tones. Presumably, somebody who hates Barney would want to see the negative pages, and somebody who loves Barney would want to see the positive pages. But a search for “love Barney purple dinosaur” would yield overwhelmingly negative pages, because the word “love” does not discriminate between positive and negative pages. Although the word “love” is one of the most common positive affect terms on the positive pages, it is also the most common positive affect term on the negative pages and appears more frequently on the negative pages than on the positive pages. Moreover, the word “love” appears 50% more frequently than the word “hate” on the negative pages. In fact, no positive affect term is effective at distinguishing positive Barney pages from negative pages. The most accurate method of distinguishing positive Barney pages from negative Barney pages is to look for Barney pages that include positive affect terms with a concurrent absence of negative affect terms.
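The presence-plus-absence rule can be sketched as a simple filter. This is an illustration only: the two term sets are hypothetical stand-ins for real positive and negative affect lexicons, and tokenization is a crude lowercase regular expression.

```python
import re

# Hypothetical affect lexicons; a real system would use much larger lists.
POSITIVE_TERMS = {"love", "wonderful", "fun", "great"}
NEGATIVE_TERMS = {"hate", "annoying", "stupid", "awful"}


def is_positive_page(text, positive_terms=POSITIVE_TERMS, negative_terms=NEGATIVE_TERMS):
    """True only if some positive term occurs AND no negative term occurs."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    has_positive = bool(words & positive_terms)
    has_negative = bool(words & negative_terms)
    return has_positive and not has_negative


print(is_positive_page("I love Barney, he is so much fun!"))
print(is_positive_page("I'd love to see that annoying purple dinosaur gone"))
```

The second page contains “love”, the term that misled the keyword search, but the concurrent negative term excludes it.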
None of the prior art concerns the classification of text documents according to affect. None of the prior art involves methods of analyzing affect in text, nor the identification of affect associated with each of the named entities mentioned in a text document. None of the prior art is capable of analyzing the subtle stylistic cues and influence that word choice applies to the emotional tone of a document.
In order to overcome the limitations of the prior art, I have developed a method and apparatus for analyzing affect and emotion in text documents.
Affect and emotion manifest themselves in text documents through subtle stylistic cues, such as changes in the choice of synonyms. For example, “John crushed the competition” and “John won” communicate the same information, but convey a different attitude about John's role.
The present invention analyzes affect and emotion in text, reporting a valence (positive, negative, or neutral) and intensity (magnitude) for the text's overall emotion and for the emotion associated with each named entity. The system can be used to classify news articles as good news or bad news, classify web pages on a topic as positive or negative, and classify customer communications into complaints and compliments. Other applications include the analysis of financial news for short-term prediction of the impact of the news on stock prices.
An embodiment of the present invention analyzes affect by computing a weighted sum of the scores for positive and negative affect terms (words and phrases), where the scores for negative affect terms are subtracted from the scores for positive affect terms. Possible scoring methods include the frequency of occurrence of an affect term or the frequency multiplied by a term intensity or magnitude (e.g., “maim” and “kill” are more strongly negative than “hurt”). Negation of an affect term can either be ignored or used to invert the contribution from the negated affect terms.
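The weighted-sum scoring can be sketched as follows, under stated assumptions rather than as the embodiment itself. The lexicon entries and intensity values are hypothetical; negative terms carry negative weights, so summing them is equivalent to subtracting their scores. Negation is detected only when a negator (“not”, “no”, “never”) immediately precedes the affect term. The sign of the total gives the valence and its absolute value the intensity.

```python
import re

# Hypothetical lexicon: term -> signed intensity (positive > 0, negative < 0).
LEXICON = {
    "good": 1.0, "win": 1.0, "excellent": 2.0,
    "hurt": -1.0, "maim": -2.5, "kill": -3.0,
}
NEGATORS = {"not", "no", "never"}


def affect_score(text, lexicon=LEXICON, use_intensity=True, invert_negated=True):
    """Return (valence, intensity) for a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok not in lexicon:
            continue
        if use_intensity:
            weight = lexicon[tok]  # frequency x intensity
        else:
            weight = 1.0 if lexicon[tok] > 0 else -1.0  # plain frequency count
        # Optional negation handling: a negator directly before the term flips its sign.
        if invert_negated and i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight
        score += weight
    if score > 0:
        valence = "positive"
    elif score < 0:
        valence = "negative"
    else:
        valence = "neutral"
    return valence, abs(score)


print(affect_score("The excellent team did not hurt anyone"))
```

Setting `use_intensity=False` reduces the score to a signed frequency count, and `invert_negated=False` corresponds to ignoring negation; both are the alternatives named above.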
Most affect terms have only a single affect value. However, the affect assigned to some terms may depend on the term's part of speech. For example, the word “hit” is positive as a modifier (“hit movies”), negative as a noun (“took a direct hit”), and neutral (“hit a new 52-week high”) or negative as a verb (“John hit Mary”). Thus, a part-of-speech tagger may be integrated with the affect analyzer. Likewise, the affect assigned to some terms may depend on the term's word sense. For example, the word “leading” has positive affect only when used to indicate prominence, not when used to refer to interline spacing.
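A part-of-speech-dependent lexicon might look like the sketch below. The tag names (ADJ, NOUN, VERB), the lexicon entries, and the functions are hypothetical, and the tags are assumed to come from an external part-of-speech tagger. As the “hit a new 52-week high” example shows, the tag alone cannot separate the neutral and negative verb senses; that case would also need word-sense information, which this sketch does not attempt.

```python
# Hypothetical lexicon keyed by (word, tag); a tag of None means "any part of speech".
POS_LEXICON = {
    ("hit", "ADJ"): 1.0,    # modifier: "hit movies"
    ("hit", "NOUN"): -1.0,  # "took a direct hit"
    ("hit", "VERB"): -1.0,  # "John hit Mary" (the neutral sense needs word-sense info)
    ("win", None): 1.0,     # single affect value regardless of tag
}


def term_affect(word, tag, lexicon=POS_LEXICON):
    """Prefer a tag-specific entry, fall back to a tag-independent one, else 0."""
    word = word.lower()
    if (word, tag) in lexicon:
        return lexicon[(word, tag)]
    return lexicon.get((word, None), 0.0)


def score_tagged(tagged_tokens, lexicon=POS_LEXICON):
    """tagged_tokens: list of (word, tag) pairs from some part-of-speech tagger."""
    return sum(term_affect(w, t, lexicon) for w, t in tagged_tokens)


print(score_tagged([("hit", "ADJ"), ("movies", "NOUN")]))
print(score_tagged([("took", "VERB"), ("a", "DET"), ("direct", "ADJ"), ("hit", "NOUN")]))
```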
Another embodiment of the present invention combines affect analysis with named entity extraction to assign an affect to each named entity mentioned in the document, in addition to assigning an affect value to the entire document. When assigning affect to named entities, the affect is assigned to the nearest named entity that is not “blocked”, that is, not separated from the affect term by another affect term or named entity. The idea is that each mention of an affect term primes a positive or negative association in the reader's mind which may influence the reader's attitude to nearby named entities. But the affect decays rapidly, persisting only far enough to contribute to nearby named entities.
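The nearest-unblocked-entity rule can be sketched as follows, assuming named entities have already been extracted and merged into single tokens by an upstream recognizer. The function name, the `max_distance` cutoff (a hypothetical stand-in for the rapid decay of affect), and the tie-breaking rule (on equal distance the entity to the left wins) are assumptions. Direction is otherwise ignored.

```python
from collections import defaultdict


def entity_affect(tokens, entities, lexicon, max_distance=5):
    """tokens: words, with each named entity merged into one token (upstream NER assumed).
    entities: set of tokens that are named entities.
    lexicon: affect term -> signed score.
    Returns total affect per named entity."""
    totals = defaultdict(float)
    for i, tok in enumerate(tokens):
        if tok not in lexicon:
            continue
        best = None  # (distance, index) of the nearest unblocked entity
        for step in (-1, 1):  # scan left first, then right
            j = i + step
            while 0 <= j < len(tokens) and abs(j - i) <= max_distance:
                if tokens[j] in entities:
                    # Strict "<" means a left-hand entity wins a tie.
                    if best is None or abs(j - i) < best[0]:
                        best = (abs(j - i), j)
                    break
                if tokens[j] in lexicon:  # another affect term blocks propagation
                    break
                j += step
        if best is not None:
            totals[tokens[best[1]]] += lexicon[tok]
    return dict(totals)


tokens = ["Acme", "praised", "the", "excellent", "work", "of", "Globex"]
print(entity_affect(tokens, {"Acme", "Globex"}, {"praised": 1.0, "excellent": 2.0}))
```

Here “praised” cannot reach Globex because “excellent” blocks it, and “excellent” cannot reach Acme because “praised” blocks it, so each entity receives only its nearest affect.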
In a preferred embodiment, the direction of application of affect (e.g., before or after the named entity) is ignored. In another embodiment, the direction controls whether the affect is inverted for some affect terms. In another embodiment, the sentences are parsed and the affect from verbs is attached to the verb's agents and objects, as appropriate, and likewise from modifiers to modified objects. The intention is to capture the notion that the victim of a bad act is pitied and so gets positive affect. (But, in practice, it seems that proximity or involvement in a bad act tarnishes even the victim of the bad act.)
The ability to analyze emotion in text has many important applications. It can be used to classify news articles as good or bad, web pages as positive or negative, and customer communications (correspondence and telephone calls) as complaints or compliments. For example, a web search engine could be modified to allow the user to search for web pages that are positive or negative on a topic. It can also measure the magnitude of the emotion, allowing such documents to be prioritized according to intensity. It can be used to gauge user frustration with a user interface.
Another application of the present invention is to classify the links to a web page as positive or negative depending on the affect of the anchor text or text in the sentence or paragraph that contains the link. Since a link can be considered a vote by one web page for or against another web page, the affect associated with a link can help determine whether the target web page is considered to be a good or bad page. This information can be used to improve the quality of the search results on a web search engine. If one web page is regarded more positively than another by the sites that link to it, then it should be ranked higher.
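Link voting can be sketched as below. Everything here is hypothetical: the toy scorer stands in for a full affect analyzer applied to the anchor text or surrounding sentence, each link casts a vote of +1, -1, or 0 according to the sign of its score, and re-ranking is a stable sort by net vote, so pages with equal votes keep their original search order.

```python
from collections import defaultdict

# Hypothetical toy lexicons standing in for a full affect analyzer.
POS = {"great", "excellent", "recommend"}
NEG = {"terrible", "avoid", "broken"}


def toy_score(text):
    words = text.lower().split()
    return sum(w in POS for w in words) - sum(w in NEG for w in words)


def link_votes(links, score_text=toy_score):
    """links: iterable of (source_url, target_url, context_text).
    Returns the net vote per target page."""
    votes = defaultdict(int)
    for _source, target, context in links:
        s = score_text(context)
        if s > 0:
            votes[target] += 1
        elif s < 0:
            votes[target] -= 1
    return dict(votes)


def rerank(results, votes):
    """Stable sort: pages regarded more positively by linking sites come first."""
    return sorted(results, key=lambda url: votes.get(url, 0), reverse=True)


links = [
    ("a.example", "x.example", "a great tutorial"),
    ("b.example", "x.example", "excellent reference"),
    ("c.example", "y.example", "avoid this broken site"),
    ("d.example", "y.example", "see here"),
]
votes = link_votes(links)
print(rerank(["y.example", "x.example", "z.example"], votes))
```

In practice the vote would more likely be combined with an existing link-based ranking signal than used alone; the sort here only illustrates the direction of the adjustment.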
Since news articles about companies may have an impact on investor confidence in a company, the attitudes expressed in an article about the company may have a subtle impact on stock prices. Combining an analysis of news article affect with historical price movements may facilitate the short-term prediction of the impact of a news article on stock prices because such an apparatus for analyzing affect could operate more quickly than people can read the articles. The ability to predict price swings in securities, even short-term, can be extremely lucrative.