Opinion analysis is concerned with extracting attitudes, beliefs, emotions, opinions, evaluations, and sentiment from digital texts. Opinion analysis has received much research attention lately, motivated by the desire to provide information analysis applications in the arenas of government, business, and politics. Desired applications include, for example, question answering (QA) applications that enable complex queries of the form, “How does entity X feel about topic Y?” Other desired applications include opinion-oriented information retrieval, clustering, opinion tracking, and document-, collection-, and corpus-level opinion exploration. Enabling such applications requires an information extraction system that can extract opinion summary information from digital texts, such as news articles, blog postings, and email (referred to herein generally as documents).
When a reader reads and analyzes a document, the reader is likely to process subjective language to determine the attitudes, beliefs, emotions, opinions, evaluations, sentiments or private state of mind of the author, or some other person referred to in the document, with respect to a particular topic or idea expressed in the document. Here, “private state” is used in a general sense to describe mental and emotional states that cannot be directly observed or verified. For example, the word “loves” in “John loves Mary” is an explicit mention of John's private state toward Mary. Similarly, the word “fears” in “Mary fears big cities” explicitly denotes a private state of Mary. There are also indirect expressions of private state: In the sentence “John said the book is a complete fabrication of the truth”, the phrase “a complete fabrication” divulges John's negative opinion about the book. For simplicity, throughout the remainder of this patent document, the term “opinion” is used to cover all types of private states expressed in subjective language.
While readers tend to process subjective language naturally when reading, automating that process to summarize and present the opinions expressed in documents is a challenge. To date, much of the work on automating opinion analysis has focused on the problem of identifying opinions at the document level. Document-level analysis can be referred to as coarse-grained opinion analysis, where the goal is to determine whether the overall sentiment expressed in a single document is positive (e.g., “thumbs up”) or negative (e.g., “thumbs down”).
The coarse-grained opinion analysis approach can be problematic for a number of reasons. First, a single article or text will often contain multiple opinion expressions, with each opinion expression possibly associated with a different source (e.g., the opinion holder) and related to a different topic (e.g., the target or subject of the opinion). Consequently, expressing an overall summary at the document level and attributing the opinion to one source (e.g., the author of the article) fails to recognize the existence of multiple opinion sources in a single article. For instance, a news article dealing with one or more political issues may express the differing opinions of two opposing politicians, thereby making it difficult to express the overall sentiment of the article in a single summary statement. Furthermore, expressing an overall opinion summary at the document level fails to recognize that the article may include a variety of opinion expressions on different topics. In a news article dealing with one or more political issues, the article may express the opinions of the opposing politicians on a variety of political issues. Therefore, expressing an overall opinion summary at the document level in a single opinion statement fails to reflect the various opinions on the different political issues. As a result, the type and quality of information extracted from documents with coarse-grained opinion analysis will not enable the types of rich querying applications described above.
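By way of illustration only, the limitation described above can be sketched in a few lines of Python. The sketch is not taken from any described system: the `Opinion` record, the numeric polarity encoding, the sample opinions, and both helper functions are hypothetical and are assumed here solely to contrast a single document-level label with opinions recorded per source and per topic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Opinion:
    """Hypothetical record for one opinion expression in a document."""
    source: str    # the opinion holder
    topic: str     # the target or subject of the opinion
    polarity: int  # assumed encoding: +1 positive, -1 negative


# Invented opinions from a single news article about two opposing politicians.
article: List[Opinion] = [
    Opinion("Politician A", "tax reform", +1),
    Opinion("Politician B", "tax reform", -1),
    Opinion("Politician A", "trade policy", -1),
    Opinion("Politician B", "trade policy", +1),
]


def document_level_label(opinions: List[Opinion]) -> str:
    """Coarse-grained view: collapse every opinion into one overall label."""
    total = sum(o.polarity for o in opinions)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "undetermined"


def how_does_feel(opinions: List[Opinion], source: str, topic: str) -> Optional[str]:
    """Per-source, per-topic view: answer 'How does entity X feel about topic Y?'."""
    for o in opinions:
        if o.source == source and o.topic == topic:
            return "positive" if o.polarity > 0 else "negative"
    return None  # no opinion recorded for this source and topic


if __name__ == "__main__":
    print(document_level_label(article))
    print(how_does_feel(article, "Politician A", "tax reform"))
    print(how_does_feel(article, "Politician B", "tax reform"))
```

In this sketch the opposing opinions cancel at the document level, so the single label carries no usable information, whereas the per-source, per-topic records can still answer a query about one entity and one topic.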