Lists of positive and negative keywords can provide the beginnings of a sentiment classification system. However, classifying sentiment on the basis of individual words can give misleading results because atomic sentiment carriers can be modified (weakened, strengthened, or reversed) by lexical, discoursal, or paralinguistic contextual operators. The skilled person will appreciate that, in a portion of natural language, an atomic sentiment carrier (or an atomic (sub)context) is a constituent of that natural language that cannot be analysed any further for sentiment.
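By way of illustration only, the keyword-list approach and its weakness might be sketched as follows. The lexicons and the function name `keyword_polarity` are hypothetical and chosen purely for this example; they are not taken from any particular system.

```python
# Toy lexicons: hypothetical entries chosen only for illustration.
POSITIVE = {"good", "great", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "sad"}


def keyword_polarity(text: str) -> str:
    """Classify text by counting positive and negative keywords."""
    tokens = text.lower().split()
    # Each positive keyword adds one, each negative keyword subtracts one.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # The second sentence contains a reversing operator ("not"), yet both
    # receive the same label because only the word "good" is counted.
    print(keyword_polarity("the film was good"))
    print(keyword_polarity("the film was not good"))
```

In this sketch the sentiment carrier "good" is scored in isolation, so the reversal introduced by "not" is lost, which is the kind of misleading result described above.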
Past attempts to deal with this phenomenon include writing heuristic rules to look out for negatives and other ‘changing’ words, combining the scores of individual positive and negative word frequencies, and training a classifier on a set of contextual features. While statistical sentiment classifiers work well with a sufficiently large input (e.g. a 750-word movie review), smaller subsentential text units such as individual clauses or noun phrases pose a challenge. It is such low-level units that are needed for accurate entity-level sentiment analysis to assign (local) polarities to individual mentions of people, for example.
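A heuristic rule of the kind mentioned above might, purely as a sketch, look for a negating word within a fixed window before each keyword and flip that keyword's polarity. The lexicons, the negator list, the window size, and the name `heuristic_polarity` are all assumptions made for this example and do not reproduce any specific prior system.

```python
# Toy lexicons and negator list: hypothetical entries for illustration only.
POSITIVE = {"good", "great", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "sad"}
NEGATORS = {"not", "never", "no"}


def heuristic_polarity(text: str, window: int = 2) -> str:
    """Keyword counting with a fixed-window negation rule."""
    tokens = text.lower().split()
    score = 0
    for i, tok in enumerate(tokens):
        # +1 for a positive keyword, -1 for a negative keyword, 0 otherwise.
        value = int(tok in POSITIVE) - int(tok in NEGATIVE)
        if value == 0:
            continue
        # Flip the polarity if a negator occurs in the preceding window.
        preceding = tokens[max(0, i - window):i]
        if any(p in NEGATORS for p in preceding):
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(heuristic_polarity("the film was not good"))    # rule fires as intended
    print(heuristic_polarity("not only good but great"))  # rule misfires on "not only"
```

The rule handles the simple case but treats "not only good" as a negation, so the second, clearly favourable phrase is scored as neutral. Short text units of this kind leave such surface rules little context to work with.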
Known systems are described in documents such as US2009/0077069. However, such systems tend to be based upon fixed frames, templates or the like into which words and syntactic structures must be allocated in order for the analysis to progress. As such, these limited systems are not as flexible or as useful as may be desired.
The ability to detect author sentiment towards various entities in text is a goal in sentiment analysis, and has many applications. Entities, which can comprise anything from mentions of people or organisations to concrete or even abstract objects, condition what a text is ultimately about. Besides the intrinsic value of entity scoring, the success of document- and sentence-level analysis is also determined by how accurately the entities in them can be modelled. Deep entity analysis presents the most difficult challenges, be they linguistic or computational. One of the most recent developments in the area, compositional semantics, has shown potential for sentence- and expression-level analysis in both logic-oriented and machine learning-oriented paradigms.
Entity-level approaches have so far involved relatively shallow methods which presuppose some pre-given topic or entity of relevance to be classified or scored. Other proposals have attempted to identify specific semantic sentiment roles such as evident sentiment HOLDERs, SOURCEs, TARGETs, or EXPERIENCERs. What characterises these approaches is that only a few specific entities in a text are analysed while all others are left unanalysed. While shallow approaches can capture some amount of explicitly expressed sentiment, they ignore all layers of implicit sentiment pertaining to a multitude of other entities.
One prior art paper discussing an example of deep-level multi-entity sentiment analysis is: Karo Moilanen and Stephen Pulman (2009). Multi-entity Sentiment Scoring. In Proceedings of Recent Advances in Natural Language Processing (RANLP 2009). September 14-16, Borovets, Bulgaria. pp. 258-263.