The widespread availability of consumer generated media (CGM), such as blogs, message boards, comments on articles, Twitter and the world wide web in general, has created a substantial new medium that allows ordinary consumers to voice their views, whether positive, negative or neutral, about a product, service or idea. Existing work on consumer sentiment analysis has typically focused on semantic-based approaches.
A semantic-based approach generally relies on a collection of opinion words, in the form of a “sentiment dictionary” or a large-scale knowledge base, to assign “sentiments” to a document. Opinion words are words that carry a positive or negative sentiment, such as “awesome,” “great,” “love” or “terrible.” However, this type of approach is not optimal because it may not appropriately capture the attitude or sentiment of the writer if a word is used in an unexpected way or in an unconventional sense. For example, a blog which described an item as “one that I love to hate” would probably be characterized as “neutral,” rather than “negative,” because it contains the “positive” word “love” as well as the “negative” word “hate.” Existing sentiment analysis methods would average the sentiments of the words in the statement and conclude that it was a neutral, rather than a negative, statement. Typically, existing sentiment analysis systems rely on sentiment polarity, which divides words into positive, negative or neutral categories. This analysis is useful, but lacks insight into the drivers behind the sentiments. For example, “unexciting” would be a very negative comment to make about a roller coaster, but could very well be a positive comment about a dental procedure. Thus, existing methods do not always capture the true sentiment of web-based content.
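The shortcomings of a dictionary-driven approach can be illustrated with a minimal sketch. The dictionary contents, the scores of +1 and -1, and the function names `score_text` and `classify` are assumptions made for illustration only; they are not taken from any particular existing system.

```python
import re

# Hypothetical sentiment dictionary: opinion word -> polarity score.
# +1 marks a positive word, -1 a negative word; unlisted words are ignored.
SENTIMENT_DICTIONARY = {
    "awesome": 1,
    "great": 1,
    "love": 1,
    "terrible": -1,
    "hate": -1,
    "unexciting": -1,
}


def score_text(text):
    """Average the polarity of every dictionary word found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [SENTIMENT_DICTIONARY[w] for w in words if w in SENTIMENT_DICTIONARY]
    if not scores:
        return 0.0  # no opinion words found
    return sum(scores) / len(scores)


def classify(text):
    """Map the averaged score to a polarity label."""
    score = score_text(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# "love" (+1) and "hate" (-1) cancel out, so the statement is labeled neutral.
print(classify("one that I love to hate"))

# The same word receives the same label regardless of the subject matter.
print(classify("The roller coaster was unexciting"))
print(classify("The dental procedure was unexciting"))
```

Under these assumptions, the first statement is labeled “neutral” because the two opinion words cancel, and both “unexciting” statements are labeled “negative” because the dictionary assigns a fixed polarity to the word without regard to what is being described.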