In sentiment or opinion analysis, conventional approaches and applications currently on the market often produce too many incorrect results, partly because of the complexity of human language. One apparent problem with many conventional approaches is that words or phrases in user expressions are evaluated without sufficient contextual analysis, because such analysis is difficult to perform and advanced natural language technologies are lacking.
For example, when identifying the sentiment type of the expression “Their price is pretty high”, many approaches look only at the individual words in isolation. They identify the expression as reflecting a positive sentiment because of the presence of the word “pretty”, without examining the context of that word, and without understanding the relationships between the words “price” and “high”, and between “pretty” and “high”. Many systems also highlight words or phrases that are perceived to carry a positive or negative opinion or sentiment type, in order to present the information more clearly. However, without more advanced technologies and methods, the quality of the results generally does not yet meet expectations, and the accuracy is often too low to serve practical purposes.
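The word-in-isolation behavior described above can be illustrated with a minimal sketch. It assumes a small, hypothetical lexicon (`NAIVE_LEXICON`) in which every word has one fixed polarity, and a hypothetical scoring function (`naive_sentiment`) that simply sums those polarities. It is not the implementation of any particular product, only a demonstration of why context-free scoring labels “Their price is pretty high” as positive.

```python
# Hypothetical toy lexicon: each word gets a fixed, context-free polarity score.
# The entries are illustrative assumptions, not taken from any real resource.
NAIVE_LEXICON = {
    "pretty": 1,  # scored as the adjective "attractive", ignoring its adverb use
    "high": 1,
    "good": 1,
    "bad": -1,
    "low": -1,
}


def naive_sentiment(expression: str) -> str:
    """Sum per-word polarities with no contextual analysis at all."""
    words = expression.lower().strip(".!?").split()
    score = sum(NAIVE_LEXICON.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(naive_sentiment("Their price is pretty high"))  # positive
```

Because “price” has no entry and both “pretty” and “high” are scored independently, the sum comes out positive even though most readers would judge the expression negative.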
Many words or phrases in a language carry positive, negative, or neutral connotations, and can be used to express an opinion or feeling. For example, the word “good” usually carries a positive opinion, and the word “bad” usually carries a negative opinion. However, user expressions as linguistic units are not simple collections of individual words, and the words or phrases that can carry positive or negative opinions are not limited to simple words like “good” or “bad”. The meanings or information carried in natural language content have internal structures, and in most cases the inherent meanings of individual words or phrases change with context.
For example, to many users of the English language, the word “high” has an inherently positive connotation to a certain degree, as in expressions like “the quality is high”, and the word “low” has an inherently negative connotation to a certain degree, as in expressions like “the quality is low”. However, these inherent positive or negative connotations can be reversed in a different context. For example, an expression such as “high price” is usually perceived as negative even though the word “high” has an inherently positive connotation or opinion type, and the same is true of expressions such as “high blood pressure” and “high cholesterol”.
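The reversal described above can be sketched as a rule that looks at the attribute being modified, not only at the degree word itself. The sketch assumes a hypothetical, hand-written set of attributes for which a higher value is usually undesirable (`UNDESIRABLE_WHEN_HIGH`) and a hypothetical helper `phrase_polarity`. It is a toy illustration of context-dependent polarity, not a proposed or complete method.

```python
# Hypothetical inherent polarities of degree words (illustrative assumptions).
INHERENT_POLARITY = {"high": 1, "low": -1}

# Hypothetical attributes for which "more" is usually undesirable.
# A practical system would need a far larger, curated resource.
UNDESIRABLE_WHEN_HIGH = {"price", "cost", "blood pressure", "cholesterol"}


def phrase_polarity(attribute: str, degree_word: str) -> int:
    """Return +1 or -1 for a pair such as ("price", "high").

    The inherent polarity of the degree word is reversed when it modifies
    an attribute for which a higher value is undesirable.
    """
    polarity = INHERENT_POLARITY[degree_word]
    if attribute in UNDESIRABLE_WHEN_HIGH:
        polarity = -polarity
    return polarity


for attribute, word in [("quality", "high"), ("quality", "low"),
                        ("price", "high"), ("blood pressure", "high"),
                        ("price", "low")]:
    print(f"{word} {attribute}: {phrase_polarity(attribute, word):+d}")
```

Under these assumptions “high quality” stays positive while “high price” and “high blood pressure” become negative, and “low price” becomes positive.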
Other, more intriguing examples of context changing the inherent connotation of a word or phrase include expressions with the English words “prevent” or “prevention”. When used alone, as in the name of the magazine “Prevention”, or in expressions such as “prevent the disease”, the word “prevent” or “prevention” carries a positive connotation. However, in other contexts, such as “The lack of resources prevented them from making timely progress” or “That condition prevented them from benefiting from the new policies”, the same word conveys a negative meaning.
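One simple way to picture why “prevent” changes polarity is to let the polarity of what is being prevented decide the result: preventing something undesirable reads as positive, and preventing something desirable reads as negative. The sketch below assumes this inversion heuristic and a hypothetical table of object polarities (`OBJECT_POLARITY`); both are illustrative assumptions, and real expressions would first require parsing to identify the object of “prevent”.

```python
# Hypothetical polarities for things that can be prevented (illustrative only).
OBJECT_POLARITY = {
    "the disease": -1,
    "making timely progress": 1,
    "benefiting from the new policies": 1,
}


def prevent_polarity(obj: str) -> int:
    """Polarity of "prevent <obj>" under a simple inversion heuristic:
    preventing something bad is positive, preventing something good is negative."""
    return -OBJECT_POLARITY[obj]


for obj in OBJECT_POLARITY:
    print(f"prevent {obj}: {prevent_polarity(obj):+d}")
```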
Conventional approaches to sentiment analysis are unable to handle such contextual changes in opinion type or connotation. One example of this problem can be seen in an online demo based on the open source natural language processing toolkit NLTK (http://text-processing.com/demo/sentiment). At the time of this writing, for expressions such as “the price is pretty high”, the toolkit consistently reports a positive opinion type, most likely because of the assumed positive connotation of the words “high” or “pretty”, as shown in FIG. 9. On a commercially available website offering a sentiment analysis demo (http://www.lexalytics.com/web-demo), the analysis of the same expression produced the same result, as shown in FIG. 10. Furthermore, conventional approaches are often limited to a dictionary lookup that retrieves the default sentiment type of a word or phrase and then applies it unchanged across different expressions. Because of the complexity of the internal structures of linguistic expressions, such approaches generally cannot perform the contextual analysis needed to accurately determine the true connotation or sentiment type of the expressions being analyzed.