Sentiment prediction from textual data refers to the classification of each written sentence (or, more generally, phrase, paragraph, or document, depending on the level of granularity sought) into one of a small set of states, typically three: positive, negative, or neutral. This classification is intended to reflect any emotion(s) potentially conveyed in the text: for example, “joy” and “wonder” entail a positive outlook, while “sadness” or “fear” imply a negative feeling. Capturing this general perspective is essential when dealing with the associated non-verbal cues. Increased emphasis on more natural human-computer interaction has sparked interest in the emotional aspect of communication, as evidenced by prosodic cues that are closely aligned with, and meant to reinforce, the textual content. Thus, sentiment prediction is a prerequisite for human-like text-to-speech synthesis, where emotional markers (in the form of suitable acoustic and prosodic parameters) need to be properly synthesized along with textual information in order to achieve the highest degree of naturalness.
Generally, only a small number of words have a clear, unambiguous emotional meaning. For example, the word “happy” may encapsulate joy, and the word “sad” may encapsulate sadness. The vast majority of words, however, may carry multiple potential emotional connotations. For example, the word “thrilling” may be a marker of joy or surprise, and the word “awful” may convey sadness or disgust.
Emotion recognition systems are often hampered by the bias inherent in the underlying taxonomy of emotional states. Because this taxonomy supports only simplified relationships between affective words and emotional categories, it often fails to generalize meaningfully beyond the few core terms explicitly considered in its construction.
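The limitation described above can be illustrated with a minimal sketch of a taxonomy-driven classifier. Everything in it is an assumption made for illustration: the toy lexicon `WORD_EMOTIONS` holds only the four example words mentioned earlier, the polarity values in `EMOTION_POLARITY` (in particular treating “surprise” as neutral and “disgust” as negative) are not taken from the text, and the averaging rule in `classify` is one simple choice among many rather than the method of any particular system.

```python
import re

# Hypothetical polarity per emotion category. Joy/wonder as positive and
# sadness/fear as negative follow the text; the values for surprise and
# disgust are assumptions made for this sketch.
EMOTION_POLARITY = {
    "joy": 1, "wonder": 1,
    "sadness": -1, "fear": -1, "disgust": -1,
    "surprise": 0,
}

# Hypothetical toy lexicon: each word maps to the emotions it may convey.
WORD_EMOTIONS = {
    "happy": {"joy"},                  # unambiguous
    "sad": {"sadness"},                # unambiguous
    "thrilling": {"joy", "surprise"},  # ambiguous
    "awful": {"sadness", "disgust"},   # ambiguous
}


def classify(sentence: str) -> str:
    """Label a sentence positive, negative, or neutral from lexicon hits only."""
    score = 0.0
    for token in re.findall(r"[a-z']+", sentence.lower()):
        emotions = WORD_EMOTIONS.get(token)
        if not emotions:
            continue  # words outside the lexicon contribute nothing
        # Average over the candidate emotions of an ambiguous word.
        score += sum(EMOTION_POLARITY[e] for e in emotions) / len(emotions)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for text in ("I am happy today", "What an awful day", "I am devastated"):
        print(text, "->", classify(text))
```

In this sketch, a sentence such as "I am devastated" is labeled neutral simply because "devastated" is absent from the lexicon, which mirrors the failure to generalize beyond the core terms used to build the taxonomy.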