Human communication conveys important information not only about intent but also about desires and emotions. In particular, the importance of automatically recognizing emotions from human speech and other communication cues has grown with the increasing role of spoken language and gesture interfaces in human-computer interactions and computer-mediated applications.
Current automatic emotion recognizers typically assign category labels to emotional states, such as “angry” or “sad,” relying on signal processing and pattern recognition techniques. Efforts in human emotion recognition have mostly relied on mapping cues such as speech acoustics (for example, energy and pitch) and/or facial expressions to some target emotion category or representation.
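The mapping described above can be illustrated with a minimal sketch, assuming NumPy is available: two simple prosodic cues (short-term energy and an autocorrelation-based pitch estimate) are extracted and mapped to a category label by hand-picked threshold rules. The feature choices, thresholds, and labels here are purely illustrative, not the method of any particular system.

```python
import numpy as np

def extract_features(signal, sample_rate=8000):
    """Compute two simple prosodic cues from a mono speech signal:
    short-term energy and a crude autocorrelation-based pitch estimate."""
    energy = float(np.mean(signal ** 2))
    # Autocorrelation-based pitch: find the lag with the strongest
    # self-similarity within a plausible speech pitch range (50-400 Hz).
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo, hi = sample_rate // 400, sample_rate // 50
    lag = lo + int(np.argmax(ac[lo:hi]))
    pitch = sample_rate / lag
    return energy, pitch

def classify_emotion(energy, pitch):
    """Toy rule-based mapping from acoustic cues to a category label.
    Thresholds are illustrative, not empirically derived."""
    if energy > 0.1 and pitch > 200:
        return "angry"
    if energy < 0.01:
        return "sad"
    return "neutral"
```

A loud 250 Hz tone, for instance, yields high energy and a high pitch estimate and so falls into the "angry" bin; real systems replace the threshold rules with trained statistical classifiers over much richer feature sets.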
A major challenge to such approaches is that expressive human behavior is highly variable and depends on a number of factors, including the context and domain of the expressive behavior, and may be expressed through multiple channels. Therefore, categorical representations for emotions and simple pattern recognition schemes may not be adequate for describing real-life human emotions.
There is a need for methods and systems that provide a holistic and multi-tier approach to the problem of emotion recognition.