Emotion recognition, or understanding the mood of a user, is important and beneficial for many applications, including games and man-machine interfaces. Emotion recognition is a challenging task because human emotion is inherently complex; hence, the accuracy of automatic emotion recognition is generally low. Some existing emotion recognition techniques use facial features or acoustic cues, alone or in combination. Other systems use body gesture recognition alone. Most multi-modal emotion recognition systems combine facial recognition with some cues from speech. Recognition accuracy depends on the number of emotion categories to be recognized, how distinct those categories are from each other, and the cues employed for recognition. For example, happiness and anger turn out to be easily confused when emotion recognition is based on acoustic cues alone. Although recognition tends to improve with additional modalities (e.g., facial cues combined with acoustic cues), even with only about 8 emotional categories to choose from, most existing systems achieve at best about 40-50% recognition accuracy.
It is within this context that aspects of the present disclosure arise.