1. Field of the Invention
The present invention relates to a system and method for enabling more natural interaction between a human and a computing device through improved emotion detection.
2. Introduction
Some studies have addressed the affective component of human-machine communication, that is, the emotion conveyed in speech. In normal conversation, people gather much information not only from the words spoken but also from how they are spoken. Pitch, volume, intensity and similar cues help us understand the meaning of what an individual says.
In state-of-the-art spoken dialog systems, the emotional dimension is usually ignored, even though it plays a major role in engaging users in communication with machines. Speech researchers are becoming increasingly interested in human emotion, and a growing body of research points to useful indicators of emotional speech, most notably prosodic features (pitch, energy, speaking rate) and lexical features. However, most of this research has used data elicited from actors. Nevertheless, a few researchers have begun to study emotions as they develop and evolve in more natural settings, such as spoken dialog systems. Currently, there are three open research issues in emotion processing: emotion annotation, emotion prediction, and computational models of affective behavior.
Regarding emotion annotation, studies show that although researchers have created annotation protocols covering various degrees of emotional state, the resulting label distributions are highly skewed and, more importantly, inter-labeler agreement is relatively low. Regarding emotion prediction, studies show that, given the nature of the problem, no single feature is dominantly predictive; instead, these studies rely on very large feature sets whose individual features show low correlations. As a result, researchers tend to reduce the problem to a binary decision (negative vs. positive state). Such a binary decision does not provide the depth of information needed to improve the spoken dialog. Finally, computational models of affective computing have been studied; such a model should be able to predict the user's current state and act upon it. The action should move the user to the next internal state of the dialog that is most likely to lead to a successful dialog, in terms of both the dialog goal and the user's (positive) state.
What is needed in the art is a system and method for improving a spoken dialog system according to user emotion.