Although the first monograph on the expression of emotions in animals and humans was written by Charles Darwin in the nineteenth century, and psychologists have gradually accumulated knowledge in the field of emotion detection and voice recognition, the field has recently attracted a new wave of interest from both psychologists and artificial intelligence specialists. There are several reasons for this renewed interest, including technological progress in recording, storing and processing audio and visual information; the development of non-intrusive sensors; the advent of wearable computers; and the urge to enrich human-computer interfaces from point-and-click to sense-and-feel. Further, a new field of research in Artificial Intelligence (AI) known as affective computing has recently been identified. Affective computing focuses research on computers and emotional states, combining information about human emotions with computing power to improve human-computer relationships.
As to research on recognizing emotions in speech, psychologists have done many experiments and suggested many theories. In addition, AI researchers have made contributions in the areas of emotional speech synthesis, recognition of emotions, and the use of agents for decoding and expressing emotions.
Tables 1–4 give a closer look at how well people can recognize and portray emotions in speech. Thirty subjects of both genders recorded four short sentences with five different emotions (happiness, anger, sadness, fear, and a normal or neutral state). Table 1 shows a performance confusion matrix, in which the numbers on the diagonal give the percentage of utterances for which the detected (evaluated) emotion matches the intended (true) emotion. The rows and the columns represent true and evaluated categories respectively. For example, the second row indicates that 11.9% of utterances that were portrayed as happy were evaluated as neutral (unemotional), 61.4% as truly happy, 10.1% as angry, 4.1% as sad, and 12.5% as afraid. The most easily recognizable category is anger (72.2%) and the least recognizable category is fear (49.5%). There is considerable confusion between sadness and fear, sadness and the unemotional state, and happiness and fear. The mean accuracy of 63.5% (the sum of the diagonal numbers divided by five) agrees with the results of other experimental studies.
TABLE 1. Performance Confusion Matrix

Category   Neutral   Happy   Angry   Sad    Afraid   Total
Neutral    66.3      2.5     7.0     18.2   6.0      100
Happy      11.9      61.4    10.1    4.1    12.5     100
Angry      10.6      5.2     72.2    5.6    6.3      100
Sad        11.8      1.0     4.7     68.3   14.3     100
Afraid     11.8      9.4     5.1     24.2   49.5     100
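The figures quoted from Table 1 can be reproduced with a short calculation. The following is a minimal sketch only: the names `LABELS`, `CONFUSION`, `per_category_accuracy`, `mean_accuracy` and `top_confusions` are illustrative assumptions and do not come from the study itself; the numbers are copied from Table 1.

```python
# Rows are the portrayed (true) emotion, columns the evaluated emotion,
# in the order used in Table 1. All values are percentages.
LABELS = ["Neutral", "Happy", "Angry", "Sad", "Afraid"]
CONFUSION = [
    [66.3, 2.5, 7.0, 18.2, 6.0],
    [11.9, 61.4, 10.1, 4.1, 12.5],
    [10.6, 5.2, 72.2, 5.6, 6.3],
    [11.8, 1.0, 4.7, 68.3, 14.3],
    [11.8, 9.4, 5.1, 24.2, 49.5],
]


def per_category_accuracy(matrix, labels):
    """Return the diagonal entry (correct recognition rate) for each label."""
    return {label: matrix[i][i] for i, label in enumerate(labels)}


def mean_accuracy(matrix):
    """Return the sum of the diagonal entries divided by the number of categories."""
    diagonal = [matrix[i][i] for i in range(len(matrix))]
    return sum(diagonal) / len(diagonal)


def top_confusions(matrix, labels, n=3):
    """Return the n largest off-diagonal cells as (percent, true, evaluated)."""
    cells = [
        (matrix[i][j], labels[i], labels[j])
        for i in range(len(labels))
        for j in range(len(labels))
        if i != j
    ]
    return sorted(cells, reverse=True)[:n]


if __name__ == "__main__":
    accuracy = per_category_accuracy(CONFUSION, LABELS)
    print("Most recognizable:", max(accuracy, key=accuracy.get))
    print("Least recognizable:", min(accuracy, key=accuracy.get))
    print("Mean accuracy: %.1f%%" % mean_accuracy(CONFUSION))
    for percent, true_label, evaluated_label in top_confusions(CONFUSION, LABELS):
        print("%s evaluated as %s: %.1f%%" % (true_label, evaluated_label, percent))
```

Reading the off-diagonal cells in this way shows which pairs of categories account for most of the errors, which is the basis for the remarks about confusion between categories.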
Table 2 shows statistics for evaluators for each emotional category and for summarized performance, which was calculated as the sum of the performances for each category. It can be seen that the variance for anger and sadness is much less than for the other emotional categories.
TABLE 2. Evaluators' Statistics

Category   Mean    Std. Dev.   Median   Minimum   Maximum
Neutral    66.3    13.7        64.3     29.3      95.7
Happy      61.4    11.8        62.9     31.4      78.6
Angry      72.2    5.3         72.1     62.9      84.3
Sad        68.3    7.8         68.6     50.0      80.0
Afraid     49.5    13.3        51.4     22.1      68.6
Total      317.7   28.9        314.3    253.6     355.7
Table 3 below shows statistics for “actors”, i.e. how well subjects portray emotions. Speaking more precisely, the table shows how readily a particular portrayed emotion is recognized by evaluators. It is interesting to compare Tables 2 and 3 and see that the ability to portray emotions (total mean is 62.9%) is at about the same level as the ability to recognize emotions (total mean is 63.5%). However, the variance for portraying emotions is much larger.
TABLE 3. Actors' Statistics

Category   Mean    Std. Dev.   Median   Minimum   Maximum
Neutral    65.1    16.4        68.5     26.1      89.1
Happy      59.8    21.1        66.3     2.2       91.3
Angry      71.1    24.5        78.2     13.0      100.0
Sad        68.1    18.4        72.6     32.6      93.5
Afraid     49.7    18.6        48.9     17.4      88.0
Total      314.3   52.5        315.2    213       445.7
Table 4 shows self-reference statistics, i.e. how well subjects were able to recognize their own portrayals. We can see that people do much better at recognizing their own portrayals (mean is 80.0%), especially for anger (98.1%), sadness (80.0%) and fear (78.8%). Interestingly, fear was recognized better than happiness. Some subjects failed to recognize their own portrayals of happiness and of the normal or neutral state.
TABLE 4. Self-reference Statistics

Category   Mean    Std. Dev.   Median   Minimum   Maximum
Neutral    71.9    25.3        75.0     0.0       100.0
Happy      71.2    33.0        75.0     0.0       100.0
Angry      98.1    6.1         100.0    75.0      100.0
Sad        80.0    22.0        81.2     25.0      100.0
Afraid     78.8    24.7        87.5     25.0      100.0
Total      400.0   65.3        412.5    250.0     500.0
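The "Total" rows of Tables 2–4 are sums over the five categories, so an average per-category rate is obtained by dividing the total mean by five. The following is a minimal sketch of that arithmetic; the names `TOTAL_MEANS` and `average_per_category` are illustrative assumptions, and the input values are copied from the "Total" rows of the tables.

```python
# Number of emotional categories summed in each "Total" row.
CATEGORY_COUNT = 5

# Mean of the summarized performance, taken from the "Total" rows.
TOTAL_MEANS = {
    "evaluators (Table 2)": 317.7,
    "actors (Table 3)": 314.3,
    "self-reference (Table 4)": 400.0,
}


def average_per_category(total_mean, category_count=CATEGORY_COUNT):
    """Convert a summed mean into an average percentage per category."""
    return total_mean / category_count


# Average per-category rate for each table, rounded to one decimal place.
AVERAGES = {
    name: round(average_per_category(total), 1)
    for name, total in TOTAL_MEANS.items()
}

if __name__ == "__main__":
    for name, average in AVERAGES.items():
        print("%s: %.1f%%" % (name, average))
```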
These results provide valuable insight into human performance and can serve as a baseline for comparison with computer performance. In spite of the research on recognizing emotions in speech, little has been done to provide methods and apparatuses that utilize emotion recognition for business purposes.