Emotional speech processing is important for many applications, such as user interfaces and games. However, handling emotional speech is challenging. For example, the characteristics of emotional speech differ significantly from those of read or conversational speech, so statistical voice recognition models trained on read speech perform poorly when they encounter emotional speech. Emotion recognition is also difficult because different speakers convey their emotions in different ways, which makes emotion classes ambiguous and hard to separate.
It is within this context that aspects of the present disclosure arise.