In a typical voice-based call center, conference call, or remote training system, it can be difficult to gauge the emotional state of the participants unless they explicitly voice their concerns (e.g., “I don't understand”). For example, in a dynamic session, it may be valuable for a presenter or speaker to be able to assess whether participants are confused, agreeing or disagreeing, absorbing the content, bored, etc. A dynamic session is any session in which a person is talking or reacting in response to stimuli, such as a training session, a request for information, a request for help, automated questions, etc. It may be advantageous for the presenter to get an overall perspective of the participants' mental or emotional reactions so that he/she can improve the presentation in real time, while it is still in progress. For example, the presenter may repeat parts of the current topic, restate portions for better impact, move on to another topic, challenge the participants to respond or focus on a subject, or refer the participant(s) to specific references.
Recognition of emotional state may also be advantageous in other scenarios, such as the self-service voice applications typically deployed in call centers. If the caller's emotional state is not available, an automated system may not be equipped to react to it, for example when a caller becomes frustrated with the system, is in an emergency, or is in a situation requiring urgency.