Speech dialog systems are useful in a variety of contexts, and the range of fields in which they can be deployed continues to grow. A speech dialog system (e.g., an automatic call routing system, an interview pre-screening system) captures audio responses from a person interacting with the system and extracts content from those responses (e.g., via automatic speech recognition). The system then provides responsive output based on the extracted content, resulting in a conversation between the person and the speech dialog system, which may be presented to the person as, e.g., an avatar depicted on a screen or a voice transmitted over a telephone line.
It is often desirable to measure the level of engagement of the person interacting with the speech dialog system. The engagement level can be useful for gauging the effort the person is putting into the interaction (e.g., in a job interview pre-screening implementation). Alternatively, the engagement level can be used to adjust the speech dialog system so as to increase engagement, either during the conversation or afterward so that future conversations achieve a higher level of engagement. The ability to measure user experience and performance metrics for a speech dialog system, whether at the time of rollout or for a mature system, is important. It can be especially important for speech dialog systems used in the educational domain, where language learning and assessment applications require systems that deal gracefully with nonnative speech and varying cultural contexts.