1. Field of the Invention
The present invention relates to methods and systems for detecting and measuring stress in speech.
2. Description of the Related Art
While presently available automatic speech recognition (ASR) technology may be adequate for some commercial uses, military, emergency medical and psychiatric applications are limited by the inability of such technology to adjust for changes in the voice spectrum induced by ambient noise, multiple speakers, or emotion-driven or task-induced psychological stress in the speaker. In harsh military scenarios where voice interaction with computers is needed for command and control, task workload and emotional stress have been shown to significantly degrade military speech technology. See Hansen, J. H. L. et al. (2000) NATO Research & Technology Organization RTO-TR-10, AC/323(IST)TP/5 IST/TG-01. The events of 11 Sep. 2001 have dramatically shown the need for methods and systems that utilize voice data and information to account for stress and emotion in speech in order to effectively and accurately coordinate the deployment of civilian and military first-responders. Additionally, there is an increasing need to detect, measure, or account for stress and other emotions in speech during voice communications in order to detect deception or emotions such as stress or anxiety in civilian and military situations such as law enforcement interrogations.
Studies have shown that the performance of speech recognition algorithms degrades severely in the presence of task-induced and emotion-induced stress in adverse conditions. Published studies have suggested that psychological stress causes an increase in the fundamental frequency (Fo) of the voice, as well as a decrease in the FM component of Fo. See Brenner, M., et al. (1979) Psychophysiology 16(4):351-357; Brenner, M., et al. (1994) Aviation, Space and Environmental Medicine 65:21-26; VanDercar, D. H., et al. (1980) Journal of Forensic Sciences 25:174-188; and Williams, C. and Stevens, K. (1972) J. Acoustical Soc. Amer. 52(4):1238-1250. Prior art methods for detecting stress in speech are based on pitch structure, duration, intensity, glottal characteristics, and vocal tract spectral structure, using detection or classification methods based on Hidden Markov Models (HMM) or Bayesian hypothesis testing. See Zhou, G. et al. (2001) IEEE Trans. Speech & Audio Process 9(3):201-216; Hansen, J. H. L. et al. (1996) IEEE Trans. Speech Audio Process. 4(4):307-313; and Cairns, D. A. et al. (1994) J. Acoust. Soc. Am. 96(6):3392-3400.
Detecting stress in speech using the prior art methods, however, has been problematic and unreliable. See Cestaro, V., et al. (1998) Society for Psychophysiological Research, Annual Meeting. Specifically, reliable detection of stress is challenging even in clean speech, because it requires that a speaker depart from the neutral speech production process in a consistent manner, so that the extracted features can be used to detect and perhaps quantify that departure. Unfortunately, speakers are not always consistent in how they convey stress or emotion in speech.
Recently, a new feature based on the Teager Energy Operator (TEO), TEO-CB-AutoEnv, was proposed and found to be more responsive to speech under stress. See Zhou, G. et al. (2001) IEEE Trans. Speech & Audio Process 9(3):201-216. However, use of the TEO-CB-AutoEnv feature with an HMM-trained stressed speech classifier still yields high error rates of 22.5% and 13% for some types of stress and neutral speech detection, respectively. Additionally, the failure of published and commercially available methods and systems to consistently measure psychological stress may be due to a lack of access to a robust, yet ethical, human stress model as well as a failure to utilize validated stress markers.
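By way of illustration only, the discrete-time Teager Energy Operator underlying such features is commonly written as Ψ[x(n)] = x(n)^2 - x(n+1)x(n-1). The following minimal sketch computes this basic operator; it does not implement the TEO-CB-AutoEnv feature itself, which involves further processing not detailed in this section. The function name teager_energy and the test signal parameters are assumptions chosen for the example.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n+1] * x[n-1].

    Returns an array two samples shorter than the input, because the
    operator is undefined at the first and last samples.
    (Hypothetical helper name, used for illustration only.)
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 1 or x.size < 3:
        raise ValueError("need a 1-D signal with at least 3 samples")
    return x[1:-1] ** 2 - x[2:] * x[:-2]


# Illustrative input: a pure sinusoid A*cos(omega*n + phi).
# For this signal the operator equals A^2 * sin(omega)^2 at every sample.
amplitude, omega, phase = 2.0, 0.3, 0.7  # assumed example values
n = np.arange(200)
signal = amplitude * np.cos(omega * n + phase)
energy = teager_energy(signal)
expected = amplitude ** 2 * np.sin(omega) ** 2
```

For a pure sinusoid the operator output is constant and depends on both amplitude and frequency, which is one reason it has been considered for tracking modulation changes in speech.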
Therefore, a need still exists for methods and systems for detecting and measuring stress in speech as well as human stress models for detecting and measuring stress in speech.