1. Field of the Invention
The invention relates to a system and an interactive method for detecting and processing prosodic elements of speech-based user inputs and queries presented over a distributed network such as the Internet or a local intranet. The system has particular applicability to such applications as remote learning, e-commerce, technical e-support services, Internet searching, etc.
2. Description of Related Art
Emotion is an integral component of human speech, and prosody is the principal way it is communicated. Prosody, the rhythmic and melodic qualities of speech that are used to convey emphasis, intent, attitude and semantic meaning, is a key component in recovering the speaker's communication and expression embedded in his or her speech utterance. Detection of prosody and emotional content in speech is known in the art, and is discussed for example in the following representative references, which are incorporated by reference herein: U.S. Pat. No. 6,173,260 to Slaney; U.S. Pat. No. 6,496,799 to Pickering; U.S. Pat. No. 6,873,953 to Lenning; U.S. Publication No. 2005/0060158 to Endo et al.; U.S. Publication No. 2004/0148172 to Cohen et al.; U.S. Publication No. 2002/0147581 to Shriberg et al.; and U.S. Publication No. 2005/0182625 to Azara et al. Training of emotion modelers is also known, as set out for example in the following, also incorporated by reference herein:
    L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Chapman & Hall, New York, 1984.
    Schlosberg, H., A scale for the judgment of facial expressions, J of Experimental Psychology, 29, 1954, pages 497-510.
    Plutchik, R., The Psychology and Biology of Emotion, Harper Collins, New York 1994.
    Russell, J. A., How shall an Emotion be called, in R. Plutchik & H. Conte (editors), Circumplex Models of Personality and Emotion, Washington, APA, 1997.
    Whissell, C., The Dictionary of Affect in Language, in R. Plutchik & H. Kellerman, Editors, Emotion: Theory, Research & Experience, Vol. 4, Academic Press, New York 1959.
    ‘FEELTRACE’: An Instrument for Recording Perceived Emotion in Real Time, Ellen Douglas-Cowie, Roddy Cowie, Marc Schröder: Proceedings of the ISCA Workshop on Speech and Emotion: A Conceptual Framework for Research, pages 19-24, Textflow, Belfast, 2000.
    Silverman, K., Beckman, M., Ostendorf, M., Wightman, C., Price, P., Pierrehumbert, J. & Hirschberg, J. (1992), A standard for labelling English prosody, in ‘Proceedings of the International Conference on Spoken Language Processing (ICSLP)’, Vol. 2, Banff, pp. 867-870.
    Shriberg, E., Taylor, P., Bates, R., Stolcke, A., Ries, K., Jurafsky, D., Coccaro, N., Martin, R., Meteer, M. & Ess-Dykema, C. (1998), ‘Can prosody aid the automatic classification of dialog acts in conversational speech?’, Language and Speech, 41(3-4), 439-487.
    Grosz, B. & Hirschberg, J. (1992), Some intonational characteristics of discourse structure, in ‘Proceedings of the International Conference on Spoken Language Processing’, Banff, Canada, pp. 429-432.
    Grosz, B. & Sidner, C. (1986), ‘Attention, intentions, and the structure of discourse’, Computational Linguistics 12, 175-204.
    P. Boersma, D. Weenink, PRAAT, Doing Phonetics by Computer, Institute of Phonetic Sciences, University of Amsterdam, Netherlands, 2004, http://www.praat.org
    Taylor, P., R. Caley, A. W. Black and S. King, Chapter 10, Classification and Regression Trees, Edinburgh Speech Tools Library, System Documentation, Edition 1.2, http://festvox.org/docs/speech_tools-1.2.0/c16616.htm, Centre for Speech Technology, Univ. of Edinburgh, (2003).
    Beckman, M. E. & G. Ayers Elam, (1997): Guidelines for ToBI labelling, version 3. The Ohio State University Research Foundation, http://www.ling.ohio-state.edu/research/phonetics/E_ToBI/
Real-time speech and natural language recognition systems are likewise known in the art, as described in Applicant's prior patents, including U.S. Pat. No. 6,615,172 to Bennett et al., which is also incorporated by reference herein. Because prosodic elements offer significant benefits in identifying the meaning of speech utterances (as well as other human input), it would clearly be desirable to integrate such features within the aforementioned Bennett et al. speech recognition/natural language processing architectures. To do so, however, a prosodic analyzer must also operate in real time and be distributable across a client/server architecture. Furthermore, to improve performance, a prosodic analyzer should be trained/calibrated in advance.