It has long been known that information regarding the psychological characteristics of an individual may be carried in the acoustic signal of that individual's speech (K. Scherer, “Vocal communication of emotion: a review of research paradigms,” Speech Communication, volume 40, pp. 227-256, 2003). Speech itself, moreover, has been understood as a product of linguistic transformational structures employed by the individual speaker (J. Piaget, Structuralism, Basic Books, 1970, pp. 5, 10, 15; R. Jakobson, Studies on Child Language and Aphasia, Mouton, 1971, pp. 7, 12, 20).

Studies of psychoacoustics, however, have neglected to examine the continuous and simultaneous changes of multiple features of the acoustic signal, the acoustic transformational structures, that are generated by an individual in the act of speaking. Acoustic correlates of depressive states, for example, have been sought in summary statistics of a single acoustic feature, such as the mean rate of change of the fundamental frequency (Mn delta F0) of an utterance, without regard to the behavior of the other acoustic features that accompany F0 in the acoustic signal (A. Nilsonne, “Measuring the rate of change of voice fundamental frequency in fluent speech during mental depression,” Journal of the Acoustical Society of America, volume 83, number 2, pp. 716-728, 1988). Where several features are considered, weighted correlations with targeted mental states are derived from features treated independently rather than from whole structures formed by the simultaneous feature values determined at identical analysis windows (K. Scherer, ibid.). These approaches fail to track the simultaneous variability of the multiple acoustic features that distinguish any utterance, the acoustic transformational structures, and they therefore limit the observational scope for identifying specific feature-generating characteristics of the speaker.
The result is an inadequate correlation of acoustic measurements with psychological characteristics (K. Scherer, ibid.).
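The distinction drawn above can be illustrated with a minimal sketch. The feature names, frame count, and synthetic values below are illustrative assumptions only, not measurements from any described system: the sketch merely contrasts collapsing one feature (F0) into a summary statistic, as in the Mn delta F0 approach, with retaining the simultaneous values of several features taken at identical analysis windows.

```python
import numpy as np

# Illustrative sketch: single-feature summary statistic vs. a frame-
# synchronous record of several features at identical analysis windows.
# Feature choices, frame count, and values are hypothetical.

rng = np.random.default_rng(0)
n_frames = 100                     # e.g. one value per 10 ms window
t = np.arange(n_frames)

# Synthetic per-frame feature tracks for one utterance.
f0 = 120 + 10 * np.sin(t / 8) + rng.normal(0, 1.0, n_frames)      # Hz
energy = 60 + 5 * np.cos(t / 5) + rng.normal(0, 0.5, n_frames)    # dB
spectral_tilt = -12 + rng.normal(0, 0.8, n_frames)                # dB/octave

# Conventional approach: collapse a single feature to a summary
# statistic (mean absolute frame-to-frame change of F0).
mn_delta_f0 = np.mean(np.abs(np.diff(f0)))

# Structural approach: keep the simultaneous values of all features
# at each window, so their joint variation remains observable.
structure = np.column_stack([f0, energy, spectral_tilt])  # (frames, features)
joint_change = np.diff(structure, axis=0)                 # simultaneous deltas

print(structure.shape)     # (100, 3)
print(joint_change.shape)  # (99, 3)
```

The summary statistic discards how F0 moved together with energy and spectral tilt; the per-window matrix preserves that joint behavior for later analysis.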
In contrast to these methods, some techniques of acoustic analysis that are utilized in systems of speech recognition, speech synthesis, and emotion detection measure a variety of acoustic features at periodic intervals and compute the variability of multiple features. These computations, however, are incorporated with other, heterogeneous measurements into feature vectors that are then associated statistically with acoustic data that is selected and classified according to specific elements of speech content, such as specific phrases, words, morphemes, phonemes, diphones, prosodic features, or other distinctive elements (e.g. U.S. Pat. No. 7,337,114, Eide, Feb. 26, 2008; U.S. Pat. No. 7,337,107, Rose, Feb. 26, 2008; U.S. Pat. No. 7,280,968, Blass, Oct. 9, 2007; U.S. Pat. No. 6,173,260, Slaney, Jan. 9, 2001). These conglomerate vectors associated with specific linguistic elements do not constitute transformational structures, and, as a result, they are inadequate for identifying qualities, such as psychological qualities, that are intrinsic to the individual and prevail over the course of an utterance regardless of content.
What is needed is a method and system for rendering the simultaneous variability of multiple acoustic features generated in the course of an utterance, independent of the specific content of that utterance. That is, what is needed is a method and system for rendering the acoustic transformational structures employed by the speaker. What is needed, further, is a method and system to describe and display these measurements in a manner that facilitates the elucidation of speaker characteristics.