Speech analysis is a general term given to computerized methods that process human speech utterances in order to uncover the information they carry. Speech analysis is classified as part of the phonetics discipline within the linguistic sciences.
Speech analysis may be divided into two main approaches. The first approach focuses on revealing the content of the speech by learning how words, syllables and phonemes are pronounced and how sentences are arranged. Many speech recognition applications, such as ‘speech to text’ or ‘word spotting’, use this approach in order to extract the content of the speech.
The second approach, analysis of speech prosody, focuses on the manner in which the speech is spoken, by analyzing the non-segmental (non-word, non-content) features of the speech such as intonation, tempo, intensity, stress and rhythm.
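As an illustration only, two of the prosodic features named above can be approximated from a short frame of audio samples: intensity as the root-mean-square amplitude, and intonation as a fundamental-frequency estimate taken from the autocorrelation peak. The following is a minimal sketch, not a method taken from any of the cited patents. The function names, the 75 to 400 Hz search range and the synthetic test tone are all assumptions made for the example, and practical pitch trackers add windowing, normalization and voicing decisions that are omitted here.

```python
import math


def rms_intensity(frame):
    """Root-mean-square amplitude of one frame, a simple proxy for intensity."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def estimate_pitch(frame, sample_rate, fmin=75.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one frame.

    Picks the lag with the largest raw autocorrelation inside the lag range
    that corresponds to [fmin, fmax]. Returns 0.0 if no positive peak is found.
    The search range is an assumed, typical range for adult speech.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def synthetic_tone(freq_hz, sample_rate, n_samples, amplitude=0.5):
    """Hypothetical stand-in for a voiced speech frame: a pure sine tone."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


if __name__ == "__main__":
    rate = 8000
    frame = synthetic_tone(200.0, rate, 400)  # 50 ms frame of a 200 Hz tone
    print("intensity (RMS):", rms_intensity(frame))
    print("pitch estimate (Hz):", estimate_pitch(frame, rate))
```

Applied frame by frame over an utterance, these two functions give an intensity contour and a pitch contour. Tempo and rhythm would require additional segmentation, for example of syllables and pauses, which this sketch does not attempt.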
The manner in which the speech is spoken provides the speech “melody”, which adds significant insight to the overall meaning and context of the speech. For example, people perceive a sentence as a question by the rising intonation at the end of the sentence. Accents are a classic example of how prosodic speech parameters alter the pronunciation of words. Actors modify the prosody of their speech to sound like a certain character. There are also gender differences in prosody; for example, women typically speak at a higher pitch (fundamental frequency) than men. Often, prosodic features change the meaning of the speech: we interpret the same sentence as having a cynical, sarcastic or simple meaning according to changes in the manner (prosody) in which it is pronounced.
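The rising-intonation example can be made concrete with a small sketch. Assuming a pitch contour is already available as one fundamental-frequency value per frame (with 0 marking unvoiced frames), a least-squares line fitted to the final part of the contour indicates whether the utterance ends on a rise. The function names, the 30% tail length, the slope threshold and the two contours below are hypothetical values chosen for illustration. Real question detection is considerably more involved and language dependent.

```python
def final_pitch_slope(pitch_contour, tail_fraction=0.3):
    """Least-squares slope (Hz per frame) over the final part of a pitch contour.

    Unvoiced frames (value 0) are dropped before fitting, which slightly
    distorts the time axis; this is a simplification accepted for the sketch.
    """
    voiced = [f for f in pitch_contour if f > 0]
    n_tail = max(2, int(len(voiced) * tail_fraction))
    tail = voiced[-n_tail:]
    if len(tail) < 2:
        return 0.0
    mean_x = (len(tail) - 1) / 2
    mean_y = sum(tail) / len(tail)
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(tail))
    den = sum((i - mean_x) ** 2 for i in range(len(tail)))
    return num / den


def ends_with_rise(pitch_contour, min_slope=2.0):
    """True if the final pitch slope exceeds an assumed threshold (Hz per frame)."""
    return final_pitch_slope(pitch_contour) > min_slope


if __name__ == "__main__":
    # Made-up contours: one falls steadily, the other rises at the end.
    falling = [200, 195, 190, 185, 180, 175, 170, 165, 160, 155]
    rising = [180, 180, 182, 181, 183, 185, 190, 200, 215, 230]
    print("falling contour ends with rise:", ends_with_rise(falling))
    print("rising contour ends with rise:", ends_with_rise(rising))
```

The same sentence, with identical words, would yield different results from such a measure depending only on how it is spoken, which is the sense in which prosody carries information beyond the content.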
Prosodic characteristics of speech utterances also carry information about the emotional state of the speaker. This has been shown over the years by many works in the literature. It is also intuitively grasped: an excited person calling an emergency service would typically speak at a fast pace, and his voice would be intense, breathy, fluctuating and so on. On the other hand, a person in a sad, gloomy state would tend to speak slowly, with low energy, long pauses and the like. These characteristics are reflected in the prosodic speech features.
Several patents have been issued for using prosodic speech features to automatically analyze the emotional state of speakers. Among them are U.S. Pat. No. 6,151,571 to Pertrushin and U.S. Pat. No. 6,173,260 to Slaney, for classifying different emotional states, and European Patent No. EP 1423846 to Degani and Zamir, for determining emotional arousal in general. All of these patents are incorporated by reference herein in their entirety.
Few attempts have been made to correlate the speaker's personality with the prosodic characteristics of speech. U.S. Pat. No. 6,006,188 to Bogdashevsky describes a method of determining the speech features of people with similar personality types (according to known psychological inventories), and then using the detected features for automatic classification of personality types. The idea that personality is reflected in speech makes sense and can also be grasped intuitively: one can imagine, for example, the soft and hesitant speech patterns of an introverted person as opposed to the loud and impulsive speech patterns of an extroverted person.
Linking personality and speech patterns reflects the understanding that speech expresses a wide range of personal characteristics. However, the weakness of this concept lies in its practical implementation. Personality represents the steady characteristics of a person; therefore it should also be measured consistently. This means that the speaker's speech patterns, reflecting his personality, should remain fairly consistent over changing situations, varying inner states and different contexts. This is not the case in reality: speech patterns tend to be strongly affected by situational factors, as is evident, for example, from the well-established relation between emotional states and speech prosody. Even the introverted person from the above example gets angry from time to time, and when he does, his speech patterns (and his prosodic speech parameters) change significantly and much more closely resemble the speech patterns of an outgoing, extroverted person. Perhaps statistically, if we measured the introverted person's speech patterns on many different occasions, there would be a significant correlation between his personality and his speech patterns. The measurement may also be reliable if we sample a person's speech under conditions very similar to those in which the reference data, representing the speech patterns of a certain personality, was taken. But this would not be the case when an intervening situational factor is dominant. In real life, situational factors frequently affect speech. Therefore, personality measurement through speech prosody cannot be regarded as a situation-independent method.