When a person speaks, the listener receives impressions and signals beyond what is actually uttered, i.e. beyond the objective content of the spoken words. These additional impressions and signals help the listener interpret the factual content of the words, and they also lead to a conscious or unconscious judgement of the speaker's credibility, mood and so on.
Such additional signals include, for instance, the tempo of the speaker, i.e. the speed at which the words are uttered, and the rhythm used. The pitch of the voice also communicates information; for example, deep, dark bass voices are often perceived as confidence-inspiring and soothing.
Human speech contains one fundamental tone and a number of higher-pitched overtones. The fundamental note is thus the lowest frequency that is perceivable at any given time. Equipment for measuring the fundamental notes of speech and song is already known, and identification of the notes of human speech is known from, for example, EP 0 821 345 and U.S. Pat. No. 6,014,617.
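By way of illustration only, the relationship between a fundamental note and its overtones can be sketched in a few lines of Python. The sketch synthesises a signal from a fundamental and two overtones and then estimates the fundamental by autocorrelation. Autocorrelation is a generic textbook technique chosen here for brevity; it is not asserted to be the method of the cited documents. The function names, the overtone amplitudes and the 60 to 400 Hz search range are assumptions made for this example.

```python
import math


def synth_voiced(f0_hz, sample_rate, duration_s, overtone_amps=(1.0, 0.5, 0.25)):
    """Synthesise a fundamental at f0_hz plus overtones at integer multiples.

    overtone_amps[0] is the (assumed) amplitude of the fundamental itself,
    overtone_amps[1] that of the first overtone (2 * f0_hz), and so on.
    """
    n_samples = int(sample_rate * duration_s)
    return [
        sum(
            amp * math.sin(2.0 * math.pi * (k + 1) * f0_hz * i / sample_rate)
            for k, amp in enumerate(overtone_amps)
        )
        for i in range(n_samples)
    ]


def estimate_fundamental(samples, sample_rate, fmin_hz=60.0, fmax_hz=400.0):
    """Estimate the fundamental frequency in Hz by autocorrelation.

    The signal is compared with delayed copies of itself. The delay (lag)
    giving the strongest similarity within the search range is taken as one
    period of the fundamental. The search range is an assumed, rough range
    for speech.
    """
    min_lag = int(sample_rate / fmax_hz)
    max_lag = int(sample_rate / fmin_hz)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        count = len(samples) - lag
        if count <= 0:
            break
        # Mean product of the signal with its delayed copy; dividing by the
        # number of terms avoids favouring short lags.
        score = sum(samples[i] * samples[i + lag] for i in range(count)) / count
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


if __name__ == "__main__":
    rate = 8000  # samples per second (assumed)
    signal = synth_voiced(100.0, rate, 0.2)
    print(f"estimated fundamental: {estimate_fundamental(signal, rate):.1f} Hz")
```

Although the synthetic signal contains energy at 200 Hz and 300 Hz as well, the estimate stays close to the 100 Hz fundamental, because the waveform as a whole only repeats once per fundamental period. Real speech is noisier and only partly voiced, so a practical measuring device would need additional steps that this sketch omits.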
It is also known that the fundamental notes of speech change progressively, and that such change is usually governed by the context, i.e. the content of the speech and the environment in which it is delivered. Attempts have also been made to re-create such context-dependent variations in speech synthesis. This phenomenon is described, for instance, in EP 0 674 307.
In addition, the speaker's body language emits signals to the listener.
However, much of the information communicated via human speech is not perceived consciously and therefore cannot be analysed. Consequently, there is a need for means, such as methods and devices, for improved speech analysis and/or analysis of further aspects of speech.