The invention relates to a method for processing speech, in particular to a method for emotion recognition and speaker identification.
In many systems with man-machine interfaces (MMI) it is desirable to integrate as much information as possible that can be derived from the various communication channels used by humans. In particular, it is often useful to include emotional information that describes the emotions of a user of a system, e.g. whether the user is angry, happy, or sad. This emotional information may be derived from a speech signal of the user and can then be used e.g. to generate a corresponding response of the system. An example of a system in which emotional information can be useful is a speech-operated automatic teller machine (ATM). If the system has, for instance, asked the user to repeat an order several times, the user may become annoyed and impatient. This emotional state may be detected by the system, and the system's input mode may then switch from speech to graphic/haptic input via a touch screen.
Another important aspect of today's MMI systems is the identification of speakers. In many systems it is important to know who is interacting with the system. For example, several people may share a car, and certain parameters of the system may be set depending on the current driver. It is therefore necessary that the driver be identified, which is commonly achieved by a speaker identification routine within the MMI system.