1. Field of the Invention
The present invention relates generally to speaker gender and age classification, and more particularly to a method of recognizing a speaker's gender or age based on the emotion or arousal of the speech.
2. Description of the Related Art
Identification based on human biometrics has become a recent development trend. Compared with biometric identification, conventional person verification based on integrated circuit (IC) cards or passwords carries the risk that IC cards may be lost or passwords stolen. Commercial fingerprint-based identification, for its part, depends on the resolution of the sensing device for its recognition accuracy, and because the user must touch the sensor, it raises contact and hygiene concerns, so its use remains limited. Analyzing personal biometric traits such as emotion, gender, and age from voices and faces offers greater convenience and more recognition options while reducing the aforesaid risks.
U.S. Pat. No. 7,881,933 B2 disclosed a speech processing technology in which a signal processing instrument recognized a speaker's age from received speech signals by computing a confidence score that indicated the result of the age recognition.
U.S. Pat. No. 5,953,701 disclosed a gender recognition system in which a preprocessor converted speech signals into acoustic data, a phone state model created in advance and stored in memory was used to process and analyze the data and determine the relevant phonetic states, and the gender recognition result was then output.
As noted above, existing speech processing technology can recognize age and gender. However, most speech also carries the speaker's emotion or arousal. Because the emotion or arousal at the moment of speaking varies, the speech signals exhibit different physical characteristics, and the recognition result therefore varies with the emotional state or arousal degree. To date, no existing technique classifies the emotional state or analyzes the arousal degree of speech signals and uses that information as an auxiliary reference for recognizing age and gender.