1. Field of the Invention
The present invention relates to computers and computer networks. More particularly, the invention relates to speaker gender identification.
2. Background of the Related Art
Speech-based gender identification has many potential applications in speaker recognition and speech recognition, as well as in multimedia signal analysis. Generally speaking, pitch period and Mel-Frequency Cepstral Coefficients (MFCC) are the two most commonly used features in existing gender identification systems.
A good estimate of the pitch period can only be obtained for voiced portions of a clean, noise-free speech signal. Moreover, the overlap of pitch values between male and female voices intrinsically limits the use of the pitch feature for gender identification. The average fundamental frequency (i.e., the reciprocal of the pitch period) for men generally falls between 140 and 146 Hz, whereas the average fundamental frequency for women is usually between 188 and 221 Hz.
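By way of illustration only, a pitch-based decision of the kind described above might be sketched as follows. The sketch assumes a single voiced, noise-free frame and uses a plain autocorrelation search to find the pitch period. The function names, the 60 to 400 Hz search range, and the 167 Hz decision threshold (the midpoint between the 146 Hz and 188 Hz averages cited above) are assumptions introduced for this example and are not taken from any existing system.

```python
import math


def estimate_f0(samples, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Searches for the lag that maximizes the raw autocorrelation.
    The f0_min/f0_max search range is an illustrative assumption.
    Returns None when no positive correlation peak is found
    (for example, for a silent frame).
    """
    lag_min = int(sample_rate / f0_max)  # shortest pitch period considered
    lag_max = int(sample_rate / f0_min)  # longest pitch period considered
    best_lag, best_score = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        score = sum(
            samples[i] * samples[i + lag] for i in range(len(samples) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        return None
    # Fundamental frequency is the reciprocal of the pitch period.
    return sample_rate / best_lag


def classify_by_pitch(f0_hz, threshold_hz=167.0):
    """Label a speaker from average F0 using a single threshold.

    threshold_hz is a hypothetical value: the midpoint between
    146 Hz and 188 Hz. Voices whose pitch falls in the overlap
    region will be misclassified by any such threshold.
    """
    return "male" if f0_hz < threshold_hz else "female"


if __name__ == "__main__":
    rate = 8000
    # Synthetic 120 Hz tone standing in for a voiced frame (50 ms).
    frame = [math.sin(2 * math.pi * 120 * n / rate) for n in range(400)]
    f0 = estimate_f0(frame, rate)
    print(round(f0, 1), classify_by_pitch(f0))
```

Because the estimate depends on a clear periodic peak, noise or unvoiced speech degrades it quickly, and a fixed threshold cannot separate speakers whose pitch lies between the two ranges, which reflects the limitations noted above.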
Furthermore, MFCC-based gender identification typically requires high computational complexity and is sensitive to recording conditions such as background noise and microphone characteristics. If the speech samples used for training and testing are recorded in different environments or with different microphones, the MFCC feature may fail to work.
The pitch feature and the MFCC feature have been combined to improve the performance of gender identification systems. However, due to the intrinsic drawbacks of the two features, existing systems continue to encounter problems in gender identification performance, computational complexity, and sensitivity to recording conditions.