1. Field of the Invention
The present invention relates to speech recognition, and more particularly relates to an apparatus and method for speaker normalization based on biometrics.
2. Brief Description of the Prior Art
It has been well established in the field of automatic speech recognition that normalizing waveforms to account for the vocal tract differences among speakers yields more accurate results than can be obtained in systems which do not include such normalization. If an open vocal tract model is assumed, such as would be appropriate for an open vowel (for example, /UH/), a uniform tube model provides a good approximation to the vocal tract, as discussed by Lawrence W. Rabiner and Ronald W. Schafer in the text Digital Processing of Speech Signals, published by Prentice-Hall, Inc., Englewood Cliffs, N.J. 07632 in 1978.
In the uniform tube model, scaling the length of the tube by a factor 1/k scales all of the resonances of the tube by k; therefore, a linear scaling of the frequency axis is appropriate. In practice, linear scaling has been shown to be effective in normalizing for differences in vocal tract length. In implementation, once a form of frequency scaling (for example, linear scaling, f* = kf) has been chosen, the remaining question is how to determine a scale factor k_i for each speaker i.
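The scaling property of the uniform tube model can be illustrated with a minimal sketch. It assumes the standard idealization of a lossless uniform tube closed at one end (the glottis) and open at the other (the lips), whose resonances fall at odd multiples of c/(4L). The function name, the tube length of 0.175 m, and the speed of sound of 350 m/s are illustrative assumptions, not values taken from the present description.

```python
def tube_resonances(length_m, n_resonances=3, speed_of_sound=350.0):
    """Resonances (Hz) of an idealized lossless uniform tube, closed at
    one end and open at the other: F_n = (2n - 1) * c / (4 * L).

    The function name and default values are illustrative assumptions.
    """
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * length_m)
        for n in range(1, n_resonances + 1)
    ]


# Scale the tube length by 1/k and compare the resonances.
k = 1.25
reference = tube_resonances(0.175)
shortened = tube_resonances(0.175 / k)

for f_ref, f_short in zip(reference, shortened):
    # Each resonance of the shortened tube is k times the reference.
    print(f"{f_ref:8.1f} Hz -> {f_short:8.1f} Hz (ratio {f_short / f_ref:.3f})")
```

Under these assumptions the reference tube resonates near 500, 1500, and 2500 Hz, and every resonance of the shortened tube is larger by the same factor k, which is why a single linear scale factor per speaker suffices in this model.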
It has been known in the prior art to derive an estimated scale factor based on formant positions, as set forth in the paper "A Parametric Approach to Vocal Tract Length Normalization" by Ellen Eide and Herbert Gish as published by the IEEE in the Proceedings of the ICASSP of 1996, at pages 346-48.
Other results have been published for general speech corpora based on exhaustive search, for example, refer to "Speaker Normalization Using Efficient Frequency Warping Procedures" by Li Lee and Richard C. Rose, as published at pages 353-56 of the aforementioned 1996 ICASSP Proceedings, and "Speaker Normalization on Conversational Telephone Speech" by Steven Wegmann et al., as published at pages 339-41 of the aforementioned proceedings.
One case of interest is the situation where a database is available which contains biometric information in the form of a biometric parameter (such as speaker height). Such a parameter would permit the normalization factor for each speaker to be computed by taking the ratio of the value of the speaker's biometric parameter to some measure of an average value of the biometric parameter, such as the average across all speakers in the training database.
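The ratio described above can be sketched as follows. The sketch assumes that the biometric parameter is speaker height in centimeters and that the average is the arithmetic mean over the training speakers; the function name, speaker identifiers, and height values are hypothetical and serve only as illustration.

```python
from statistics import mean


def scale_factors(biometric_by_speaker):
    """Return a scale factor k_i for each speaker i, computed as the
    speaker's biometric parameter divided by the mean over all speakers.

    The dictionary layout and the use of the arithmetic mean are
    assumptions made for this sketch.
    """
    average = mean(biometric_by_speaker.values())
    return {
        speaker: value / average
        for speaker, value in biometric_by_speaker.items()
    }


# Hypothetical training database: speaker id -> height in cm.
heights_cm = {"spk_a": 160.0, "spk_b": 170.0, "spk_c": 180.0}

for speaker, k in scale_factors(heights_cm).items():
    print(f"{speaker}: k = {k:.4f}")
```

With this convention, a speaker taller than average receives a factor greater than one, so that the linear scaling f* = kf raises that speaker's (typically lower) resonances toward those of the average speaker.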
In view of the foregoing, there is a need in the prior art for a speaker normalization apparatus and method which are based on biometrics pertaining to the speaker.
The present invention, which addresses the needs identified in the prior art, provides a method of speaker normalization. The method includes the steps of receiving a first biometric parameter, calculating a first frequency scaling factor based on the first biometric parameter, and extracting acoustic features from speech of a user in accordance with the first frequency scaling factor. The first biometric parameter is correlated to vocal tract length of a given user of a speech recognition system.
The present invention further provides an apparatus for speaker normalization, which includes a biometric parameter module, a calculation module, and an acoustic feature extractor. The biometric parameter module receives the first biometric parameter which is correlated to the vocal tract length of the user. The calculation module calculates the first frequency scaling factor based on the first biometric parameter. The acoustic feature extractor extracts acoustic features from speech of the user in accordance with the first frequency scaling factor.
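The three elements recited above (biometric parameter module, calculation module, and acoustic feature extractor) can be sketched in software as three small functions. This is a minimal illustration under stated assumptions, not the claimed implementation: the record layout, the function names, the naive discrete Fourier transform, and the equal-width band energies used as "acoustic features" are all hypothetical choices made for the sketch.

```python
import cmath
import math


def receive_biometric(record):
    """Biometric parameter module: return the parameter from a record.
    The "height_cm" key is a hypothetical record layout."""
    return record["height_cm"]


def calculate_scale(parameter, average):
    """Calculation module: ratio of the parameter to an average value."""
    return parameter / average


def extract_features(samples, sample_rate, k, n_bands=4, max_hz=4000.0):
    """Acoustic feature extractor: band energies computed after the
    linear frequency scaling f* = k * f is applied to each DFT bin."""
    n = len(samples)
    band_width = max_hz / n_bands
    energies = [0.0] * n_bands
    for b in range(n // 2 + 1):
        # Naive DFT coefficient for bin b (adequate for a short sketch).
        coeff = sum(
            s * cmath.exp(-2j * math.pi * b * t / n)
            for t, s in enumerate(samples)
        )
        warped_hz = k * (b * sample_rate / n)  # f* = k * f
        band = int(warped_hz // band_width)
        if band < n_bands:
            energies[band] += abs(coeff) ** 2
    return energies


# Usage: a 900 Hz tone, 80 samples at 8 kHz (hypothetical input).
tone = [math.sin(2 * math.pi * 900 * t / 8000) for t in range(80)]
k = calculate_scale(receive_biometric({"height_cm": 204.0}), average=170.0)
features = extract_features(tone, 8000, k)
print(k, features.index(max(features)))
```

In this usage the scale factor is 1.2, so the 900 Hz tone is treated as lying at 1080 Hz and its energy falls into the second band rather than the first. Warping the bin frequencies before the band energies are accumulated is one of several ways the scaling could be applied; it is chosen here only for brevity.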
The present invention can be implemented in hardware, software, or a combination of hardware and software, and accordingly also encompasses a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for speaker normalization as set forth herein.
Accordingly, it will be appreciated that the method and apparatus of the present invention provide an improvement over prior-art approaches, inasmuch as an appropriate scaling factor can be readily determined, so as to improve the accuracy of an associated speech recognition system, based on biometric data pertaining to users of the system, which may be, for example, pre-stored, or which may be ascertained during an interaction with the speaker.