1. Field of the Invention
The present invention relates generally to speech recognition and more specifically to performing speech recognition based on precomputed speaker normalization factors stored in codebooks.
2. Introduction
Currently, sensitivity to variable background environments, accents, dialects, speaker characteristics, channel environments, and recording conditions is a challenge for speech recognition systems. Such variables, in combination with noisy conditions, often cause the quality of speech recognition to deteriorate so far that the systems become unusable for certain applications. Speech recognition systems can be aided by normalizing speech, a process of estimating the vocal tract length of a speaker and adjusting the speech recognition based on that vocal tract length. State-of-the-art methods require a minimum of 10 to 20 seconds of speech to successfully normalize the speech. This minimum requirement makes such systems impractical in certain situations, for example, in voice-enabled dialog systems where only 2 to 4 seconds of speech may be available. Accordingly, what is needed in the art is a faster, more robust method for calculating vocal tract length in order to normalize a speech sample.