The present invention relates generally to speech recognition systems and, in particular, to vector representation of speech parameters in signal processing for speech recognition.
As a user speaks to a speech recognition system, the user's speech waveform is captured and analyzed. During what is commonly referred to as "front-end" processing, acoustic features of the speech signal are extracted using a variety of signal processing techniques. These features represent the speech in a more compact format. Such features include (but are not limited to) filterbank channel outputs, linear predictive coding (LPC) coefficients, real cepstrum coefficients, and a variety of pitch and energy measures. These features can be transmitted or passed to a pattern recognition or matching system, commonly called the "back-end," which compares the incoming acoustic features to speech templates and attempts to hypothesize which acoustic events (words, phones, etc.) have been spoken.
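To make two of the listed front-end features concrete, the following is a minimal sketch of how a log frame energy and the real cepstrum might be computed for a single frame of samples. The function names (frame_energy, dft, real_cepstrum) and the 1e-10 floor are hypothetical choices for illustration. A naive DFT is used only to keep the example self-contained. A practical front end would typically also window and pre-emphasize the frame and would use an FFT.

```python
import cmath
import math


def frame_energy(frame):
    """Log energy of one frame of samples; the small floor avoids log(0) on silence."""
    return math.log(sum(s * s for s in frame) + 1e-10)


def dft(x):
    """Naive O(N^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(frame, num_coeffs):
    """First num_coeffs real cepstrum coefficients: inverse DFT of log |DFT(frame)|."""
    log_mag = [math.log(abs(X) + 1e-10) for X in dft(frame)]
    n = len(log_mag)
    # The log magnitude of a real signal's spectrum is real and even, so the
    # inverse DFT is real up to rounding; keep only the real part.
    ceps = [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]
    return ceps[:num_coeffs]
```

For a scaled impulse such as `[2.0] + [0.0] * 7`, the magnitude spectrum is flat at 2, so the zeroth cepstral coefficient is log(2) and the remaining coefficients are essentially zero.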
To save memory or communication channel bandwidth, the acoustic features may also undergo a quantization step in the "front-end." As will be understood by those skilled in the art, each feature vector represents a time slice (frame) of the speech waveform. During vector quantization, a single table or multiple tables of representative feature vectors are searched for the closest match to the current feature vector. When the closest match is found according to a defined distortion measure, the index of that match in the table is used to represent the feature. Certain designs that employ a combination of speech features perform this lookup individually for each speech feature. Other designs combine the parameters for all the features into one large vector and perform the lookup only once.
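The table lookup described above can be sketched as a full-search nearest-neighbor quantizer. This sketch assumes squared Euclidean distance as the distortion measure. The helper names and the tiny two-dimensional codebook are hypothetical and serve only to show that the index, not the vector itself, is what gets stored or transmitted.

```python
def squared_error(a, b):
    """Squared Euclidean distortion between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_encode(vector, codebook):
    """Return the index of the codeword with the smallest distortion (exhaustive search)."""
    return min(range(len(codebook)), key=lambda i: squared_error(vector, codebook[i]))


def vq_decode(index, codebook):
    """Recover the representative feature vector from its index."""
    return codebook[index]


# Hypothetical codebook of representative feature vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
idx = vq_encode([0.9, 1.2], codebook)   # idx == 1; only this index is stored or sent
approx = vq_decode(idx, codebook)       # the back-end sees [1.0, 1.0]
```

An exhaustive search is the simplest correct choice. Its cost grows linearly with the codebook size, which is one reason table size matters in the designs discussed below.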
Prior-art methods have been proposed for quantizing front-end parameters in speech recognition. As mentioned above, a set of features, such as the cepstrum or the LPC coefficients, is typically quantized as a set in a single vector. If multiple types of features are present, each type of feature is vector quantized as a separate set. When a scalar parameter such as frame energy is used, its value is quantized with a scalar quantizer, and multiple scalar values are quantized with multiple scalar quantizers.
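The separate-quantizer arrangement just described can be sketched as one vector lookup per feature type plus one scalar lookup for frame energy. The function names, the example codebooks, and the table sizes passed to index_bits are hypothetical. The bit count assumes each index is sent with a fixed-length binary code.

```python
import math


def nearest_index(value, table, dist):
    """Full-search lookup shared by the vector and scalar quantizers."""
    return min(range(len(table)), key=lambda i: dist(value, table[i]))


def vec_dist(a, b):
    """Squared Euclidean distortion for vector-valued features."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scalar_dist(a, b):
    """Squared error for a scalar feature such as frame energy."""
    return (a - b) ** 2


def encode_frame(vector_features, codebooks, energy, energy_levels):
    """One VQ lookup per feature type, then one scalar lookup for frame energy."""
    indices = [nearest_index(f, cb, vec_dist)
               for f, cb in zip(vector_features, codebooks)]
    indices.append(nearest_index(energy, energy_levels, scalar_dist))
    return indices


def index_bits(*table_sizes):
    """Bits per frame when each table's index uses a fixed-length binary code."""
    return sum(math.ceil(math.log2(n)) for n in table_sizes)


# Hypothetical tables: a cepstral codebook, an LPC codebook, and energy levels.
cep_cb = [[0.0, 0.0], [1.0, 1.0]]
lpc_cb = [[0.5, -0.5], [-0.5, 0.5], [0.0, 0.0]]
energy_levels = [0.0, 10.0, 20.0, 30.0]
```

With this arrangement every feature type has its own table to store and search, and its own index to transmit. For example, tables of 256, 64 and 16 entries cost 8 + 6 + 4 = 18 bits per frame.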
Such previous techniques have shortcomings. For example, where coefficients are correlated, previous implementations waste the memory needed to store the quantization tables, the computation needed to perform the table lookups, and the memory or bandwidth needed to store or transmit the codebook indices. As another example, in previous designs a single element of a vector could dominate the distortion measure used during quantization because of differences in magnitude or statistical variance among the elements.
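The second shortcoming can be illustrated numerically. The sketch below assumes an unweighted squared-error distortion and a hypothetical combined vector of [frame energy, c1, c2], with energy on a raw scale where it varies by whole units while the cepstral terms vary by tenths. All values are made up for illustration. The sketch only demonstrates the problem and does not propose a remedy.

```python
def per_dim_error(a, b):
    """Squared error contributed by each element of the vector."""
    return [(x - y) ** 2 for x, y in zip(a, b)]


def nearest(x, codebook):
    """Codeword index with the smallest unweighted squared error."""
    return min(range(len(codebook)), key=lambda i: sum(per_dim_error(x, codebook[i])))


# Hypothetical combined vector: [frame energy, c1, c2].
x = [1000.0, 0.5, -0.3]
codebook = [
    [1001.0, -0.5, 0.3],   # energy very close, cepstral shape wrong (signs flipped)
    [1003.0, 0.5, -0.3],   # energy only 0.3% off, cepstral shape exact
]

for i, cw in enumerate(codebook):
    errs = per_dim_error(x, cw)
    print(i, [round(e, 2) for e in errs], round(sum(errs), 2))
print("chosen:", nearest(x, codebook))
```

Codeword 0 is selected even though its cepstral part is wrong, because a 3-unit energy difference (error 9.0) outweighs the entire cepstral mismatch (error about 1.36). The large-magnitude element effectively decides the match on its own.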