Many speech coders, such as the new 2.4 kb/s Federal Standard Mixed Excitation Linear Prediction (MELP) coder (McCree, et al., “A 2.4 kbits/s MELP Coder Candidate for the New U.S. Federal Standard,” Proc. ICASSP-96, pp. 200-203, May 1996), use some form of Linear Predictive Coding (LPC) to represent the spectrum of the speech signal. A MELP coder is described in Applicant's co-pending application Ser. No. 08/650,585, entitled “Mixed Excitation Linear Prediction with Fractional Pitch,” filed May 20, 1996, incorporated herein by reference. FIG. 1 illustrates such a MELP coder. The MELP coder is based on the traditional LPC vocoder, in which either a periodic impulse train or white noise excites a 10th-order all-pole LPC filter. In the enhanced version, the synthesizer has the added capabilities of mixed pulse and noise excitation, periodic or aperiodic pulses, adaptive spectral enhancement, and a pulse dispersion filter, as shown in FIG. 1. Efficient quantization of the LPC coefficients is an important problem in these coders: maintaining the accuracy of the LPC coefficients has a significant effect on processed speech quality, but the bit rate of the LPC quantizer must be low in order to keep the overall bit rate of the speech coder small. The MELP coder for the new Federal Standard uses a 25-bit multi-stage vector quantizer (MSVQ) for line spectral frequencies (LSF). There is a one-to-one transformation between the LPC coefficients and the LSF coefficients.
Quantization is the process of converting input values into discrete values in accordance with some fidelity criterion. A typical example of quantization is the conversion of a continuous amplitude signal into discrete amplitude values. The signal is first sampled, then quantized.
For quantization, a range of expected values of the input signal is divided into a series of subranges. Each subrange has an associated quantization level. For example, for quantization to 8-bit values, there would be 256 levels. A sample value of the input signal that is within a certain subrange is converted to the associated quantizing level. For example, for 8-bit quantization, a sample of the input signal would be converted to one of 256 levels, each level represented by an 8-bit value.
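The subrange-to-level mapping described above can be illustrated with a minimal sketch of a uniform 8-bit scalar quantizer. The function name quantize_uniform, the input range of -1.0 to 1.0, and the choice of the subrange midpoint as the quantization level are assumptions made for illustration only.

```python
def quantize_uniform(sample, lo=-1.0, hi=1.0, bits=8):
    """Map a sample to a level index and its reconstruction value.

    Assumes a uniform quantizer over [lo, hi]; the range and the
    midpoint reconstruction rule are illustrative choices.
    """
    levels = 1 << bits                 # 256 levels for 8 bits
    step = (hi - lo) / levels          # width of each subrange
    clipped = min(max(sample, lo), hi) # keep the sample inside the range
    # Index of the subrange containing the sample; the top edge of the
    # range is folded into the last subrange.
    index = min(int((clipped - lo) / step), levels - 1)
    value = lo + (index + 0.5) * step  # midpoint of that subrange
    return index, value


if __name__ == "__main__":
    for s in (-1.0, 0.0, 0.3, 1.0):
        print(s, quantize_uniform(s))
```

Each returned index fits in 8 bits, and for in-range samples the reconstruction value differs from the sample by at most half a subrange width.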
Vector quantization is a method of quantization that is based on the linear and non-linear correlation between samples and the shape of the probability distribution. Essentially, vector quantization is a lookup process, where the lookup table is referred to as a “codebook”. The codebook lists each quantization level, and each level has an associated “code-vector”. The vector quantization process compares an input vector to the code-vectors and determines the best code-vector in terms of minimum distortion. Where x is the input vector and y(j) is the best code-vector, the comparison of distortion values may be expressed as: d(x, y(j)) ≦ d(x, y(k)) for all k not equal to j. The codebook is represented by y(j), where y(j) is the jth code-vector, 0 ≦ j ≦ L-1, and L is the number of levels in the codebook.
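The codebook lookup can be sketched as an exhaustive nearest-neighbor search. The sketch below assumes squared Euclidean error as the distortion measure d; the function names and the small example codebook are hypothetical and are not taken from any coder described here.

```python
def squared_error(x, y):
    """Squared Euclidean distortion d(x, y); an assumed distortion measure."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def vq_search(x, codebook):
    """Return (j, y(j)) for the code-vector with minimum distortion."""
    best = min(range(len(codebook)),
               key=lambda j: squared_error(x, codebook[j]))
    return best, codebook[best]


if __name__ == "__main__":
    # Hypothetical 4-level codebook of three-dimensional code-vectors.
    codebook = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0),
                (2.0, 3.0, 4.0), (5.0, 5.0, 5.0)]
    print(vq_search((2.2, 2.9, 4.1), codebook))
```

Only the index j needs to be transmitted; a decoder holding the same codebook recovers y(j) by table lookup.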
Multi-stage vector quantization (MSVQ) is a type of vector quantization. This process obtains a central quantized vector (the output vector) by adding a number of quantized vectors. The output vector is sometimes referred to as a “reconstructed” vector. Each vector used in the reconstruction is from a different codebook, each codebook corresponding to a “stage” of the quantization process. Each codebook is designed especially for a stage of the search. An input vector is quantized with the first codebook, and the resulting error vector is quantized with the second codebook, etc. The set of vectors used in the reconstruction may be expressed as:
 y(j_0, j_1, ..., j_{S-1}) = y_0(j_0) + y_1(j_1) + ... + y_{S-1}(j_{S-1}),
where S is the number of stages and y_s is the codebook for the sth stage. For example, for a three-dimensional input vector, such as x=(2,3,4), the reconstruction vectors for a two-stage search might be y_0=(1,2,3) and y_1=(1,1,1), whose sum reproduces x exactly (a perfect quantization, which is not always the case).
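The stage-by-stage procedure, in which each codebook quantizes the error left by the previous stage, can be sketched as follows. This is a minimal sequential search that keeps a single candidate per stage, assuming squared Euclidean error as the distortion measure; the function names and the two small codebooks are hypothetical, chosen so that the example reproduces the x=(2,3,4) case given above.

```python
def squared_error(x, y):
    """Squared Euclidean distortion; an assumed distortion measure."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def msvq_quantize(x, codebooks):
    """Quantize x stage by stage; return (indices, reconstructed vector)."""
    residual = list(x)                  # error still to be quantized
    reconstruction = [0.0] * len(x)     # running sum of chosen code-vectors
    indices = []
    for codebook in codebooks:
        # Pick the code-vector closest to the current error vector.
        j = min(range(len(codebook)),
                key=lambda k: squared_error(residual, codebook[k]))
        indices.append(j)
        reconstruction = [r + c for r, c in zip(reconstruction, codebook[j])]
        residual = [r - c for r, c in zip(residual, codebook[j])]
    return indices, reconstruction


if __name__ == "__main__":
    # Hypothetical two-stage codebooks.
    stage0 = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (4.0, 4.0, 4.0)]
    stage1 = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (-1.0, -1.0, -1.0)]
    print(msvq_quantize((2.0, 3.0, 4.0), [stage0, stage1]))
```

With these codebooks the first stage selects (1,2,3), the error vector (1,1,1) is matched exactly by the second stage, and the reconstruction equals the input.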
During multi-stage vector quantization, the codebooks may be searched using a sub-optimal tree search algorithm, also known as an M-algorithm. At each stage, the M best code-vectors are retained and passed to the next stage. The “best” code-vectors are selected in terms of minimum distortion. The search continues until the final stage, when only one best code-vector is determined.
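A minimal sketch of such an M-best search is given below. It assumes squared Euclidean error as the distortion measure and ranks candidates by the distortion between the input and the partial reconstruction; the function names and the one-dimensional example codebooks are hypothetical.

```python
def squared_error(x, y):
    """Squared Euclidean distortion; an assumed distortion measure."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def msvq_m_best(x, codebooks, m=2):
    """Tree search keeping the m lowest-distortion paths per stage.

    Returns (distortion, indices, reconstruction) for the single path
    that survives the final stage.
    """
    zero = [0.0] * len(x)
    survivors = [(squared_error(x, zero), [], zero)]
    last = len(codebooks) - 1
    for stage, codebook in enumerate(codebooks):
        candidates = []
        for _, indices, partial in survivors:
            # Extend every surviving path with every code-vector.
            for j, code in enumerate(codebook):
                recon = [p + c for p, c in zip(partial, code)]
                candidates.append(
                    (squared_error(x, recon), indices + [j], recon))
        candidates.sort(key=lambda c: c[0])
        # Keep m paths, except at the final stage where one remains.
        survivors = candidates[:1 if stage == last else m]
    return survivors[0]


if __name__ == "__main__":
    stage0 = [(0.0,), (1.6,)]
    stage1 = [(1.0,), (0.0,)]
    print(msvq_m_best((1.0,), [stage0, stage1], m=1))
    print(msvq_m_best((1.0,), [stage0, stage1], m=2))
```

In this example, keeping one path (m=1) commits to 1.6 at the first stage and cannot recover, whereas keeping two paths (m=2) finds the combination 0.0 + 1.0 with zero distortion. Larger M trades search complexity for a result closer to the exhaustive search.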
In predictive quantization, the target vector for quantization in the current frame is the mean-removed input vector minus a predicted value. The predicted value is the previous quantized vector multiplied by a known prediction matrix. In switched prediction, there is more than one possible prediction matrix, and the best prediction matrix is selected for each frame. See S. Wang, et al., “Product Code Vector Quantization of LPC Parameters,” in Speech and Audio Coding for Wireless and Network Applications, Ch. 31, pp. 251-258, Kluwer Academic Publishers, 1993.
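The target computation can be sketched as follows. The function names are hypothetical, and two assumptions are made for illustration: the previous quantized vector is taken to be mean-removed, and the "best" prediction matrix is taken to be the one leaving the smallest target energy. The text above does not specify the selection criterion, so the energy rule is only a stand-in.

```python
def mat_vec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]


def prediction_target(x, mean, prev_quantized, pred_matrix):
    """Target = (x - mean) - pred_matrix * prev_quantized."""
    predicted = mat_vec(pred_matrix, prev_quantized)
    return [xi - mi - pi for xi, mi, pi in zip(x, mean, predicted)]


def switched_prediction_target(x, mean, prev_quantized, pred_matrices):
    """Return (matrix index, target) for the selected prediction matrix.

    Assumed criterion: minimum target energy (illustrative only).
    """
    targets = [prediction_target(x, mean, prev_quantized, a)
               for a in pred_matrices]
    energies = [sum(t * t for t in target) for target in targets]
    best = min(range(len(targets)), key=lambda i: energies[i])
    return best, targets[best]


if __name__ == "__main__":
    weak = [[0.5, 0.0], [0.0, 0.5]]    # hypothetical prediction matrices
    strong = [[1.0, 0.0], [0.0, 1.0]]
    print(switched_prediction_target(
        [3.0, 5.0], [1.0, 2.0], [2.0, 2.0], [weak, strong]))
```

The index of the selected matrix would be sent along with the quantizer indices so that a decoder could form the same prediction.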
It is highly desirable to provide an improved distance measure that better correlates with subjective speech quality.