Digital speech processing is used extensively in communication systems, telephony, digital answering machines, low-rate videoconferencing, and similar applications. Low-rate speech coding is typically based on parametric modeling of the speech signal. The speech encoder computes representative parameters of the speech signal, quantizes them, and places them into the data stream, which may be sent over a digital communication link or saved on a digital storage medium. A decoder uses those speech parameters to produce the synthesized speech.
Almost all known speech compression algorithms for bit rates less than or equal to 8000 bits/s are based on linear prediction. Typically, linear prediction coefficients (LPC) are transmitted as line spectral frequencies (LSF), sometimes called “line spectral parameters (LSP)” or “line spectral pairs (LSP)”. Depending on the bit rate provided by the speech compression algorithm, the LSF are updated once every 10 to 30 ms. Usually a 10th-order linear prediction filter is used, which means that the LSF are represented by a 10-dimensional vector.
FIG. 1 is a block diagram of a typical LSF encoder based on vector quantization. The current frame of a digitized speech signal enters the LSF calculator unit 110, where the current LSF vector is computed. Previously quantized LSF vectors are kept in the buffer memory 150. Typically only the most recent quantized vector is stored in the buffer memory 150. The LSF predictor unit 160 computes a predetermined number of predicted LSF vectors. Some of these predicted vectors are typically independent of the previous LSF vectors.
The current LSF vector and the set of predicted LSF vectors then enter the vector quantizer unit 120. The vector quantizer unit 120 determines the codebook index (or set of indices) and the predictor number that together give the best approximation of the current LSF vector in the sense of some distortion measure. All indices computed by the vector quantizer enter the index encoder unit 130, where they are transformed into the codeword corresponding to the current LSF vector.
This codeword is sent, along with other speech parameters, over a data link or stored in a digital memory. The codebook indices and the predictor index also enter the LSF reconstruction unit 140. The other input of the reconstruction unit is the set of predicted LSF vectors. In the LSF reconstruction unit 140 the quantized LSF vector is reconstructed. This vector is then saved in the buffer memory 150 to be used for predicting subsequent LSF vectors.
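The data flow of FIG. 1 can be sketched for a single frame as follows. This is only an illustrative sketch under stated assumptions, not the encoder of any particular codec: the two-predictor set in `predict`, the blending factor `alpha`, the choice to quantize the prediction residual, the squared-error distortion measure, and the index packing in `pack_codeword` are all hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, used here as the distortion measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(prev_quantized: Vector, mean: Vector, alpha: float) -> List[Vector]:
    """Role of predictor unit 160 (assumed predictor set, for illustration).

    Predictor 0 is independent of previous vectors (it is just the mean);
    predictor 1 blends the previous quantized vector with the mean.
    """
    blended = [alpha * p + (1.0 - alpha) * m for p, m in zip(prev_quantized, mean)]
    return [list(mean), blended]


def encode_frame(
    lsf: Vector, predictions: List[Vector], codebook: List[Vector]
) -> Tuple[int, int, Vector]:
    """Roles of units 120 and 140: pick predictor and index, then reconstruct."""
    best = None
    for p, pred in enumerate(predictions):
        residual = [x - y for x, y in zip(lsf, pred)]
        for i, code in enumerate(codebook):
            d = sq_dist(residual, code)
            if best is None or d < best[0]:
                best = (d, p, i)
    _, p, i = best
    quantized = [a + b for a, b in zip(predictions[p], codebook[i])]
    return p, i, quantized


def pack_codeword(predictor: int, index: int, codebook_size: int) -> int:
    """Role of unit 130: map the two indices to one integer (assumed packing)."""
    return predictor * codebook_size + index


# Toy 2-dimensional example; a real LSF vector would typically have 10 entries.
mean = [1.0, 2.0]
prev_quantized = [1.5, 2.5]          # contents of buffer memory 150
codebook = [[0.0, 0.0], [0.5, 0.5], [-0.5, -0.5]]

predictions = predict(prev_quantized, mean, alpha=0.5)
p, i, prev_quantized = encode_frame([1.75, 2.75], predictions, codebook)
codeword = pack_codeword(p, i, len(codebook))
```

Overwriting `prev_quantized` with the reconstructed vector mirrors the buffer update. A decoder that runs the same `predict` step and holds the same codebook can rebuild the identical vector from the codeword alone.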
Early quantizers used a single non-structured codebook and compared the source vector to each entry in the codebook (referred to as “full search quantizers”). The performance of vector quantization depends on the size of the codebook used, and to obtain better results, larger codebooks have to be used. On the other hand, storage and processing complexities also increase with increasing codebook size. To overcome this problem, suboptimal vector quantization procedures have been proposed that use multiple structured codebooks. One of the most widely used procedures is multistage vector quantization (MSVQ). In MSVQ a sequence of vector quantizers (VQ) is used. The input of each subsequent VQ is the quantization error vector of the previous VQ.
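A minimal sketch of the sequential MSVQ search described above, in which each stage quantizes the error left by the previous one. The function names, the squared-error distortion measure, and the toy codebooks are assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vector, codebook: List[Vector]) -> int:
    """Full search of one codebook: index of the entry closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))


def msvq_encode(source: Vector, codebooks: List[List[Vector]]) -> List[int]:
    """Quantize stage by stage; each stage sees the previous stage's error."""
    residual = list(source)
    indices = []
    for codebook in codebooks:
        i = nearest(residual, codebook)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def msvq_decode(indices: List[int], codebooks: List[List[Vector]]) -> Vector:
    """Reconstruct as the component-wise sum of the selected stage codewords."""
    out = [0.0] * len(codebooks[0][0])
    for i, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


# Toy two-stage example with 2-dimensional vectors.
stage1 = [[0.0, 0.0], [4.0, 4.0]]
stage2 = [[0.0, 0.0], [1.0, -1.0], [-1.0, 1.0]]
indices = msvq_encode([5.0, 3.0], [stage1, stage2])
reconstructed = msvq_decode(indices, [stage1, stage2])
```

Each stage commits to its choice immediately, so the search cost is the sum of the stage codebook sizes, but the result is not guaranteed to be the best combination of stage codewords.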
An improvement on MSVQ is M-best, or delayed-decision, MSVQ, which is described in W. P. LeBlanc, B. Bhattacharya, S. A. Mahmoud and V. Cuperman, “Efficient search and design procedures for robust multi-stage VQ of LPC parameters for 4 kb/s speech coding,” IEEE Transactions on Speech and Audio Processing, Vol. 1, No. 4, Oct. 1993, pp. 373-385. M-best MSVQ achieves better quantization results by carrying a few candidates (M candidates) from stage to stage. The final decision for each stage is made only after the last quantization stage has been performed. Keeping more candidates can yield a higher quantization gain, but at the cost of greater computational complexity.
The unit having the greatest impact on the performance of the quantizer is the vector quantization unit. Typically, an LSF vector is split into subvectors (usually 1 to 3 subvectors). A vector quantization procedure is then applied to each subvector. To improve the quantization accuracy, it is necessary to increase the dimensions of the subvectors and the corresponding codebook sizes. However, this increases the computational load of full search quantization. To decrease computational complexity, a multistage M-best quantization procedure is used.
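The splitting step can be sketched as follows: the vector is cut into consecutive subvectors, and each subvector is searched against its own codebook. The subvector sizes, the codebooks, and the squared-error distortion measure below are hypothetical; a 10-dimensional LSF vector might, for example, be split as (3, 3, 4).

```python
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split_vq_encode(
    vector: Vector, sizes: Sequence[int], codebooks: List[List[Vector]]
) -> List[int]:
    """Full-search each subvector against its own codebook."""
    if sum(sizes) != len(vector) or len(sizes) != len(codebooks):
        raise ValueError("sizes must cover the vector, one codebook per subvector")
    indices, start = [], 0
    for size, codebook in zip(sizes, codebooks):
        sub = vector[start:start + size]
        indices.append(
            min(range(len(codebook)), key=lambda i: sq_dist(sub, codebook[i]))
        )
        start += size
    return indices


def split_vq_decode(indices: List[int], codebooks: List[List[Vector]]) -> Vector:
    """Concatenate the selected subvector codewords."""
    out: Vector = []
    for i, codebook in zip(indices, codebooks):
        out.extend(codebook[i])
    return out


# Toy example: a 4-dimensional vector split into two 2-dimensional subvectors.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [1.0, 1.0]],
]
indices = split_vq_encode([0.1, 0.2, 0.9, 1.1], (2, 2), codebooks)
reconstructed = split_vq_decode(indices, codebooks)
```

Larger subvectors let a codebook capture more of the dependence between components, which is why accuracy improves with subvector dimension, but each full search then runs over a larger codebook.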
The block diagram of a two-stage M-best quantizer is shown in FIG. 2. A source vector enters the first quantizer 210, which has a smaller structured codebook C1 of size L1. For each entry x of the set of L1 codewords, the residual, or error vector, is computed by subtracting x from the source vector. The output of this quantizer is the set of M1 codewords closest to the source vector in the sense of some distortion measure. The corresponding error vectors are processed by the second quantizer 220, which has a smaller structured codebook C2 of size L2. Adder 230 then forms the candidate code vectors as component-wise sums of the first quantizer outputs and the corresponding approximated errors. The final decision is made by the select-best-codeword unit 240, which selects from among the candidates the one closest to the source vector.
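The two-stage search of FIG. 2 can be sketched as below. This is a minimal sketch assuming a squared-error distortion measure; the function name `m_best_two_stage` and the toy one-dimensional codebooks are hypothetical, chosen so that keeping two candidates visibly beats keeping one.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def m_best_two_stage(
    source: Vector, c1: List[Vector], c2: List[Vector], m: int
) -> Tuple[int, int, Vector]:
    """Return (stage-1 index, stage-2 index, reconstructed vector)."""
    # First quantizer (210): keep the m codewords of C1 closest to the source.
    survivors = sorted(range(len(c1)), key=lambda i: sq_dist(source, c1[i]))[:m]

    best = None
    for i in survivors:
        # Error vector left by this first-stage candidate.
        err = [s - c for s, c in zip(source, c1[i])]
        # Second quantizer (220): closest entry of C2 to the error vector.
        j = min(range(len(c2)), key=lambda k: sq_dist(err, c2[k]))
        # Adder (230): candidate code vector as a component-wise sum.
        cand = [a + b for a, b in zip(c1[i], c2[j])]
        # Select-best unit (240): keep the candidate closest to the source.
        d = sq_dist(source, cand)
        if best is None or d < best[0]:
            best = (d, i, j, cand)
    return best[1], best[2], best[3]


c1 = [[0.0], [10.0]]
c2 = [[-1.0], [1.0], [-6.0]]

greedy = m_best_two_stage([4.0], c1, c2, m=1)   # one survivor: plain MSVQ
delayed = m_best_two_stage([4.0], c1, c2, m=2)  # two survivors
```

With `m=1` the search commits to the first-stage codeword 0.0 and ends at 1.0. With `m=2` the second-best first-stage codeword 10.0 survives and combines with -6.0 to match the source exactly, which illustrates why deferring the first-stage decision can help.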
The common property of these suboptimal vector quantizers is that they reduce computational complexity by replacing an optimal, large non-structured codebook with a direct sum of small structured codebooks.
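The size of the saving can be illustrated with a rough count of distortion evaluations and stored code vectors. The sketch below is an estimate under simplifying assumptions: the codebook sizes and the value of M are hypothetical, M is assumed not to exceed the first-stage codebook size, and only distance computations are counted (sorting and bookkeeping are ignored).

```python
from typing import Sequence


def direct_sum_size(stage_sizes: Sequence[int]) -> int:
    """Number of distinct sums, i.e. the size of the equivalent flat codebook."""
    total = 1
    for size in stage_sizes:
        total *= size
    return total


def m_best_search_cost(stage_sizes: Sequence[int], m: int) -> int:
    """Distance evaluations for an M-best search (assumes m <= first stage size).

    The first stage is searched once; every later stage is searched once
    for each of the m surviving candidates.
    """
    first, *rest = stage_sizes
    return first + sum(m * size for size in rest)


sizes = [64, 64]                          # two 6-bit stages (hypothetical)
flat_entries = direct_sum_size(sizes)     # entries a full search would visit
stored_vectors = sum(sizes)               # vectors actually kept in memory
cost_m4 = m_best_search_cost(sizes, m=4)
cost_m1 = m_best_search_cost(sizes, m=1)  # plain sequential MSVQ
```

For these assumed sizes, the structured quantizer stores 128 vectors instead of 4096 and evaluates 320 distances with M = 4 (128 with M = 1) instead of 4096, while still addressing 4096 distinct code vectors.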