Speech and audio coding algorithms have a wide variety of applications in communication, multimedia and storage systems. The development of such coding algorithms is driven by the need to save transmission and storage capacity while maintaining high quality of the coded signal. The complexity of the encoder is limited by the processing power of the application platform. In some applications, e.g. voice storage, the encoder may be highly complex, while the decoder should be as simple as possible.
In a typical speech encoder, an input speech signal is processed in segments, which are called frames. Usually the frame length is 10 to 30 ms, and a look-ahead segment of 5 to 15 ms of the subsequent frame is also available. The frame may further be divided into a number of sub-frames. For every frame, the encoder determines a parametric representation of the input signal, for instance by Linear Predictive Coding (LPC). The obtained parameters are quantized and transmitted through a communication channel, or stored in a storage medium in digital form. At the receiving end, the decoder constructs a synthesized signal based on the received parameters.
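The frame-wise processing described above can be illustrated with a minimal sketch. The sampling rate (8 kHz), frame length (20 ms) and look-ahead (5 ms) are assumptions chosen from within the ranges mentioned in the text, and the function name `split_into_frames` is hypothetical.

```python
def split_into_frames(samples, frame_len, lookahead_len):
    """Split a signal into non-overlapping frames, each paired with a
    look-ahead segment taken from the start of the subsequent frame."""
    frames = []
    # Only complete frames are produced; trailing samples are dropped.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        end = start + frame_len
        frame = samples[start:end]
        # Near the end of the signal the look-ahead may be shorter or empty.
        lookahead = samples[end:end + lookahead_len]
        frames.append((frame, lookahead))
    return frames


# Assumed example values (not taken from the text above).
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
LOOKAHEAD_MS = 5

frame_len = SAMPLE_RATE_HZ * FRAME_MS // 1000          # samples per frame
lookahead_len = SAMPLE_RATE_HZ * LOOKAHEAD_MS // 1000  # samples of look-ahead

signal = list(range(400))  # stand-in for 50 ms of speech samples
frames = split_into_frames(signal, frame_len, lookahead_len)
```

In a real encoder, each (frame, look-ahead) pair would then be passed to the parameter estimation stage, for instance the LPC analysis.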
Therein, the LPC coefficients (or the corresponding Line Spectrum Frequencies (LSFs)) obtained by LPC are nowadays usually quantized with Vector Quantization (VQ) by stacking the LPC/LSF coefficients into a vector. Similarly, the parameters related to the excitation signal (e.g. gain, pitch or voicing parameters) of several subsequent frames or sub-frames may be quantized by VQ.
VQ is a lossy data compression method based on the principle of block coding. In N-level VQ, vectors are quantized by selecting, from a codebook containing N reproduction vectors (the so-called codewords), the reproduction vectors that cause the smallest distortion with respect to the vectors (said distortion being determined by an appropriate distortion measure, as for instance the Euclidean distance or the squared Euclidean distance, to name but a few). These selected reproduction vectors can be uniquely identified by respective identifiers. If the quantized vectors are to be transmitted via a transmission channel, and if the codebook is known at a receiving site, it may then be sufficient to exchange only the identifiers between the quantizer at the transmitting site and a unit at the receiving site that is to retrieve the reproduction vector selected for a vector at the transmitting site. This unit then simply retrieves the reproduction vector identified by the identifier from the codebook. Frequently, N is chosen to be a power of 2, and then binary words of word length n = log2(N) can be used as identifiers for the reproduction vectors. The word length n then is proportional to the output bit rate of the quantizer. With increasing word length n, the number of levels N = 2^n and thus the resolution of the quantizer increases, but also the output bit rate of the quantizer increases.
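The selection and retrieval steps just described can be sketched as follows. This is a minimal illustration that assumes the squared Euclidean distance as distortion measure and a toy 4-level, 2-dimensional codebook; the function names and the codebook values are hypothetical.

```python
def squared_euclidean(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def vq_encode(vector, codebook):
    """Return the identifier (index) of the codeword with smallest distortion."""
    return min(range(len(codebook)),
               key=lambda i: squared_euclidean(vector, codebook[i]))


def vq_decode(index, codebook):
    """Retrieve the reproduction vector identified by the index."""
    return codebook[index]


def word_length(num_levels):
    """Word length n = log2(N) for a number of levels N that is a power of 2."""
    n = num_levels.bit_length() - 1
    if num_levels < 1 or (1 << n) != num_levels:
        raise ValueError("number of levels must be a power of 2")
    return n


# Toy codebook with N = 4 levels (assumed values, for illustration only).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

index = vq_encode((0.9, 0.2), codebook)        # transmitting site
n = word_length(len(codebook))                 # bits per identifier
identifier = format(index, "0{}b".format(n))   # binary word sent over the channel
reproduction = vq_decode(int(identifier, 2), codebook)  # receiving site
```

Only the n-bit identifier crosses the channel; both sites are assumed to hold the same codebook.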
The quantization of the parameters requires codebooks, which contain reproduction vectors optimized for the quantization task. In earlier days, the design of codebooks for VQ was considered to be a challenging task due to the need for multi-dimensional integration. In 1980, Linde, Buzo, and Gray (LBG) proposed the so-called LBG algorithm for generating codebooks based on a training sequence of vectors (see Linde, Y., Buzo, A. and Gray, R. M., “An Algorithm for Vector Quantizer Design”, IEEE Transactions on Communications, vol. 28, No. 1, Jan. 1980). The use of a training sequence of vectors bypasses the need for multi-dimensional integration. The LBG algorithm can be considered as a multi-dimensional generalization of the classic Lloyd algorithm that is suited for the construction of codebooks for scalar quantization.
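The core idea of training a codebook from a training sequence can be sketched as a generalized Lloyd iteration: partition the training vectors by nearest codeword, then replace each codeword by the centroid of its cell. This is a simplified sketch and not the full LBG algorithm as published; in particular it assumes that an initial codebook is supplied by the caller, uses the squared Euclidean distance, and stops on an absolute distortion threshold. All names and data values are hypothetical.

```python
def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def lloyd_iteration(training, codebook):
    """One partition step and one centroid step.

    Returns the updated codebook and the average distortion of the
    training vectors measured against the codebook passed in."""
    cells = [[] for _ in codebook]
    total = 0.0
    for v in training:
        i = min(range(len(codebook)),
                key=lambda k: squared_euclidean(v, codebook[k]))
        cells[i].append(v)
        total += squared_euclidean(v, codebook[i])
    updated = []
    for old, cell in zip(codebook, cells):
        if cell:
            # Centroid of the cell: component-wise mean of its vectors.
            updated.append(tuple(sum(v[d] for v in cell) / len(cell)
                                 for d in range(len(old))))
        else:
            # Empty cell: keep the old codeword (a simplification).
            updated.append(old)
    return updated, total / len(training)


def train_codebook(training, initial, tol=1e-6, max_iter=100):
    """Iterate until the average distortion stops decreasing by more than tol."""
    codebook = list(initial)
    previous = float("inf")
    for _ in range(max_iter):
        codebook, distortion = lloyd_iteration(training, codebook)
        if previous - distortion <= tol:
            break
        previous = distortion
    return codebook


# Tiny assumed training sequence with two obvious clusters.
training = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
trained = train_codebook(training, initial=[(0.0, 0.0), (10.0, 10.0)])
```

Because only sums over the training vectors are needed, no multi-dimensional integration over a probability density appears anywhere in the procedure.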
The LBG algorithm produces a codebook for a desired number of levels N. If in the same codec, several numbers of levels N have to be supported, then for each number of levels N, a respective codebook has to be trained, and stored at both the quantizer and a unit that is used to retrieve the reproduction vectors from identifiers of the reproduction vectors. Such a need for several numbers of levels N may for instance arise in coding scenarios where transmission to terminals with different storage and processing capabilities is involved, or where the transmission channel characteristics are time-variant, or where the total available bit rate is dynamically allocated between source and channel coding, to name but a few. The storage of respective codebooks for several different numbers of levels N disadvantageously increases the memory requirements of both the quantizer and a unit for retrieving reproduction vectors of vectors that have been quantized, thus increasing size and costs. Furthermore, the structure of the quantizer and the reproduction vector retrieving unit becomes more complex, because access to several codebooks has to be controlled.
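The memory cost of keeping one codebook per number of levels can be made concrete with a small calculation. The vector dimension (10), the storage size per component (2 bytes) and the set of supported word lengths (5 to 10 bits) are assumptions chosen purely for illustration.

```python
def codebook_bytes(num_levels, dimension, bytes_per_component):
    """Storage needed for one codebook of num_levels reproduction vectors."""
    return num_levels * dimension * bytes_per_component


# Assumed example values (not taken from the text above).
DIMENSION = 10
BYTES_PER_COMPONENT = 2
word_lengths = [5, 6, 7, 8, 9, 10]  # N = 2^n levels for each n

# One separately trained codebook per supported number of levels.
separate_total = sum(codebook_bytes(2 ** n, DIMENSION, BYTES_PER_COMPONENT)
                     for n in word_lengths)

# For comparison: the largest codebook alone.
largest_only = codebook_bytes(2 ** max(word_lengths), DIMENSION,
                              BYTES_PER_COMPONENT)
```

Under these assumptions, the separate codebooks together occupy 40320 bytes, roughly twice the 20480 bytes of the largest codebook alone, and this storage is needed at both the quantizer and the retrieving unit.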
Prior art document Haoui, A. and Messerschmitt, D. G.: “Embedded Coding of Speech: A Vector Quantization Approach”, Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr. 1985, vol. 10, pp. 1703-1706, is related to embedded speech coders. An embedded speech coder is a source coder with the property that the fidelity of its reproduction of the input signal degrades gracefully as the bit rate is decreased in steps from a maximum rate to a minimum rate. Therein, the encoder does not know the actual bit rate being transmitted, and only knows the order in which bits will be discarded (for instance, the transmitted bit stream might be byte oriented, with the bits discarded in order from least to most significant). A codebook is designed for the maximum rate, and quantization is always performed for a fixed number of levels that corresponds to this maximum rate, irrespective of the number of bits that will be discarded during transmission (and thus affect the actual bit rate). To account for the discarding of bits, it is for instance proposed to assign two binary words that only differ in the least significant bit to two codewords which are close to each other in Euclidean distance so that replacing the least significant bit by zero will lead to a small increase in the error.
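The effect of the index assignment on discarded bits can be illustrated with a toy example. This is only a sketch of the principle, not the procedure of the cited document: it assumes a scalar (1-dimensional) codebook with 4 levels, the squared error as distortion measure, and that a discarded least significant bit is replaced by zero at the receiver. All names and values are hypothetical.

```python
def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def encode(vector, codebook):
    """Full-rate quantization: index of the nearest codeword."""
    return min(range(len(codebook)),
               key=lambda i: squared_euclidean(vector, codebook[i]))


def discard_lsbs(index, num_bits):
    """Model the channel discarding num_bits least significant bits,
    with the receiver replacing them by zeros."""
    return (index >> num_bits) << num_bits


# Good assignment: indices differing only in the LSB (00/01, 10/11)
# point to codewords that are close to each other.
good_codebook = [(0.0,), (1.0,), (4.0,), (5.0,)]
# Poor assignment: the same codewords, but LSB neighbours are far apart.
poor_codebook = [(0.0,), (4.0,), (1.0,), (5.0,)]

x = (4.75,)
results = {}
for name, cb in (("good", good_codebook), ("poor", poor_codebook)):
    full_index = encode(x, cb)                 # encoder always uses full rate
    reduced_index = discard_lsbs(full_index, 1)
    results[name] = (squared_euclidean(x, cb[full_index]),
                     squared_euclidean(x, cb[reduced_index]))
```

With the good assignment the error grows from 0.0625 to 0.5625 when the bit is discarded, whereas with the poor assignment it grows to 14.0625, although both codebooks contain the same codewords.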
Prior art document Ragot, S., Bessette, B. and Lefebvre, R.: “Low-Complexity Multi-Rate Lattice Vector Quantization with Application to Wideband TCX Speech Coding at 32 kbit/s”, Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2004, vol. 1, pp. 501-504, is related to multi-rate lattice vector quantization. Therein, an 8-dimensional vector is encoded into one of 6 codebooks, which respectively form shells of an integer lattice RE8. Lattice VQ is self-scalable, i.e. the choice of the codebook, and thus the word length n, depends on the vector to be quantized and thus cannot be selected prior to quantization. The codevectors are algebraically constructed and thus need neither be trained nor stored. The flexibility of lattice VQ is generally restricted by the fact that each codebook is composed of several shells of the lattice whose cardinality depends on the lattice. By combining several shells, the size of the codebooks can for instance be made a power of 2 or also other numbers, but not any desired number.