This invention relates to quantization.
Quantization is the process of approximating a sequence of continuous-amplitude samples by a quantized sequence that can be represented by a finite-rate digital data sequence (e.g., a bit sequence) for purposes of digital transmission or storage. The primary performance criteria in quantization are the average distortion between the original signal and the quantized sequence, and the average number R of bits used to represent each sample (the bit rate).
Scalar quantization is the simplest form of quantization. In scalar quantization, successive samples are quantized independently. In a uniform scalar quantizer, the discrete quantization levels are uniformly spaced along the real line. For a source characterized by a uniform probability distribution, the uniform scalar quantizer is the optimum scalar quantizer. But for a source with a nonuniform distribution, a nonuniform scalar quantizer can produce lower distortion by allocating more quantization levels to regions where the source samples are more likely to fall. A nonuniform scalar quantizer can perform significantly better than a uniform scalar quantizer, particularly at high rates and for sources whose probability distributions exhibit long tails. Optimum nonuniform scalar quantizers can be determined using a well-known iterative algorithm, described by J. Max in "Quantizing for Minimum Distortion", IRE Trans. Inform. Theory, 1960.
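The Lloyd-Max iteration referred to above alternates two conditions: decision thresholds at midpoints between levels, and each level at the centroid of its cell. The sketch below is a minimal empirical version that runs on training samples instead of a known density. The Laplacian test source, the 8-level size and the range-spanning uniform quantizer used for comparison are illustrative assumptions and do not come from the cited paper.

```python
import bisect
import random

def lloyd_max(samples, levels, iters=50):
    """Empirical Lloyd-Max design: alternate the nearest-level and centroid conditions."""
    xs = sorted(samples)
    # Start from evenly spaced quantiles of the training samples.
    q = [xs[int((i + 0.5) * len(xs) / levels)] for i in range(levels)]
    for _ in range(iters):
        # Nearest-level condition: thresholds are midpoints between adjacent levels.
        t = [(q[i] + q[i + 1]) / 2 for i in range(levels - 1)]
        sums, counts = [0.0] * levels, [0] * levels
        for x in xs:
            k = bisect.bisect_left(t, x)  # index of the cell that contains x
            sums[k] += x
            counts[k] += 1
        # Centroid condition: each level moves to the mean of its cell (empty cells stay put).
        q = [sums[i] / counts[i] if counts[i] else q[i] for i in range(levels)]
    return q

def mse(samples, q):
    """Mean squared error of nearest-level quantization with level set q."""
    return sum(min((x - c) ** 2 for c in q) for x in samples) / len(samples)

# Long-tailed (Laplacian) test source, 8 levels (R = 3 bits/sample).
rng = random.Random(0)
data = [rng.expovariate(1.0) * rng.choice((-1, 1)) for _ in range(5000)]
nonuniform = lloyd_max(data, 8)
lo, hi = min(data), max(data)
uniform = [lo + (i + 0.5) * (hi - lo) / 8 for i in range(8)]
```

On this long-tailed source the nonuniform levels cluster near zero and give a clearly lower `mse` than the evenly spaced levels. The uniform step used here is a simple choice and is not MSE-optimized.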
Even the best nonuniform scalar quantizer usually falls short of the best possible performance, called the rate-distortion limit described by T. Berger in "Rate Distortion Theory", Prentice Hall, 1971. In order to close this gap, multiple source samples must be quantized jointly; this is called vector quantization.
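A standard textbook illustration of this gap, not taken from the cited works, uses a memoryless Gaussian source with variance $\sigma^2$ and mean-squared-error distortion. The rate-distortion function, together with the well-known high-rate (Panter and Dite) approximation for the optimum fixed-rate scalar quantizer, is:

```latex
D(R) = \sigma^2\, 2^{-2R},
\qquad
D_{\mathrm{scalar}}(R) \approx \frac{\pi\sqrt{3}}{2}\,\sigma^2\, 2^{-2R}\ \ (\text{high } R),
\qquad
10\log_{10}\frac{\pi\sqrt{3}}{2} \approx 4.35\ \text{dB}.
```

At high rates, then, the best fixed-rate scalar quantizer for this source sits roughly 4.35 dB above the rate-distortion limit.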
An N-dimensional, rate-R vector quantizer uses a codebook with $2^{NR}$ quantization levels or codewords in N-space. Optimum codebooks must satisfy so-called Voronoi and centroid conditions. These necessary conditions can be used in iterative design algorithms (e.g., the K-means algorithm) to generate codebooks from training sequences as described by Linde, Buzo and Gray in "An Algorithm for Vector Quantizer Design", IEEE Trans. Communications, 1980. Such codebooks have local optimality properties, but they usually lack structure, and therefore their computational and storage complexities both grow exponentially with the rate R and dimensionality N.
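The sketch below applies the Voronoi (nearest-codeword) and centroid conditions in a plain K-means loop to design a $2^{NR}$-word codebook from training vectors. It starts from randomly sampled training vectors rather than from the codeword-splitting initialization of Linde, Buzo and Gray. The Gaussian training data, N = 2 and R = 2 are illustrative assumptions.

```python
import random

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(v, codebook):
    """Voronoi condition: index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda i: sqdist(v, codebook[i]))

def distortion(training, codebook):
    return sum(sqdist(v, codebook[nearest(v, codebook)]) for v in training) / len(training)

def kmeans_codebook(training, n_dim, rate, iters=20, seed=0):
    """Design a 2**(n_dim*rate)-word codebook from training vectors (K-means iteration)."""
    size = 2 ** (n_dim * rate)
    codebook = [list(v) for v in random.Random(seed).sample(training, size)]
    for _ in range(iters):
        sums = [[0.0] * n_dim for _ in range(size)]
        counts = [0] * size
        for v in training:
            i = nearest(v, codebook)
            counts[i] += 1
            for d in range(n_dim):
                sums[i][d] += v[d]
        # Centroid condition: replace each codeword by the mean of its cell.
        for i in range(size):
            if counts[i]:
                codebook[i] = [s / counts[i] for s in sums[i]]
    return codebook

rng = random.Random(1)
train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
cb = kmeans_codebook(train, n_dim=2, rate=2)   # N = 2, R = 2 -> 16 codewords
```

Every encoding step in this sketch searches all $2^{NR}$ codewords, and all of them have to be stored. Raising N or R by even a little therefore makes both costs explode, which is the lack of structure described above.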
A lattice quantizer uses a codebook whose codewords are the vectors from an N-dimensional lattice that lie within some region of N-space, called the boundary region. A lattice quantizer uses a minimum-distance decoding algorithm that finds the lattice vector that is closest to the source vector. Efficient decoding algorithms are known for many good N-dimensional lattices.
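Take the lattice $D_n$ (integer vectors with an even coordinate sum) as one example of the efficient decoding mentioned above. Conway and Sloane's rounding rule finds its closest point in O(n) time: round every coordinate, and if the resulting sum is odd, re-round the coordinate with the largest rounding error in the other direction. The brute-force search is included only to check the result. Applying a codebook boundary region is left out of this sketch.

```python
import itertools
import math

def nearest_Zn(x):
    """Closest point of the integer lattice Z^n: round every coordinate."""
    return [math.floor(v + 0.5) for v in x]

def nearest_Dn(x):
    """Closest point of D_n = {integer vectors with even coordinate sum} (Conway-Sloane rule)."""
    f = nearest_Zn(x)
    if sum(f) % 2 == 0:
        return f
    # Odd sum: re-round the coordinate with the largest rounding error the other way.
    k = max(range(len(x)), key=lambda i: abs(x[i] - f[i]))
    g = list(f)
    g[k] += 1 if x[k] > f[k] else -1
    return g

def brute_force_nearest(x, radius=3):
    """Exhaustive search over nearby D_n points (for checking only)."""
    ranges = [range(math.floor(v) - radius, math.floor(v) + radius + 1) for v in x]
    cands = (p for p in itertools.product(*ranges) if sum(p) % 2 == 0)
    return min(cands, key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
```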
However, according to asymptotic quantization theory (A. Gersho, "Asymptotically optimal block quantization", IEEE Trans. on Information Theory, 1979), to reach the minimum achievable distortion at fixed dimensionality and asymptotically increasing rate, it is essential to use codebooks whose codeword density is approximately proportional to the source probability distribution. For lattice quantizers, however, the distribution of codewords is more or less uniform within the boundary region. (For example, a uniform scalar quantizer is a one-dimensional lattice quantizer, which for nonuniform sources is strictly inferior to the optimum scalar quantizer.) In higher dimensions this suboptimality tends to disappear because of the law of large numbers; for example, for memoryless Gaussian sources, "sphere hardening" causes the source distribution to be approximately uniform over a sphere, in which case a lattice quantizer bounded by a sphere becomes approximately optimum. But if complexity considerations require the use of lower dimensionality, lattice quantization techniques need to be modified if performance near the asymptotic quantization theory bounds is to be achieved.
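Sphere hardening can be seen in a small Monte Carlo sketch. For i.i.d. unit-variance Gaussian vectors in N dimensions, the norm divided by $\sqrt{N}$ concentrates around 1, and its relative spread shrinks roughly like $1/\sqrt{2N}$ as N grows. The trial count and the set of dimensions below are arbitrary assumptions.

```python
import math
import random

def norm_statistics(n, trials=2000, seed=0):
    """Sample i.i.d. N(0,1) vectors in n dimensions.
    Returns (mean norm / sqrt(n), standard deviation of the norm / mean norm)."""
    rng = random.Random(seed)
    norms = [math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))) for _ in range(trials)]
    mean = sum(norms) / trials
    std = math.sqrt(sum((r - mean) ** 2 for r in norms) / trials)
    return mean / math.sqrt(n), std / mean

for n in (1, 4, 16, 64):
    print(n, norm_statistics(n))
```

As N increases, nearly all source vectors fall in a thin shell of radius about $\sqrt{N}$. This concentration is why a sphere-bounded lattice codebook becomes nearly optimal in high dimensions and not in low ones.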
Known application of lattices to quantization has been limited to memoryless sources with uniform distributions. In that special case the optimum boundary region is an N-cube, and it is not necessary to store the codewords. For nonuniform sources, however, lattice quantizers have not been widely used, because if optimum codebook boundaries are to be employed (e.g., spheres in the case of Gaussian sources), it becomes necessary to store all the codewords.
A method due to Conway and Sloane ("A fast encoding method for lattice codes and quantizers", IEEE Trans. Inform. Theory, 1983) addresses this problem, at the expense of increased computational complexity, by using as the boundary region the Voronoi region of a large integer multiple (a scaled copy) of the lattice. The resulting codebook is called a Voronoi code. The codewords need not be stored; instead, a mapping algorithm that involves decoding the large-scale lattice is used. The Conway and Sloane method is limited to Voronoi regions that are scaled integer multiples of the granular quantization cell. More generally, while Voronoi regions can be good choices as boundary regions for memoryless Gaussian sources, they may not be appropriate for non-Gaussian sources.
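The sketch below shows how a Voronoi code avoids storing its codewords. It uses the lattice $D_3$ with scale k = 4. An index u in $\{0,\dots,k-1\}^3$ maps to $y = uG$, and then to $c = y - k\,Q_{D_3}((y - a)/k)$, which places c in the Voronoi region of $kD_3$ (shifted by a). Encoding reverses this map with $G^{-1}$ and reduces modulo k. The generator matrix `G`, the small tie-breaking offset `A` and k = 4 are illustrative assumptions, and the sketch is not a transcription of the paper's procedure.

```python
import itertools
import math

def nearest_Dn(x):
    """Closest point of D_n (integer vectors with even coordinate sum), Conway-Sloane rule."""
    f = [math.floor(v + 0.5) for v in x]
    if sum(f) % 2 == 0:
        return f
    k = max(range(len(x)), key=lambda i: abs(x[i] - f[i]))
    f[k] += 1 if x[k] > f[k] else -1
    return f

G = [[-1, -1, 0], [1, -1, 0], [0, 1, -1]]   # rows: a basis of D_3 (assumed choice)
A = [0.013, 0.021, 0.034]                   # hypothetical offset that breaks boundary ties

def decode(u, k):
    """Index u in {0..k-1}^3 -> codeword of D_3 inside (Voronoi region of k*D_3) + A."""
    y = [sum(u[i] * G[i][j] for i in range(3)) for j in range(3)]   # y = u G
    v = nearest_Dn([(y[j] - A[j]) / k for j in range(3)])           # k*v: nearest point of k*D_3
    return [y[j] - k * v[j] for j in range(3)]

def encode(c, k):
    """Codeword -> index: solve c = u G for integer u (inverse of this G), then reduce mod k."""
    u1 = (-c[0] - c[1] - c[2]) // 2
    u2 = (c[0] - c[1] - c[2]) // 2
    u3 = -c[2]
    return [u1 % k, u2 % k, u3 % k]

K = 4
codebook = [decode(list(u), K) for u in itertools.product(range(K), repeat=3)]  # 64 codewords
```

In this sketch the codebook is built only to show its size. An encoder or decoder never needs to store it, because each direction costs a single lattice decoding or a small matrix product.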
Other types of structured codebooks have also been proposed for vector quantization in order to reduce implementation complexity.
In so-called binary tree-searched vector quantization (Buzo, et al., "Speech coding based upon vector quantization", IEEE Trans. ASSP, 1980), the codebook is constructed with a tree of $NR$ stages; the i'th stage defines a pseudo-codebook with $2^i$ codewords and the last ($NR$-th) stage represents the true codebook. Starting at the initial node of the tree, the tree-structured quantizer operates on the source vector in a stage-by-stage fashion by selecting the pseudo-codewords that minimize the distortion; it refines its decisions as it moves along the tree, until the true codeword is specified in the final stage. In this method the quantizing complexity is proportional to the rate-dimension product $NR$ rather than to $2^{NR}$. Also, the performance of this method is usually close to that of full-search unstructured vector quantization. Unfortunately, its storage complexity is even worse than that of full-search vector quantization, because of the need to store the pseudo-codebooks, and this limits its practical application.
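The sketch below builds a binary tree of depth NR by recursively splitting the training set with a two-way K-means step. It then encodes greedily, making two distance computations per stage. The splitting procedure, the Gaussian training data and N = 2, R = 2 are illustrative assumptions. The counts show both sides of the trade-off: the search touches 2·NR codewords, but $2^{NR+1}-2$ codewords (all pseudo-codebooks plus the true codebook) must be stored.

```python
import random

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(vs):
    return [sum(c) / len(vs) for c in zip(*vs)]

def partition(vs, a, b):
    left = [v for v in vs if sqdist(v, a) <= sqdist(v, b)]
    return left, [v for v in vs if sqdist(v, a) > sqdist(v, b)]

def split(vs, iters=10):
    """Two-way K-means split of one node's training vectors."""
    c = centroid(vs)
    a, b = [x + 1e-3 for x in c], [x - 1e-3 for x in c]
    for _ in range(iters):
        left, right = partition(vs, a, b)
        a = centroid(left) if left else a
        b = centroid(right) if right else b
    left, right = partition(vs, a, b)
    return a, b, left or vs, right or vs

def build_tree(vs, depth):
    """Stage i holds 2**i pseudo-codewords; the leaves form the true codebook."""
    if depth == 0:
        return None
    a, b, left, right = split(vs)
    return [(a, build_tree(left, depth - 1)), (b, build_tree(right, depth - 1))]

def tree_search(x, tree):
    """Greedy stage-by-stage search: two distance computations per stage."""
    bits, word = [], None
    while tree is not None:
        i = 0 if sqdist(x, tree[0][0]) <= sqdist(x, tree[1][0]) else 1
        bits.append(i)
        word, tree = tree[i]
    return word, bits

def stored_words(tree):
    """Number of pseudo-codewords and codewords that must be stored."""
    return 0 if tree is None else 2 + stored_words(tree[0][1]) + stored_words(tree[1][1])

def leaves(tree):
    out = []
    for word, sub in tree:
        out.extend([word] if sub is None else leaves(sub))
    return out

rng = random.Random(0)
train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(512)]
tree = build_tree(train, depth=4)           # N = 2, R = 2 -> NR = 4 stages, 16 leaves
word, bits = tree_search((0.3, -1.1), tree)
```

The bit path `bits` serves as the transmitted index. Because of the greedy search, the chosen leaf is not guaranteed to be the nearest leaf overall.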
There are alternative structures that can reduce the computational as well as the storage complexity, but at the expense of increased distortion. One of these structures is called a multi-stage vector quantizer (Juang and Gray, Jr., "Multiple-stage vector quantization for speech coding", Proc. of ICASSP, 1982). In the simplest case of two stages, a multi-stage vector quantizer consists of a coarse quantizer of rate $R_1$ and a granular quantizer of rate $R_2$, typically both of dimensionality N, such that $R_1 + R_2 = R$. The source vector is first quantized by the coarse quantizer, and the resulting residual (quantization error) is then quantized by the granular quantizer. The sum of the outputs of the coarse and granular quantizers is the codeword. In a two-stage quantizer, the computational and storage complexities are both proportional to $2^{NR_1} + 2^{NR_2}$, which can be substantially smaller than $2^{NR}$. However, performance is typically not very good, for two reasons: a.) depending on the quantization level chosen by the coarse quantizer, the residuals can exhibit different statistics, which cannot be effectively quantized by a fixed granular quantizer; b.) different quantization levels of the coarse quantizer have different probabilities, so a variable-rate granular quantizer is necessary to exploit these variations.
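Here is a minimal two-stage sketch under assumed parameters: N = 2 and $R_1 = R_2 = 1$, so each stage has 4 codewords. The coarse codebook is trained on the source. The single fixed granular codebook is trained on the coarse residuals, and the output is the sum of the two stage outputs. Only $2^{NR_1} + 2^{NR_2} = 8$ codewords are stored and searched, compared with $2^{NR} = 16$ for a full-search codebook of the same rate. The K-means training and the zero vector in the granular initialization are illustrative choices.

```python
import random

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(v, codebook):
    return min(codebook, key=lambda c: sqdist(v, c))

def kmeans(training, init, iters=15):
    """K-means refinement of an initial codebook (nearest-neighbor and centroid steps)."""
    codebook = [list(c) for c in init]
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for v in training:
            i = min(range(len(codebook)), key=lambda j: sqdist(v, codebook[j]))
            cells[i].append(v)
        codebook = [[sum(c) / len(cell) for c in zip(*cell)] if cell else codebook[i]
                    for i, cell in enumerate(cells)]
    return codebook

def two_stage(x, coarse, granular):
    """Coarse stage, then quantize the residual; output is the sum of both stage outputs."""
    c = nearest(x, coarse)
    r = [a - b for a, b in zip(x, c)]
    g = nearest(r, granular)
    return [a + b for a, b in zip(c, g)]

def avg(training, q):
    return sum(sqdist(v, q(v)) for v in training) / len(training)

rng = random.Random(0)
train = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
coarse = kmeans(train, train[:4])                              # R1 = 1 -> 4 coarse codewords
residuals = [[a - b for a, b in zip(v, nearest(v, coarse))] for v in train]
granular = kmeans(residuals, [[0.0, 0.0]] + residuals[:3])     # R2 = 1 -> 4 granular codewords
```

Every coarse cell shares the same granular codebook here. This is the limitation a.) describes, since residual statistics in different cells are not modeled separately.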
A way to alleviate the first factor is to transform the residuals by a linear transformation that depends on the quantization level chosen by the coarse quantizer, at the expense of increased storage and computational complexity (Roucos, Schwartz and Makhoul "Segment quantization for very low-rate speech coding", Proc. ICASSP, 1982.)
A seemingly different quantization method known as piecewise uniform quantization, which can alleviate the second problem, can also be viewed as a form of multi-stage vector quantization (Kuhlmann and Bucklew, "Piecewise uniform vector quantizers", IEEE Trans. Inform. Theory, Sept. 1988, and Swaszek, "Unrestricted multi-stage vector quantizers", Inform. Theory Symposium, 1990). A piecewise uniform quantizer is a two-stage vector quantizer in which the granular quantizer is a lattice quantizer whose rate varies with the output of the coarse quantizer. The implementation complexity of the granular quantizer is determined by the decoding complexity of the underlying lattice, and its codewords need not be stored, provided that the coarse quantizer has quantization regions that can be characterized as N-cubes.
Piecewise uniform quantization is effective when the rate and/or the dimensionality is high, so that within any coarse quantization region the residuals are approximately uniformly distributed. Then these residuals can be effectively quantized with a uniform quantizer whose points are chosen from a lattice and whose codebook boundary is an N-cube. But since the granular quantizer has variable rate, implementing piecewise uniform quantizers requires novel mapping rules for addressing codewords in a way that produces a constant bit rate. Also, sometimes it may be desirable to implement piecewise uniform quantizers with coarse quantizers whose decision regions are not simple N-cubes.
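The following piecewise uniform sketch uses a coarse stage that picks an N-cube from a regular grid. A uniform cubic-lattice quantizer then works inside that cube, with a number of bits that depends on the cube chosen. Cube size, grid extent and the bit-allocation rule (more bits near the origin, where a centered source is more likely) are all hypothetical. The total bit count varies from cube to cube, and this sketch does not include the constant-rate index mapping discussed above.

```python
import math

CELL = 1.0   # hypothetical side length of each coarse N-cube
SPAN = 2     # hypothetical: the coarse grid covers [-SPAN*CELL, SPAN*CELL)^N

def coarse_cell(x):
    """Coarse stage: integer label of the N-cube that contains x (clipped to the grid)."""
    return tuple(min(max(math.floor(v / CELL), -SPAN), SPAN - 1) for v in x)

def granular_bits(cell):
    """Hypothetical rate rule: 3 bits/dimension in the innermost cubes, 1 bit elsewhere."""
    return 3 if max(abs(c + 0.5) for c in cell) < 1 else 1

def piecewise_uniform(x):
    """Return (cell, granular index, reconstruction) using a cubic lattice inside the cell."""
    cell = coarse_cell(x)
    levels = 2 ** granular_bits(cell)     # lattice points per dimension in this cell
    step = CELL / levels
    index, recon = [], []
    for v, c in zip(x, cell):
        lo = c * CELL
        j = min(max(math.floor((v - lo) / step), 0), levels - 1)
        index.append(j)
        recon.append(lo + (j + 0.5) * step)
    return cell, tuple(index), recon
```

No codewords are stored anywhere in this sketch. Each reconstruction follows from the cube label and the lattice index alone.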
Another method for reducing the computational and storage complexities of full-search vector quantizers is product quantization. In this method, the input vector is split into two component vectors of dimensionality $N_1$ and $N_2$ (with $N_1 + N_2 = N$), so that the codebook is the Cartesian product of two smaller codebooks, and the components are quantized at rates $R_1$ and $R_2$, either independently or sequentially.
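A minimal product-quantization sketch with two hypothetical 2-D component codebooks of 4 codewords each follows. Only 4 + 4 codewords are stored, yet together they describe a 16-word codebook in 4-D. Squared error separates across the two components, so independent search in each component gives the same distortion as a full search over the product codebook.

```python
import itertools

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_index(v, codebook):
    return min(range(len(codebook)), key=lambda i: sqdist(v, codebook[i]))

def product_quantize(x, n1, cb1, cb2):
    """Quantize the first n1 components with cb1 and the rest with cb2, independently."""
    i = nearest_index(x[:n1], cb1)
    j = nearest_index(x[n1:], cb2)
    return (i, j), list(cb1[i]) + list(cb2[j])

# Hypothetical component codebooks: N1 = N2 = 2, 4 words each (1 bit per dimension).
CB1 = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
CB2 = [(0, 0), (2, 0), (0, 2), (2, 2)]
# The equivalent 4-D product codebook, enumerated here only for checking.
FULL = [list(a) + list(b) for a, b in itertools.product(CB1, CB2)]
```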
An interesting form of product quantization is spherical vector quantization, where the norm of the source vector is quantized with a first quantizer (typically, scalar) and the orientation of the source vector is quantized with a second quantizer whose codewords lie on the surface of an N-sphere (Adoul and Barth, "Nearest Neighbor Algorithm for Spherical Codes from the Leech Lattice", IEEE Trans. Inform. Theory, 1988). The encoding complexity of the spherical quantizer is simplified by observing that its codewords can be generated as permutations of a small number of so-called class leaders. Quantizing can be performed by first permuting the source vector so that its components are in descending order, and then computing the distortions only to the class leaders. Spherical quantization is practical for low-rate encoding of Gaussian sources. A similar technique, known as pyramid quantization, has been developed for Laplacian sources (Fischer, "A pyramid vector quantizer", IEEE Trans. Inform. Theory, 1986).
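The class-leader search can be sketched as follows. All codewords are assumed to be permutations of unit-norm leaders, so minimizing distortion is equivalent to maximizing correlation with the source vector. By the rearrangement inequality, the best permutation of a leader lines up its components with the sorted source components, which means one sort of x plus one inner product per leader is enough. The leaders and norm levels are hypothetical, and sign changes of the leaders, which practical spherical codes also use, are left out.

```python
import itertools
import math

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def nearest_permutation_codeword(x, leaders):
    """Codewords are all permutations of unit-norm class leaders.
    Sort x once, then compare it only with each (sorted) leader."""
    order = sorted(range(len(x)), key=lambda i: -x[i])   # positions of x, largest first
    xs = [x[i] for i in order]
    best, best_ip = None, -math.inf
    for leader in leaders:
        lead = sorted(leader, reverse=True)
        ip = sum(a * b for a, b in zip(xs, lead))        # best correlation over all permutations
        if ip > best_ip:
            best, best_ip = lead, ip
    code = [0.0] * len(x)
    for rank, pos in enumerate(order):                   # undo the sort
        code[pos] = best[rank]
    return code

def spherical_quantize(x, radii, leaders):
    """Product quantizer: scalar-quantized norm times quantized orientation."""
    r = math.sqrt(sum(c * c for c in x))
    rq = min(radii, key=lambda t: abs(r - t))
    return [rq * c for c in nearest_permutation_codeword(x, leaders)]

LEADERS = [unit(v) for v in ([1, 0, 0, 0], [1, 1, 0, 0], [2, 1, 1, 0], [1, 1, 1, 1])]
RADII = [0.5, 1.0, 1.5, 2.0]   # hypothetical norm quantizer levels
# Full orientation codebook, enumerated here only for checking.
ALL_WORDS = {p for leader in LEADERS for p in itertools.permutations(leader)}
```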
In a method called trellis-coded quantization (TCQ) (Marcellin and Fischer, "Trellis-coded quantization of memoryless and Gauss-Markov sources", IEEE Trans. on Commun., 1990), the codewords are sequences chosen from a trellis code. In the simplest one-dimensional case, the trellis code is constructed from a scalar nonuniform quantizer with an expanded number of levels partitioned into subsets. A convolutional code specifies the code sequences, and a Viterbi algorithm decoder is used to select the code sequence that best matches the input sequence. By increasing the complexity of the trellis code, TCQ can provide good performance for uniform or Gaussian sources. For sources whose probability distributions exhibit long tails, alternative methods are necessary to improve the performance. This latter limitation of TCQ is due to the fact that it scales its granular regions based on the one-dimensional source probability distribution rather than multi-dimensional joint probability distributions, as would be required by the asymptotic quantization theory.
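A one-dimensional TCQ encoder can be sketched under some assumptions. The scalar codebook is expanded to $2^{R+1}$ uniform levels and split cyclically into four subsets D0 to D3. A 4-state trellis lets the two branches leaving each state use either D0/D2 or D1/D3. With 1 bit selecting the branch and R - 1 bits selecting a level within the subset, the rate is R bits/sample. A Viterbi search then finds the allowed level sequence with the smallest total squared error. The next-state table, the uniform levels and the step size are illustrative assumptions, not the published design.

```python
import math
import random

# Assumed 4-state trellis: TRELLIS[state] lists (subset, next_state) for its two branches.
# States 0 and 2 use subsets D0/D2; states 1 and 3 use D1/D3.
TRELLIS = {
    0: ((0, 0), (2, 1)),
    1: ((1, 2), (3, 3)),
    2: ((2, 0), (0, 1)),
    3: ((3, 2), (1, 3)),
}

def tcq_subsets(rate, step):
    """Expanded codebook of 2**(rate+1) uniform levels; level k goes to subset D_(k mod 4)."""
    n = 2 ** (rate + 1)
    levels = [step * (k - (n - 1) / 2) for k in range(n)]
    return [levels[d::4] for d in range(4)]

def tcq_encode(xs, rate, step):
    """Viterbi search for the trellis path whose levels minimize the total squared error."""
    subsets = tcq_subsets(rate, step)
    cost = [0.0, math.inf, math.inf, math.inf]      # start in state 0
    history = []                                    # per sample: state -> (previous state, level)
    for x in xs:
        new_cost, back = [math.inf] * 4, [None] * 4
        for s in range(4):
            if cost[s] == math.inf:
                continue
            for d, nxt in TRELLIS[s]:
                y = min(subsets[d], key=lambda v: (x - v) ** 2)   # best level in subset D_d
                c = cost[s] + (x - y) ** 2
                if c < new_cost[nxt]:
                    new_cost[nxt], back[nxt] = c, (s, y)
        cost = new_cost
        history.append(back)
    s = min(range(4), key=lambda t: cost[t])
    total, out = cost[s], []
    for back in reversed(history):                  # trace the survivor path backwards
        s, y = back[s]
        out.append(y)
    return out[::-1], total

rng = random.Random(0)
source = [rng.uniform(-1, 1) for _ in range(2000)]
recon, total = tcq_encode(source, rate=2, step=0.25)    # 8 levels, 2 bits/sample
tcq_mse = total / len(source)
scalar = [-0.75, -0.25, 0.25, 0.75]                     # 4-level uniform scalar quantizer
scalar_mse = sum(min((x - c) ** 2 for c in scalar) for x in source) / len(source)
```

Both quantizers in this sketch run at 2 bits/sample on a uniform source. Comparing `tcq_mse` with `scalar_mse` shows the gain that comes from searching sequences instead of individual samples.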