A classical prior-art technique for the coding of digital speech and audio signals is transform coding, whereby the signal to be encoded is divided into blocks of samples called frames, and where each frame is processed by a linear orthogonal transform, e.g. the discrete Fourier transform or the discrete cosine transform, to yield transform coefficients, which are then quantized.
FIG. 1 of the appended drawings shows a high-level framework for transform coding. In this framework, a transform T is applied in an encoder to an input frame giving transform coefficients. The transform coefficients are quantized with a quantizer Q to obtain an index or a set of indices for characterizing the quantized transform coefficients of the frame. The indices are in general encoded into binary codes which can be either stored in a binary form in a storage medium or transmitted over a communication channel. In a decoder, the binary codes received from the communication channel or retrieved from the storage medium are used to reconstruct the quantized transform coefficients with a decoder Q⁻¹ of the quantizer. The inverse transform T⁻¹ is then applied to these quantized transform coefficients for reconstructing the synthesized frame.
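By way of illustration only, the chain of FIG. 1 (T, Q, Q⁻¹, T⁻¹) may be sketched in Python as follows. The orthonormal DCT-II, the uniform scalar quantizer, the step size and all function names are assumptions chosen for brevity; they are not taken from the description above.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); its inverse is its transpose."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows

def mat_vec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def encode(frame, step):
    """T then Q: transform the frame, then map each coefficient to an index."""
    coeffs = mat_vec(dct_matrix(len(frame)), frame)
    return [round(c / step) for c in coeffs]

def decode(indices, step):
    """Inverse quantizer then inverse transform: rebuild the synthesized frame."""
    coeffs = [i * step for i in indices]
    return mat_vec(transpose(dct_matrix(len(indices))), coeffs)

frame = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
indices = encode(frame, step=0.25)
synth = decode(indices, step=0.25)
max_error = max(abs(a - b) for a, b in zip(frame, synth))
```

Because the transform in this sketch is orthonormal, the reconstruction error in the frame has the same energy as the quantization error in the coefficients.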
In vector quantization (VQ), several samples or coefficients are blocked together in vectors, and each vector is approximated (quantized) with one entry of a codebook. The entry selected to quantize the input vector is typically the nearest neighbor in the codebook according to a distance criterion. Adding more entries in a codebook increases the bit rate and complexity but reduces the average distortion. The codebook entries are referred to as codevectors.
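A minimal sketch of nearest-neighbor vector quantization is given below. The squared Euclidean distance, the four-entry codebook and the function names are hypothetical choices made for the example.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vq_encode(x, codebook):
    """Return the index of the codevector nearest to x (exhaustive search)."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(x, codebook[i]))

def vq_decode(index, codebook):
    """Return the codevector designated by the index."""
    return codebook[index]

# Four codevectors in two dimensions: 2 bits per vector, 1 bit per dimension.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0), (0.0, -2.0)]
index = vq_encode((0.8, 1.3), codebook)
```

In this example the source vector (0.8, 1.3) is represented by index 1, that is, by the codevector (1.0, 1.0).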
To adapt to the changing characteristics of a source, adaptive bit allocation is normally used. With adaptive bit allocation, different codebook sizes may be used to quantize a source vector. In transform coding, the number of bits allocated to a source vector typically depends on the energy of the vector relative to other vectors within the same frame, subject to a maximum number of available bits to quantize all the coefficients. FIGS. 2(a) and 2(b) detail the quantization blocks of FIG. 1 in the general context of a multi-rate quantizer. This multi-rate quantizer uses several codebooks typically having different bit rates to quantize a source vector x. This source vector is typically obtained by applying a transform to the signal and taking all or a subset of the transform coefficients.
FIG. 2(a) depicts an encoder of the multi-rate quantizer, denoted by Q, that selects a codebook number n and a codevector index i to characterize a quantized representation y for the source vector x. The codebook number n specifies the codebook selected by the encoder while the index i identifies the selected codevector in this particular codebook. In general, an appropriate lossless coding technique can be applied to n and i in blocks En and Ei, respectively, to reduce the average bit rate of the coded codebook number nE and index iE prior to multiplexing (MUX) them for storage or transmission over a communication channel.
FIG. 2(b) shows decoding operations of the multi-rate quantizer. First, the binary codes nE and iE are demultiplexed (DEMUX) and their lossless codes are decoded in blocks Dn and Di, respectively. The retrieved codebook number n and index i are conducted to the decoder of the multi-rate quantizer, denoted by Q⁻¹, that uses them to recover the quantized representation y of the source vector x. Different values of n usually result in different bit allocations, and equivalently different bit rates, for the index i. The codebook bit rate given in bits per dimension is defined as the ratio between the number of bits allocated to a source vector and the dimension of the source vector.
The codebook can be constructed using several approaches. A popular approach is to apply a training algorithm (e.g. the k-means algorithm) to optimize the codebook entries according to the source distribution. This approach yields an unstructured codebook, which typically has to be stored and searched exhaustively for each source vector to quantize. The limitations of this approach are thus its memory requirements and computational complexity, which increase exponentially with the codebook bit rate. These limitations are even amplified if a multi-rate quantization scheme is based on unstructured codebooks, because in general a specific codebook is used for each possible bit allocation.
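The training approach mentioned above may be sketched as follows, assuming plain Lloyd (k-means) iterations with a squared-error criterion. The function names, the fixed iteration count and the rule of keeping an unused codevector unchanged are assumptions of this sketch.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(training, codebook, iterations=10):
    """Refine an initial codebook with Lloyd (k-means) iterations."""
    codebook = [tuple(c) for c in codebook]
    for _ in range(iterations):
        # Partition step: assign each training vector to its nearest codevector.
        cells = [[] for _ in codebook]
        for x in training:
            nearest = min(range(len(codebook)),
                          key=lambda i: squared_distance(x, codebook[i]))
            cells[nearest].append(x)
        # Update step: move each codevector to the centroid of its cell;
        # a codevector with an empty cell is left unchanged.
        codebook = [
            tuple(sum(col) / len(cell) for col in zip(*cell)) if cell else old
            for cell, old in zip(cells, codebook)
        ]
    return codebook

training = [(0, 0), (0, 1), (10, 10), (10, 11)]
trained = train_codebook(training, [(0, 0), (10, 10)])
```

Every trained entry has to be stored, and every entry is visited for each source vector, which is the storage and search burden referred to above.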
An alternative is to use constrained or structured codebooks, which reduce the search complexity and in many cases the storage requirements.
Two instances of structured vector quantization will now be discussed in more detail: multi-stage and lattice vector quantization.
In multi-stage vector quantization, a source vector x is quantized with a first-stage codebook C1 into a codevector y1. To reduce the quantization error, the residual error e1=x−y1 of the first stage, which is the difference between the input vector x and the selected first-stage codevector y1, is then quantized with a second-stage codebook C2 into a codevector y2. This process may be iterated with subsequent stages up to the final stage, where the residual error e_{n−1} = e_{n−2} − y_{n−1} = x − (y1 + . . . + y_{n−1}) of the (n−1)th stage is quantized with an nth stage codebook Cn into a codevector yn.
When n stages are used (n≧2), the reconstruction can then be written as a sum of the codevectors y=y1+ . . . +yn, where yl is an entry of the lth stage codebook Cl for l=1, . . . , n. The overall bit rate is the sum of the bit rates of all n codebooks.
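The multi-stage procedure may be sketched as follows. The two four-entry stage codebooks and the function names are hypothetical; each stage quantizes the residual left by the previous stages, and the decoder sums the selected codevectors.

```python
def nearest_index(x, codebook):
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))

def msvq_encode(x, stages):
    """Quantize x stage by stage; return one index per stage."""
    indices = []
    residual = list(x)
    for codebook in stages:
        i = nearest_index(residual, codebook)
        indices.append(i)
        # The residual passed to the next stage is the remaining error.
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices

def msvq_decode(indices, stages):
    """Reconstruct y = y1 + ... + yn from the stage indices."""
    y = [0.0] * len(stages[0][0])
    for i, codebook in zip(indices, stages):
        y = [a + b for a, b in zip(y, codebook[i])]
    return y

C1 = [(-2.0, -2.0), (-2.0, 2.0), (2.0, -2.0), (2.0, 2.0)]   # 2 bits
C2 = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]   # 2 bits
indices = msvq_encode((1.4, -2.7), [C1, C2])
y = msvq_decode(indices, [C1, C2])
```

Here the overall allocation is 2 + 2 = 4 bits for the two-dimensional vector, and the vector (1.4, −2.7) is reconstructed as (1.5, −2.5).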
In lattice vector quantization, also termed lattice VQ or algebraic VQ for short, the codebook is formed by selecting a subset of lattice points in a given lattice.
A lattice is a linear structure in N dimensions where all points or vectors can be obtained by integer combinations of N basis vectors, that is, as a weighted sum of basis vectors with signed integer weights. FIG. 3 shows an example in two dimensions, where the basis vectors are v1 and v2. The lattice used in this example is well-known as the hexagonal lattice denoted by A2. All points marked with crosses in this figure can be obtained as
$$y = k_1 v_1 + k_2 v_2 \qquad \text{(Eq. 1)}$$
where y is a lattice point, and k1 and k2 can be any integers. Note that FIG. 3 shows only a subset of the lattice, since the lattice itself extends to infinity. We can also write Eq. 1 in matrix form
$$y = [\,y_1 \;\; y_2\,] = [\,k_1 \;\; k_2\,] \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = [\,k_1 \;\; k_2\,] \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix} \qquad \text{(Eq. 2)}$$
where the basis vectors v1=[v11 v12] and v2=[v21 v22] form the rows of the generator matrix. A lattice vector is then obtained by taking an integer combination of these row vectors.
When a lattice is chosen to construct the quantization codebook, a subset of points is selected to obtain a codebook with a given (finite) number of bits. This is usually done by employing a technique called shaping. Shaping is performed by truncating the lattice according to a shaping boundary. The shaping boundary is typically centered at the origin but this does not have to be the case, and may be for instance rectangular, spherical, or pyramidal. FIG. 3 shows an example with a spherical shaping boundary.
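The generation of lattice points from a generator matrix, followed by spherical shaping, may be sketched as follows. The particular basis chosen for the hexagonal lattice A2, the search range kmax and the function names are assumptions of the example; the basis actually drawn in FIG. 3 is not reproduced here.

```python
import math

# Assumed basis of the hexagonal lattice A2 (rows of the generator matrix).
GENERATOR = [(1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0)]

def lattice_point(k, generator):
    """y = k G, where k is a row vector of signed integers."""
    dim = len(generator[0])
    return tuple(sum(k[i] * generator[i][j] for i in range(len(k)))
                 for j in range(dim))

def spherical_codebook(generator, radius, kmax):
    """Keep the lattice points inside a sphere centered at the origin."""
    points = []
    for k1 in range(-kmax, kmax + 1):
        for k2 in range(-kmax, kmax + 1):
            y = lattice_point((k1, k2), generator)
            # Small tolerance so that points lying on the sphere are retained.
            if math.hypot(*y) <= radius + 1e-9:
                points.append(y)
    return points

codebook_r1 = spherical_codebook(GENERATOR, radius=1.0, kmax=3)
codebook_r2 = spherical_codebook(GENERATOR, radius=2.0, kmax=3)
```

With this basis, the sphere of radius 1 retains the origin and its six nearest neighbors, and the sphere of radius 2 retains 19 points.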
The advantage of using a lattice is the existence of fast codebook search algorithms which can significantly reduce the complexity compared to unstructured codebooks in determining the nearest neighbor of a source vector x among all lattice points inside the codebook. There is also virtually no need to store the lattice points since they can be obtained from the generator matrix. The fast search algorithms generally involve rounding off to the nearest integer the elements of x subject to certain constraints such that the sum of all the rounded elements is even or odd, or equal to some integer in modulo arithmetic. Once the vector is quantized, that is, once the nearest lattice point inside the codebook is determined, usually a more complex operation consists of indexing the selected lattice point.
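As an illustration of such a rounding-based search, the sketch below finds the nearest point of the lattice Dn, taken here as the set of integer vectors whose components sum to an even number. The choice of Dn and the function names are assumptions of the example; the search over an infinite lattice is shown, without any shaping boundary.

```python
import math

def round_nearest(v):
    return math.floor(v + 0.5)

def nearest_dn(x):
    """Nearest point of D_n (integer vectors with an even component sum)."""
    y = [round_nearest(v) for v in x]
    if sum(y) % 2 == 0:
        return y
    # Parity constraint violated: take the component with the largest
    # rounding error and round it in the other direction.
    errors = [v - r for v, r in zip(x, y)]
    j = max(range(len(x)), key=lambda i: abs(errors[i]))
    y[j] += 1 if errors[j] >= 0 else -1
    return y

y = nearest_dn([0.6, 1.2, -0.3, 2.9])
```

The cost grows only linearly with the dimension, and no table of lattice points is consulted.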
A particular class of fast lattice codebook search and indexing algorithms involves the concept of leaders, which is described in detail in the following references:
    C. Lamblin and J.-P. Adoul. Algorithme de quantification vectorielle sphérique à partir du réseau de Gosset d'ordre 8. Ann. Télécommun., vol. 43, no. 3–4, pp. 172–186, 1988 (Lamblin, 1988);
    J.-M. Moureaux, P. Loyer, and M. Antonini. Low-complexity indexing method for Zn and Dn lattice quantizers. IEEE Trans. Communications, vol. 46, no. 12, December 1998 (Moureaux, 1998); and
    P. Rault and C. Guillemot. Indexing algorithms for Zn, An, Dn, and Dn++ lattice vector quantizers. IEEE Transactions on Multimedia, vol. 3, no. 4, pp. 395–404, December 2001 (Rault, 2001).
A leader is a lattice point with components sorted, by convention, in descending order. An absolute leader is a leader with all non-negative components. A signed leader is a leader with signs on each component. Usually the lattice structure imposes constraints on the signs of a lattice point, and thus on the signs of a leader. The concept of leaders will be explained in more detail hereinbelow.
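Under the definitions above, the leaders associated with a given lattice point may be computed as in the following sketch. The function names are hypothetical.

```python
def signed_leader(y):
    """Components of y sorted in descending order, signs kept."""
    return tuple(sorted(y, reverse=True))

def absolute_leader(y):
    """Absolute values of the components of y sorted in descending order."""
    return tuple(sorted((abs(c) for c in y), reverse=True))

point = (0, -2, 0, 0, 2, 0, 0, 0)
signed = signed_leader(point)
absolute = absolute_leader(point)
```

For the point above, the signed leader is (2, 0, 0, 0, 0, 0, 0, −2) and the absolute leader is (2, 2, 0, 0, 0, 0, 0, 0).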
A lattice often used in vector quantization is the Gosset lattice in dimension 8, denoted by RE8. Any 8-dimensional lattice point y in RE8 can be generated by
$$y = [\,k_1 \;\; k_2 \;\; \ldots \;\; k_8\,]\, G_{RE_8} \qquad \text{(Eq. 3)}$$
where k1, k2, . . . , k8 are signed integers and GRE8 is the generator matrix, defined as
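$$G_{RE_8} = \begin{bmatrix}
4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
2 & 2 & 0 & 0 & 0 & 0 & 0 & 0 \\
2 & 0 & 2 & 0 & 0 & 0 & 0 & 0 \\
2 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\
2 & 0 & 0 & 0 & 2 & 0 & 0 & 0 \\
2 & 0 & 0 & 0 & 0 & 2 & 0 & 0 \\
2 & 0 & 0 & 0 & 0 & 0 & 2 & 0 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1
\end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \\ v_6 \\ v_7 \\ v_8 \end{bmatrix} \qquad \text{(Eq. 4)}$$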
The row vectors v1, v2, . . . , v8 are the basis vectors of the lattice. It can be readily checked that the inverse of the generator matrix GRE8 is
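$$G_{RE_8}^{-1} = \frac{1}{4} \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & 2 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & 0 & 2 & 0 & 0 & 0 & 0 & 0 \\
-1 & 0 & 0 & 2 & 0 & 0 & 0 & 0 \\
-1 & 0 & 0 & 0 & 2 & 0 & 0 & 0 \\
-1 & 0 & 0 & 0 & 0 & 2 & 0 & 0 \\
-1 & 0 & 0 & 0 & 0 & 0 & 2 & 0 \\
5 & -2 & -2 & -2 & -2 & -2 & -2 & 4
\end{bmatrix} \qquad \text{(Eq. 5)}$$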
This inverse matrix is useful to retrieve the basis expansion of y:
$$[\,k_1 \;\; k_2 \;\; \ldots \;\; k_8\,] = y\, G_{RE_8}^{-1} \qquad \text{(Eq. 6)}$$
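The relations of Eqs. 3 to 6 can be checked numerically with the sketch below, which uses exact rational arithmetic. The variable and function names are hypothetical; the matrix entries are those of the generator matrix of RE8 and of four times its inverse.

```python
from fractions import Fraction

G_RE8 = [
    [4, 0, 0, 0, 0, 0, 0, 0],
    [2, 2, 0, 0, 0, 0, 0, 0],
    [2, 0, 2, 0, 0, 0, 0, 0],
    [2, 0, 0, 2, 0, 0, 0, 0],
    [2, 0, 0, 0, 2, 0, 0, 0],
    [2, 0, 0, 0, 0, 2, 0, 0],
    [2, 0, 0, 0, 0, 0, 2, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

# Four times the inverse generator matrix, so that all entries are integers.
G_RE8_INV_TIMES_4 = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [-1, 2, 0, 0, 0, 0, 0, 0],
    [-1, 0, 2, 0, 0, 0, 0, 0],
    [-1, 0, 0, 2, 0, 0, 0, 0],
    [-1, 0, 0, 0, 2, 0, 0, 0],
    [-1, 0, 0, 0, 0, 2, 0, 0],
    [-1, 0, 0, 0, 0, 0, 2, 0],
    [5, -2, -2, -2, -2, -2, -2, 4],
]

def row_times_matrix(row, matrix):
    return [sum(r * matrix[i][j] for i, r in enumerate(row))
            for j in range(len(matrix[0]))]

def lattice_point(k):
    """Eq. 3: y = k G, with k a row vector of signed integers."""
    return row_times_matrix(k, G_RE8)

def basis_expansion(y):
    """Eq. 6: k = y G^-1, computed exactly."""
    return [Fraction(c, 4) for c in row_times_matrix(y, G_RE8_INV_TIMES_4)]

def is_lattice_point(y):
    """y belongs to RE8 when its basis expansion is made of integers only."""
    return all(c.denominator == 1 for c in basis_expansion(y))

k = [1, -2, 0, 3, 0, 0, 1, 2]
y = lattice_point(k)
recovered = basis_expansion(y)
```

A vector whose basis expansion contains a non-integer value, for instance [1 0 0 0 0 0 0 0], is not a point of RE8.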
It is well-known that lattices consist of an infinite set of embedded spheres on which lie all lattice points. These spheres are often referred to as shells. Lattice points on a sphere in RE8 can be generated from one or several leaders by permutation of their signed components. All permutations of a leader's components are lattice points with the same norm, and thus they fall on the same lattice shell. Leaders are therefore useful to enumerate concisely the shells of a lattice. Indeed, lattice points located on shells close to the origin can be obtained from a very small number of leaders. Only absolute leaders and sign constraints are required to generate all lattice points on a shell.
To design a RE8 codebook, a finite subset of lattice points may be selected by exploiting the intrinsic geometry of the lattice, especially its shell structure. As described in (Lamblin, 1988), the lth shell of RE8 has a radius √(8l) where l is a non-negative integer. High radius shells comprise more lattice points than lower radius shells. It is possible to enumerate all points on a given shell using absolute and signed leaders, noting that there is a fixed number of leaders on a shell and that all other lattice points on the shell are obtained by permutations of the signed leader components, with some restrictions on the signs.
In spherical lattice VQ, it is sufficient to reorder in decreasing order the components of x and then perform a nearest-neighbor search among the leaders defining the codebook to determine the nearest neighbor of a source vector x among all lattice points in the codebook. The index of the closest leader and the permutation index obtained indirectly from the ordering operation on x are then sent to the decoder, which can reconstruct the quantized analog of x from this information. Consequently, the concept of leaders allows a convenient indexing strategy, where a lattice point can be described by a cardinality offset referring to a signed leader and a permutation index referring to the relative index of a permutation of the signed leader.
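The search described above may be sketched as follows for a codebook that contains every permutation of each of its signed leaders. The three-dimensional leaders are a toy set invented for the example and do not belong to any particular lattice; the function name is hypothetical.

```python
# Toy signed leaders, each sorted in descending order (illustrative only).
LEADERS = [(1, 1, 0), (1, 0, -1), (2, 0, 0), (0, 0, -2)]

def quantize_with_leaders(x, leaders):
    """Return (leader index, ordering of x, reconstructed point)."""
    # Ordering operation: positions of x from largest to smallest component.
    order = sorted(range(len(x)), key=lambda i: x[i], reverse=True)
    sorted_x = [x[i] for i in order]
    # Nearest-neighbor search among the leaders only.
    best = min(range(len(leaders)),
               key=lambda n: sum((a - b) ** 2
                                 for a, b in zip(sorted_x, leaders[n])))
    # Undo the ordering to place the leader components back in position.
    y = [0] * len(x)
    for position, i in enumerate(order):
        y[i] = leaders[best][position]
    return best, order, y

best, order, y = quantize_with_leaders((-0.2, 1.7, 0.4), LEADERS)
```

Pairing the sorted source vector with a sorted leader gives the closest permutation of that leader, which is why only the leaders need to be searched. In an actual coder, the ordering would be converted into a permutation index rather than sent as a list of positions.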
Based on the shell structure of a lattice, and on the enumeration of the lattice in terms of absolute and signed leaders, it is possible to construct a codebook by retaining only the lower radius shells, and possibly completing the codebook with a few additional leaders of higher radius shells. We refer to this kind of lattice codebook generation as near-spherical lattice shaping. This approach is used in M. Xie and J.-P. Adoul, Embedded algebraic vector quantization (EAVQ) with application to wideband audio coding, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Atlanta, Ga., U.S.A, vol. 1, pp. 240–243, 1996 (Xie, 1996).
For RE8, the absolute leaders in shells of radius 0 and √8 are shown below.
Absolute leader for the shell of radius 0: [0 0 0 0 0 0 0 0]
Absolute leaders for the shell of radius √8: [2 2 0 0 0 0 0 0] and [1 1 1 1 1 1 1 1]
A more complete listing for low-radius shells, for the specific case of RE8, can be found in Lamblin (1988).
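The enumeration of a shell from its absolute leaders may be sketched as follows for the shell of radius √8. The membership rule used to enforce the sign constraints (components all even or all odd, with a sum that is a multiple of 4) is derived here from the generator matrix of Eq. 4 and is stated as an assumption of the sketch, as are the function names.

```python
from itertools import permutations, product

ABSOLUTE_LEADERS_SHELL_1 = [(2, 2, 0, 0, 0, 0, 0, 0), (1, 1, 1, 1, 1, 1, 1, 1)]

def in_re8(y):
    """True if the components are all even or all odd and sum to a multiple of 4."""
    parities = {c % 2 for c in y}
    return len(parities) == 1 and sum(y) % 4 == 0

def shell_points(absolute_leaders):
    """All lattice points obtained by permuting and signing the leaders."""
    points = set()
    for leader in absolute_leaders:
        for perm in set(permutations(leader)):
            for signs in product((1, -1), repeat=len(perm)):
                y = tuple(s * c for s, c in zip(signs, perm))
                if in_re8(y):
                    points.add(y)
    return points

shell = shell_points(ABSOLUTE_LEADERS_SHELL_1)
```

The first leader contributes 112 points and the second contributes 128, for a total of 240 points on this shell.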
For lattice quantization to be used in transform coding with adaptive bit allocation, it is desirable to construct multi-rate lattice codebooks. A possible solution consists of exploiting the enumeration of a lattice in terms of leaders in a similar way as in Xie (1996). As explained in Xie, a multi-rate leader-based lattice quantizer may be designed with for instance:
    embedded algebraic codebooks, whereby lower-rate codebooks are subsets of higher-rate codebooks, or
    nested algebraic codebooks, whereby the multi-rate codebooks do not overlap but are complementary in a similar fashion as a nest of Russian dolls.
In the specific case of Xie, multi-rate lattice quantization uses six codebooks named Q0, Q1, Q2, . . . , Q5, where the last five codebooks are embedded, i.e. Q1⊂Q2⊂ . . . ⊂Q5. These codebooks are essentially derived from an 8-dimensional lattice RE8. Following the notations of Xie, Qn refers to the nth RE8 codebook. The bit allocation of codebook Qn is 4n bits, corresponding to 2^(4n) entries. Since the codebook bit rate is defined as the ratio between the number of bits allocated to a source vector and the dimension of the source vector, and since the dimension of the source vector is 8 in RE8 quantization, the codebook bit rate of Qn is 4n/8=n/2 bits per dimension.
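The arithmetic of the preceding paragraph can be tabulated with the short sketch below. The function name and the dictionary layout are hypothetical.

```python
DIMENSION = 8  # dimension of the RE8 source vectors

def codebook_stats(n):
    """Bits, number of entries and bit rate of the codebook Qn."""
    bits = 4 * n
    return {"bits": bits, "entries": 2 ** bits, "rate": bits / DIMENSION}

table = {n: codebook_stats(n) for n in range(6)}
```

For the largest codebook Q5 this gives 20 bits, 1,048,576 entries and 2.5 bits per dimension.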
With the technique of Xie, the codebook bit rate cannot exceed 5/2 bits per dimension. Due to this limitation, a procedure must be applied to saturate outliers. An outlier is defined as a point x in space whose nearest neighbor y in the lattice RE8 is not in any of the multi-rate codebooks Qn. In Xie, such points are scaled down by a factor g>1 until x/g is no longer an outlier. Evidently, the use of g may result in large quantization errors. This problem is addressed in Xie (1996) by normalizing the source vector prior to multi-rate lattice quantization.
There are disadvantages and limitations in the multi-rate quantization technique of Xie, including:
    1. Outlier saturation is usually a computational burden. Further, saturation may significantly degrade the quantization performance (hence quality) in the case of large outliers.
    2. The technique handles outliers with saturation and does not allow more than 20 bits to be allocated per 8-dimensional vector. This may be a disadvantage in transform coding, since high-energy vectors (which are more likely to be outliers) should normally be quantized with a small distortion to maximize quality, which implies that it should be possible to use a codebook with enough bits allocated to a specific vector.
    3. The codebooks Q2, Q3, Q4 and Q5 of 8, 12, 16 and 20 bits are specified with 3, 8, 23 and 73 absolute leaders, respectively. Since storage requirements and search complexity are closely related to the number of absolute leaders, the complexity of these lattice codebooks explodes with increasing codebook bit rate.
    4. The performance of embedded codebooks is slightly worse than that of non-overlapping (i.e. nested) codebooks.
Another kind of lattice shaping, as opposed to near-spherical shaping, is Voronoi shaping, which is described in J. H. Conway and N. J. A. Sloane, A fast encoding method for lattice codes and quantizers, IEEE Trans. Inform. Theory, vol. IT-29, no. 6, pp. 820–824, November 1983 (Conway, 1983). It relies on the concept of Voronoi region described for instance in A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992 (Gersho, 1992). In the specific case of a lattice codebook, a Voronoi region is the region of N-dimensional space in which all points are closer to a given lattice point than to any other point in the lattice. Each lattice point has an associated closed Voronoi region that also includes the border points equidistant to neighboring lattice points. In a given lattice, all Voronoi regions have the same shape, that is, they are congruent. This is not the case for an unstructured codebook.
A Voronoi codebook is a subset of a lattice such that all points of the codebook fall into a region of space with the same shape as the Voronoi region of the lattice, appropriately scaled up and translated. To be more precise, a Voronoi codebook V(r) derived from the lattice Λ in dimension N is defined as
$$V(r) = \Lambda \cap \left(2^r V_\Lambda(0) + a\right) \qquad \text{(Eq. 7)}$$
where r is a non-negative integer parameter defined later in more detail, VΛ(0) is the Voronoi region of Λ around the origin, and a is an appropriate N-dimensional offset vector. Equation 7 is interpreted as follows: “the Voronoi codebook V(r) is defined as all points of the lattice Λ included in the region of N-dimensional space inside a scaled-up and translated Voronoi region VΛ(0), with the scaling factor m=2^r and the offset vector a”. With such a definition, the codebook bit rate of V(r) is r bits per dimension. The role of a is to fix ties, that is, to prevent any lattice point from falling on the boundary of the shaping region 2^r VΛ(0)+a.
FIG. 4 illustrates Voronoi coding, Voronoi regions, and tiling of Voronoi regions in the two-dimensional hexagonal lattice A2. The point o refers to the origin. Both points o and z fall inside the same boundary marked with dashed lines. This boundary is actually a Voronoi region of A2 scaled by m=2 and slightly translated to the right to avoid lattice points on the region boundary. There are in total 4 lattice points marked with three dots (•) and a plus (+) sign within the boundary comprising o and z. More generally, each such region contains m^N points. It can be seen in FIG. 4 that the same pattern, a Voronoi region of A2 scaled by m=2, is duplicated several times. This process is called tiling. For instance, the points o′ and z′ can be seen as equivalent to o and z, respectively, with respect to tiling. The point z′ may be written as z′=o′+z where o′ is a point of 2A2. The points of 2A2 are shown with plus signs in FIG. 4. More generally, the whole lattice can be generated by tiling all possible translations of a Voronoi codebook by points of the lattice scaled by m.
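A Voronoi codebook in the sense of Eq. 7 may be sketched as follows, in the manner of (Conway, 1983). For simplicity the sketch uses the two-dimensional lattice D2 (integer points with an even component sum) rather than the lattice A2 of FIG. 4; the generator matrix, the offset a and all function names are assumptions of the example.

```python
import math
from itertools import product

G_D2 = [(2, 0), (1, 1)]   # assumed generator matrix of D2 (rows are basis vectors)
OFFSET = (0.1, 0.05)      # hypothetical offset a, chosen to avoid ties

def nearest_dn(x):
    """Nearest point of D_n: round, then repair the parity if needed."""
    y = [math.floor(v + 0.5) for v in x]
    if sum(y) % 2 != 0:
        errors = [v - r for v, r in zip(x, y)]
        j = max(range(len(x)), key=lambda i: abs(errors[i]))
        y[j] += 1 if errors[j] >= 0 else -1
    return y

def index_to_point(k, r):
    """Map an index k in {0..m-1}^2 to a point of the Voronoi codebook V(r)."""
    m = 2 ** r
    x = [k[0] * G_D2[0][j] + k[1] * G_D2[1][j] for j in range(2)]
    z = nearest_dn([(xj - aj) / m for xj, aj in zip(x, OFFSET)])
    return tuple(xj - m * zj for xj, zj in zip(x, z))

def point_to_index(y, r):
    """Recover the index as the basis expansion of y reduced modulo m."""
    m = 2 ** r
    return ((y[0] - y[1]) // 2 % m, y[1] % m)

def voronoi_codebook(r):
    m = 2 ** r
    return [index_to_point(k, r) for k in product(range(m), repeat=2)]

codebook = voronoi_codebook(1)
```

With r=1 the codebook contains m^N = 4 points, namely (0, 0), (1, 1), (2, 0) and (1, −1), which corresponds to a bit rate of 1 bit per dimension.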
As described in D. Mukherjee and S. K. Mitra, Vector set-partitioning with successive refinement Voronoi lattice VQ for embedded wavelet image coding, Proc. ICIP, Part I, Chicago, Ill., October 1998, pp. 107–111 (Mukherjee, 1998), Voronoi coding can be used to extend lattice quantization by successive refinements. The multi-stage technique of Mukherjee produces multi-rate quantization with finer granular descriptions after each refinement. This technique, which could be used for multi-rate quantization in transform coding, has several limitations:
    1. The quantization step is decreased after each successive refinement, and therefore it cannot deal efficiently with large outliers. Indeed, if a large outlier occurs in the first stage, the successive stages cannot reduce efficiently the resulting error, because they are designed to reduce granular noise only. The performance of the first stage is therefore critical.
    2. The property of successive refinements implies constraints on the successive quantization steps. This limits the quantization performance.