Today there is a strong market need to transmit and store audio and video content at low bitrates while maintaining high quality. In particular, where transmission resources or storage are limited, low bitrate operation is an essential cost factor. This is typically the case, for example, in streaming and messaging applications in mobile communication systems such as GSM, UMTS, or CDMA. On the other hand, most content, for instance on the Internet, is available only at high bitrates, which guarantees the highest quality but cannot be streamed directly over mobile networks. In order for a content provider to distribute the content over a wide variety of networks, e.g. broadcast, the content has to be available in several formats at different bitrates, or be rate transcoded at some network gateway if and when the need arises.
A prior art solution to this problem is the use of scalable codecs. The basic idea with scalable codecs is that the encoding is done only once, resulting in a scalable bitstream including a basic layer and one or several enhancement layers. When the bitstream is truncated, i.e. the bitrate is lowered, by discarding at least one of the enhancement layers, the decoder is still able to decode the data at a lower rate. With this technology, rate transcoding becomes a simple truncation operation.
An interesting application for a scalable codec is audio-visual content distribution over heterogeneous networks, e.g. Mobile TV, Video Broadcast, Video-on-Demand, Concert streaming, etc. For such a service to be successful, it is very desirable that the content distribution should be as broad and as easy as possible. At the same time a certain minimum service quality should be guaranteed for the most adverse channel links, i.e. a minimum acceptable quality for links with poor bandwidth.
Scalable audio and video codecs are gaining more and more interest in standardization bodies like MPEG (Moving Picture Experts Group). In fact, MPEG is currently standardizing a scalable extension to the standard H.264/AVC (Advanced Video Coding) as well as issuing a Call for Information on scalable audio and speech codecs. Other standardization bodies such as DVB (Digital Video Broadcasting) are also considering uses for SVC (scalable AVC).
Although scalable audio codecs already exist and have been standardized, e.g. BSAC (Bit Sliced Arithmetic Coding), which is used in association with AAC (Advanced Audio Coding), MPEG, as an expert group, still feels the need for new technology that can fill the existing gap at low bitrates. In fact, it is a well-known problem that a scalable codec always performs worse at a given bitrate than a non-scalable codec at the same rate.
One prior art encoding of speech, and in general of audio signals, is based on transform coding. According to this method an original input signal is divided into successive overlapping blocks of samples (frames). A linear transform, such as the DFT (Discrete Fourier Transform) or the MDCT (Modified Discrete Cosine Transform), is applied on each frame, thus generating transform coefficients. These coefficients are quantized and yield quantized coefficients, which in turn are encoded and form part of the bitstream. The bitstream is stored or transmitted depending on the sought application. Upon reception of the bitstream, the decoder first decodes the previously encoded quantized coefficients and performs the inverse transform, such as IDFT or IMDCT, yielding decoded frames. The decoded frames are usually combined by the so-called overlap-add procedure in order to generate the decoded time-domain signal.
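The transform coding chain described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a codec from the prior art: it uses a naive DFT, a sine window with 50% overlap, and a uniform scalar quantizer with a hypothetical step size `step`. The function names (`dft`, `idft`, `transform_codec`) are chosen for this example only.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(coeffs):
    """Naive inverse DFT; the real part is returned because the input frames are real."""
    n = len(coeffs)
    return [(sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def quantize(coeffs, step):
    """Uniform scalar quantization of real and imaginary parts to integer indices."""
    return [(round(c.real / step), round(c.imag / step)) for c in coeffs]


def dequantize(indices, step):
    """Map integer indices back to quantized transform coefficients."""
    return [complex(re * step, im * step) for re, im in indices]


def transform_codec(signal, frame_len=8, step=0.05):
    """Encode and immediately decode `signal`, frame by frame, with overlap-add."""
    hop = frame_len // 2
    # Sine window: its squared values at n and n + hop sum to one, so applying it
    # at analysis and synthesis reconstructs the signal where two frames overlap.
    window = [math.sin(math.pi * (n + 0.5) / frame_len) for n in range(frame_len)]
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        indices = quantize(dft(frame), step)         # these indices would be entropy coded
        decoded = idft(dequantize(indices, step))    # decoder side: inverse transform
        for n in range(frame_len):
            out[start + n] += decoded[n] * window[n]  # overlap-add
    return out
```

Only samples covered by two overlapping frames are fully reconstructed; the first and last half-frames would need padding in a real system. Shrinking `step` lowers the distortion at the cost of a larger index range, i.e. a higher bitrate.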
Vector Quantization (VQ) is a well-known quantization technique where several coefficients are grouped together into a vector. The resulting vector is approximated by an entry of a codebook. Depending on the distortion measure that is used, the nearest neighbor in the codebook is selected as the approximation of the input vector of coefficients. The larger the codebook, the better the approximation, thus yielding lower overall distortion. However, this comes at the expense of increased storage, bitrate and computational complexity.
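As an illustration of the nearest-neighbor selection described above, the following sketch performs an exhaustive codebook search. It assumes squared Euclidean error as the distortion measure and a small made-up codebook; neither is prescribed by the text.

```python
def squared_error(a, b):
    """Squared Euclidean distance, used here as the distortion measure (an assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_encode(vector, codebook):
    """Exhaustive search: return the index of the codebook entry nearest to `vector`."""
    return min(range(len(codebook)), key=lambda i: squared_error(vector, codebook[i]))


def vq_decode(index, codebook):
    """The decoder only needs the index and the same codebook."""
    return codebook[index]


# Hypothetical 2-bit codebook (4 entries) for 2-dimensional vectors.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
```

With this codebook, `vq_encode((0.9, 0.2), CODEBOOK)` selects index 1, and the index costs log2(4) = 2 bits. Doubling the codebook size adds one bit per vector and doubles both the storage and the search effort.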
Codebooks for vector quantization may have different structures and can be designed in several ways.
One way to design a codebook for unstructured vector quantization is by using the well-known LBG (Linde-Buzo-Gray) algorithm (K-means). Unstructured codebooks are optimal in the sense that they are trained on the data and thus are tailored to the distribution of the vectors to be quantized. However, this optimality comes at the expense of an exhaustive search in order to find the nearest neighbor as well as huge storage requirements; both grow exponentially with the quantizer bitrate.
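The training loop at the core of this kind of codebook design can be sketched as below. This is a simplified illustration: it runs plain nearest-neighbor/centroid iterations from a given initial codebook and omits the codebook-splitting initialization usually associated with LBG. The function names and the fixed iteration count are assumptions of the example.

```python
def nearest(vector, codebook):
    """Index of the codevector with the smallest squared Euclidean error."""
    return min(range(len(codebook)),
               key=lambda i: sum((x - c) ** 2 for x, c in zip(vector, codebook[i])))


def lloyd_iteration(training, codebook):
    """One iteration: partition the training set, then move each codevector to its centroid."""
    cells = [[] for _ in codebook]
    for v in training:
        cells[nearest(v, codebook)].append(v)
    updated = []
    for cell, old in zip(cells, codebook):
        if cell:
            dim = len(old)
            updated.append(tuple(sum(v[d] for v in cell) / len(cell) for d in range(dim)))
        else:
            updated.append(old)  # keep an unused codevector unchanged
    return updated


def train_codebook(training, initial, iterations=20):
    """Repeat the partition/centroid steps a fixed number of times."""
    codebook = list(initial)
    for _ in range(iterations):
        codebook = lloyd_iteration(training, codebook)
    return codebook
```

Both the search in `nearest` and the stored codebook scale with the number of codevectors, which is 2 raised to the number of bits per vector; this is the exponential growth noted above.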
An alternative to unstructured vector quantization is structured vector quantization, in which the codebook is constrained to a particular structure.
Multistage vector quantization is a form of tree structured quantization with much reduced arithmetic and storage complexity. Instead of having a large codebook for a given rate, Multistage VQ starts by quantizing the vector with a reduced rate codebook. The residual of this first quantization stage is then fed to the second stage where another (or the same) codebook is used to quantize the residual, possibly at a different rate. This process is iterated for all stages yielding the final quantization error. The total rate of the quantizer is the sum of the rates of each quantizer stage.
In multistage vector quantization, a source vector $x$ is quantized with a first-stage codebook $CB_1$ yielding a codevector $c_1(i_1)$ with index $i_1$. The residual error of the first stage is computed as $e_1 = x - c_1(i_1)$ and is quantized by the second stage using codebook $CB_2$ yielding a codevector $c_2(i_2)$ with index $i_2$. This process is re-iterated with the following stages until the residual $e_{n-1} = e_{n-2} - c_{n-1}(i_{n-1})$ is input to the last stage and quantized with codebook $CB_n$ yielding the codevector $c_n(i_n)$ with index $i_n$.
Reconstruction of the source vector consists of performing the inverse operation of the quantizer; upon reception of indices $i_1, i_2, \ldots, i_n$, the decoder computes a reconstructed vector given by:
$$\hat{x}(i_1, i_2, \ldots, i_n) = c_1(i_1) + c_2(i_2) + \cdots + c_n(i_n) \qquad (1)$$
The overall bitrate used to encode x is the sum of the bitrates of each stage. Besides the savings in computational complexity, multistage vector quantizers also provide a way to encode a vector in a successively refinable fashion.
In case only part of the indices are received, for example $i_1, i_2, \ldots, i_k$ with $k < n$, it is still possible to reconstruct a vector:
$$\hat{x}(i_1, i_2, \ldots, i_k) = c_1(i_1) + c_2(i_2) + \cdots + c_k(i_k) \qquad (2)$$
which has a higher quantization error, i.e. lower performance, but which requires a lower bitrate. Thus, each additional received index improves the reconstructed vector.
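The multistage procedure and the reconstructions in equations (1) and (2) can be sketched as follows. The three codebooks are made-up values chosen for this example, and squared Euclidean error is assumed as the distortion measure.

```python
def nearest_index(vector, codebook):
    """Index of the codevector with the smallest squared Euclidean error."""
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])))


def msvq_encode(x, codebooks):
    """Quantize x stage by stage; each stage quantizes the previous stage's residual."""
    residual = list(x)
    indices = []
    for codebook in codebooks:
        i = nearest_index(residual, codebook)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def msvq_decode(indices, codebooks):
    """Sum the selected codevectors; passing only the first k indices gives equation (2)."""
    xhat = [0.0] * len(codebooks[0][0])
    for i, codebook in zip(indices, codebooks):
        xhat = [a + c for a, c in zip(xhat, codebook[i])]
    return xhat


# Hypothetical codebooks, 2 bits per stage, each stage finer than the previous one.
CB1 = [(0, 0), (4, 0), (0, 4), (4, 4)]
CB2 = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
CB3 = [(-0.25, -0.25), (-0.25, 0.25), (0.25, -0.25), (0.25, 0.25)]
```

For `x = (4.8, 1.3)`, `msvq_encode(x, [CB1, CB2, CB3])` returns `[1, 3, 1]` at a total of 6 bits. Decoding the first one, two, or three indices gives (4, 0), (5, 1), and (4.75, 1.25) respectively, so the error shrinks with each index received in this example.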
Despite its advantages over normal unconstrained VQ, multistage VQ has several limitations:
- Multistage vector quantization becomes quite complex when high rate quantization steps (i.e. large codebooks) are required.
- Storage of the codebooks is proportional to the number of stages, thus limiting the flexibility of successive refinement.
- The property of successive improvement implies constraints on the successive quantization steps, which limits the overall achievable performance at any rate.
Another type of structured VQ is Lattice Vector Quantization (LVQ). In LVQ, codebooks are formed using a subset of points in a given lattice. A lattice is a geometrical object constructed as the set of all integer linear combinations of a set of basis vectors. The low complexity and memory consumption make the use of lattices for quantization very attractive. However, there are still several issues affecting their performance and complexity:
- For variable-rate encoding, one must scale the lattice (base vectors) in order to obtain the desired distortion and rate. Additionally, one has to encode the resulting indices with a lossless encoder.
- For fixed-rate encoding, shaping must be used in order to define a certain codebook, and the lattice must also be scaled such that most of the input vectors (called the support) lie within the defined shaping region. Vectors outside the shaping region, also called outliers, cause a very serious problem, which may be solved by saturation or by scaling. Both techniques add computational burden and may degrade quality, especially in the case of large outliers.
Each point $c$ in a lattice of dimension $d$ can be written as $c = Gm$, where $G$ is called the generator matrix and $m$ is a vector of integers. Several popular lattices exist, for example, the hexagonal lattice $A_2$, the integer lattice $Z^n$, and the Gosset lattice $E_8$.
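The relation c = Gm can be illustrated with the hexagonal lattice. The sketch below assumes one common choice of generator matrix, whose columns are the basis vectors (1, 0) and (1/2, sqrt(3)/2); other equivalent bases exist.

```python
import math

# Assumed generator matrix for the hexagonal lattice; columns are the basis vectors.
G_HEX = [[1.0, 0.5],
         [0.0, math.sqrt(3) / 2]]


def lattice_point(generator, m):
    """Return c = G m for an integer coordinate vector m."""
    return [sum(row[j] * m[j] for j in range(len(m))) for row in generator]


# Integer vectors that generate the six lattice points closest to the origin.
NEIGHBOR_COORDS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]
```

Each of the six coordinate vectors in `NEIGHBOR_COORDS` maps to a point at unit distance from the origin, which is the familiar hexagonal arrangement.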
When a lattice is chosen to design a quantizer of a certain rate, only a subset of lattice points is retained in order to form a codebook with a certain number of bits. A well-known technique is the so-called shaping of the lattice. This technique consists of truncating the lattice according to a shape boundary. The shape boundary is centered on some point (origin) and may take any shape, e.g. rectangular, spherical, pyramidal, or Voronoi.
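Shaping by truncation can be sketched for the integer lattice as below. The example enumerates the lattice points inside a spherical or a pyramidal boundary centered on the origin; the function names and the brute-force enumeration are assumptions made for illustration and are only practical in low dimensions.

```python
import itertools
import math


def spherical_codebook(dim, radius):
    """Integer-lattice points whose Euclidean norm does not exceed `radius`."""
    r = math.floor(radius)
    return [p for p in itertools.product(range(-r, r + 1), repeat=dim)
            if sum(c * c for c in p) <= radius * radius]


def pyramidal_codebook(dim, radius):
    """Integer-lattice points whose sum of absolute values does not exceed `radius`."""
    r = math.floor(radius)
    return [p for p in itertools.product(range(-r, r + 1), repeat=dim)
            if sum(abs(c) for c in p) <= radius]


def index_bits(codebook):
    """Bits needed for a fixed-length index into the codebook."""
    return math.ceil(math.log2(len(codebook)))
```

In two dimensions with radius 3, the spherical boundary retains 29 points and the pyramidal boundary retains 25, so both need 5 bits per vector with a fixed-length index.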
Using lattices for quantization allows for very efficient nearest neighbor search algorithms. Such search algorithms may be found in [1] for the most useful lattices. In addition, when using lattices for quantization, there is virtually no need to store the codebook, since lattice points can be obtained directly from the generator matrix.
When a lattice point is found, a further task consists of indexing the lattice point. Several indexing algorithms have been devised. An interesting class of indexing algorithms employs the concept of leaders, which is described for instance in [2, 3]. This type of indexing is best suited to spherical shaping.
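To illustrate why no codebook search is needed, the sketch below shows rounding-based nearest neighbor searches for two lattices. The choice of lattices is an assumption of this example: the integer lattice, and the checkerboard lattice of integer vectors with an even coordinate sum, which is not discussed above. The code follows the widely known rounding approach and is not taken from [1].

```python
import math


def nearest_zn(x):
    """Nearest point of the integer lattice: round each coordinate independently."""
    return [math.floor(v + 0.5) for v in x]


def nearest_dn(x):
    """Nearest point of the checkerboard lattice (integer vectors with even coordinate sum).

    Round every coordinate; if the sum is odd, re-round the coordinate with the
    largest rounding error in the opposite direction.
    """
    f = nearest_zn(x)
    if sum(f) % 2 == 0:
        return f
    k = max(range(len(x)), key=lambda i: abs(x[i] - f[i]))
    f[k] += 1 if x[k] > f[k] else -1
    return f
```

Both searches take a number of operations proportional to the dimension, regardless of how many lattice points the codebook retains, and neither stores any codevectors.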
Another type of shaping is Voronoi shaping, which is described in [4] and relies on the concept of Voronoi regions.
Indexing and recovery of code vectors in a Voronoi codebook can be done very efficiently using integer modulo operations, as described in [4].
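The modulo-based indexing can be sketched for the simplest case, the integer lattice, where the generator matrix is the identity. This is an illustrative simplification and not the general procedure of [4]: the scale factor `r`, the offset `a`, and the function names are assumptions of the example. The codebook consists of the integer points whose offset-shifted position lies in the Voronoi region of the origin in the lattice scaled by `r`.

```python
import math


def voronoi_index(point, r):
    """Index of a codebook point: each integer coordinate reduced modulo r."""
    return [c % r for c in point]


def voronoi_decode(index, r, offset):
    """Recover the codebook point from its index.

    The index is itself a lattice point in the same coset modulo r. Subtracting
    the nearest point of the scaled lattice to (index - offset) moves it back
    into the shaping region.
    """
    return [k - r * math.floor((k - a) / r + 0.5) for k, a in zip(index, offset)]
```

With `r = 4` and offset 0.1 in every coordinate, each coordinate of a codebook point lies in {-1, 0, 1, 2}, each index coordinate lies in {0, 1, 2, 3}, and a 2-dimensional vector costs 2 × log2(4) = 4 bits. The small offset ensures that no lattice point falls exactly on the boundary of the shaping region.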
The technique described in [5] uses Voronoi coding in order to extend lattice quantization by successive refinements. This technique is quite similar to Multistage VQ with the conventional codebooks replaced by lattice codebooks. The essence of this technique is based on generating a series of decreasing scale Voronoi lattice VQs, each covering the Voronoi region of the base lattice at the previous higher scale. This technique, however, suffers from the problem of outliers, especially if an outlier occurs in the first stages. In fact, the successive stages are designed to reduce granular noise and therefore cannot efficiently deal with outliers. Another problem of this technique comes from the quantizer efficiency, since codebook entries of a subsequent stage do not efficiently cover the distribution of previous stages.
A technique described in [6] uses a multi-rate lattice quantization method for encoding lattice points. The technique relies on the concept of codebook extension. Whenever a quantized vector does not fall into a base codebook, the base codebook is itself extended in order to be able to index the quantized vector. This technique is by nature a variable rate technique.
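The outlier problem can be illustrated with a one-dimensional analogue. The sketch below is not the method of [5]; it is a simplified stand-in in which every stage is a saturating uniform quantizer with `levels` cells, and each stage's range covers exactly one cell of the previous stage. All names and parameter values are assumptions of the example.

```python
import math


def stage_quantize(value, step, levels):
    """Saturating uniform quantizer: return (cell index, cell-center reconstruction)."""
    j = math.floor(value / step) + levels // 2
    j = min(max(j, 0), levels - 1)            # saturate values outside the stage's range
    return j, (j - (levels - 1) / 2) * step


def refine(x, first_step=2.0, levels=4, stages=3):
    """Quantize x with successively finer stages; return (indices, reconstruction)."""
    indices, xhat, residual, step = [], 0.0, x, first_step
    for _ in range(stages):
        j, q = stage_quantize(residual, step, levels)
        indices.append(j)
        xhat += q
        residual -= q
        step /= levels                        # the next stage spans one cell of this stage
    return indices, xhat
```

With the default parameters the first stage covers the range from -4 to 4. An input of 0.3 is reconstructed as 0.3125, whereas the outlier 7.0 saturates the first stage and ends at 3.9375: the finer stages can correct errors only up to the size of one first-stage cell, so most of the overload error remains.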
Reference [7] describes a symmetric multi-description lattice vector quantizer. A labeling function is used in order to split the quantized vector into two redundant descriptions that are stored in two different streams. A similar technique is developed in [8] for asymmetric lattice vector quantization. These techniques have several drawbacks, such as:
- Since the objective of multiple descriptions is to be able to decode each description separately, a certain amount of redundancy is carried in each description. This in turn would make the use of multiple descriptions highly inefficient in successively refinable quantizers.
- The design of the optimal labeling function is a tedious task, which requires linear programming techniques.
- The labeling function needs to store an index matching lookup table; thus, if several matching functions are needed, memory requirements increase.