It is known that unconstrained vector quantization is the optimal quantization method for grouped samples, i.e. vectors, of a certain length. However, implementing unconstrained vector quantization places high demands on computational complexity and memory capacity. A desire to enable vector quantization also in situations with memory and search complexity constraints has led to the development of so-called structured vector quantizers. Different structures give different trade-offs between search complexity and memory requirements. One such method is the so-called gain-shape vector quantization, where the target vector t is represented using a shape vector x and a gain value G:
$$x = \frac{t}{G} \qquad \text{(Eq 0)}$$
The concept of gain-shape vector quantization is to quantize the pair {x, G} instead of directly quantizing the target vector t. The shape (x) and gain (G) components are encoded using a shape quantizer, which is tuned for the normalized shape input, and a gain quantizer, which handles the dynamics of the signal. This gain-shape structure is frequently used in audio coding, since the division into dynamics and shape, the latter also denoted fine structure, fits well with the perceptual auditory model. The gain-shape concept can also be applied to Discrete Cosine Transform coefficients or other coefficients used in video coding.
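The decomposition in Eq 0 can be illustrated with a minimal sketch. It assumes that the gain G is the Euclidean (L2) norm of the target vector, which the text above does not specify; the function names `gain_shape_split` and `gain_shape_merge` are hypothetical and used for illustration only.

```python
import math


def gain_shape_split(t):
    """Split a target vector t into a shape vector x and a gain G, with x = t / G.

    Assumption: G is the Euclidean (L2) norm of t. Other gain definitions are possible.
    """
    G = math.sqrt(sum(v * v for v in t))
    if G == 0.0:
        # An all-zero target has no defined shape; return a zero shape and zero gain.
        return [0.0] * len(t), 0.0
    x = [v / G for v in t]
    return x, G


def gain_shape_merge(x, G):
    """Reconstruct the target vector from shape and gain: t = G * x."""
    return [G * v for v in x]


x, G = gain_shape_split([3.0, 4.0])
t_rec = gain_shape_merge(x, G)
```

In an actual codec, x and G would each be passed through its own quantizer before reconstruction; the sketch omits quantization and shows only the split and merge.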
Many speech and audio codecs, such as ITU-T G.718 and IETF Opus (RFC 6716), use a gain-shape vector quantizer (VQ) based on a structured Pyramid Vector Quantizer (PVQ) in order to encode the spectral coefficients of the target speech/audio signal.
The PVQ-coding concept was introduced by R. Fischer during 1983-1986 and has since evolved into practical use with the advent of more efficient Digital Signal Processors, DSPs. The PVQ encoding concept involves searching for, locating and then encoding a point on an N-dimensional hyper-pyramid with an integer L1-norm of K unit pulses. The so-called L1-norm is the sum of the absolute values of the vector elements, i.e. the absolute sum of the signed integer PVQ vector is restricted to be exactly K, where a unit pulse is represented by an integer value of “1”. A signed integer can represent negative values, in contrast to an unsigned integer, which can represent only non-negative values.
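The L1-norm constraint described above can be sketched as follows. The helper names `is_pvq_point` and `pvq_codebook_size` are hypothetical. The size computation uses the commonly cited recursion for the number of integer points on the pyramid surface, N(n, k) = N(n-1, k) + N(n-1, k-1) + N(n, k-1), and is included only to show how quickly the codebook grows with N and K.

```python
from functools import lru_cache


def is_pvq_point(y, K):
    """Return True if the signed integer vector y lies on the pyramid, i.e. sum(|y_i|) == K."""
    return all(isinstance(v, int) for v in y) and sum(abs(v) for v in y) == K


@lru_cache(maxsize=None)
def pvq_codebook_size(n, k):
    """Number of signed integer vectors of dimension n whose L1-norm is exactly k."""
    if k == 0:
        return 1  # only the all-zero vector
    if n == 0:
        return 0  # no dimensions left to place the remaining pulses
    return (pvq_codebook_size(n - 1, k)
            + pvq_codebook_size(n - 1, k - 1)
            + pvq_codebook_size(n, k - 1))


print(is_pvq_point([1, -1, 0], 2))   # two unit pulses, one of them negative
print(pvq_codebook_size(3, 2))       # points with N = 3, K = 2
```

For N = 3 and K = 2 the valid points are the six vectors with a single ±2 entry and the twelve vectors with two ±1 entries, 18 in total.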
One of the interesting benefits of the structured PVQ-coding approach, in contrast to many other structured VQs, is that there is no inherent limit on the dimension N, so the search methods developed for PVQ-coding should be applicable to any dimension N and to any K value.
One issue with structured PVQ-shape quantization is finding the best possible quantized vector with a reasonable amount of complexity. For higher-rate speech and audio coding, where the number of allowed unit pulses K may become very high and the dimension N may also be high, there are even stronger demands for an efficient PVQ-search that maintains the quality, e.g. in terms of Signal to Noise Ratio, SNR, of the reconstructed speech/audio.
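To make the search problem concrete, the following is a minimal sketch of a generic greedy pulse-by-pulse search. It is an illustrative assumption and not a method described in this text. It places one unit pulse at a time at the position that maximizes the squared correlation divided by the energy of the candidate vector. The function name `greedy_pvq_search` is hypothetical.

```python
def greedy_pvq_search(x, K):
    """Greedily build a signed integer vector y with sum(|y_i|) == K that approximates the shape of x.

    Each of the K iterations scans all N positions, so the cost is on the order of N * K operations.
    """
    n = len(x)
    ax = [abs(v) for v in x]   # search in the all-positive orthant, restore signs at the end
    y = [0] * n
    corr = 0.0                 # running sum of ax[i] * y[i]
    energy = 0.0               # running sum of y[i] ** 2
    for _ in range(K):
        best_i, best_num, best_den = 0, -1.0, 1.0
        for i in range(n):
            c = corr + ax[i]               # correlation if a pulse is added at position i
            e = energy + 2 * y[i] + 1      # energy if a pulse is added at position i
            # Compare c*c/e with best_num/best_den by cross-multiplication (no division needed).
            if c * c * best_den > best_num * e:
                best_i, best_num, best_den = i, c * c, e
        energy += 2 * y[best_i] + 1
        corr += ax[best_i]
        y[best_i] += 1
    # Restore the signs of the input vector.
    return [-p if v < 0 else p for p, v in zip(y, x)]


print(greedy_pvq_search([0.5, -0.5], 2))
```

The nested loops show why complexity becomes a concern when both N and K are large: the number of candidate evaluations grows with their product, and a greedy search of this kind is not guaranteed to find the globally best point.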
Further, the use of the PVQ concept is not restricted to the speech and audio coding area. Currently, the Internet Engineering Task Force, IETF, is pursuing a video codec development in which Discrete Cosine Transform, DCT, coefficients are encoded using a PVQ-based algorithm. In video coding it is even more important than in audio coding to have an efficient search procedure, as the number of coefficients may become very large with large displays.