When audio or video signals are to be transmitted or stored, the signals are typically encoded. In an encoder, vectors of audio/video signal samples are represented by a number of coefficients or parameters, which can then be transmitted or stored efficiently. When the coefficients or parameters are received or retrieved, they are decoded into audio/video signal samples to retrieve the original audio/video signal. Many different kinds of encoding techniques have been used for audio/video signals.
One approach is based on vector quantization (VQ). It is known that unconstrained VQ is the optimal quantization method for grouped samples (vectors) of a given length. However, memory and search-complexity constraints have led to the development of structured vector quantizers. Different structures give different trade-offs in terms of search complexity and memory requirements. One such method is gain-shape vector quantization, where the target vector x is represented using a shape vector vec and a gain G:
\[ \mathrm{vec} = \frac{x}{G} \]
The idea is to quantize the pair {vec, G} instead of quantizing the target vector directly. The gain and shape components are then encoded using a shape quantizer tuned for the normalized shape input and a gain quantizer that handles the dynamics of the signal. This structure is widely used in, e.g., audio coding, since the division into dynamics and shape (or fine structure) fits well with perceptual auditory models.
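The gain-shape decomposition can be sketched as follows. This is a minimal illustration that assumes the gain G is the Euclidean (L2) norm of x, so that vec has unit energy; the function names are hypothetical, and the actual quantization of G and vec is not shown.

```python
import math


def gain_shape_split(x):
    """Split target vector x into a gain G (L2 norm) and a shape vec = x / G."""
    gain = math.sqrt(sum(v * v for v in x))
    if gain == 0.0:
        # Assumption: an all-zero target has no defined shape, so return zeros.
        return 0.0, [0.0] * len(x)
    return gain, [v / gain for v in x]


def gain_shape_join(gain, shape):
    """Reconstruct the target vector as x = G * vec."""
    return [gain * v for v in shape]


G, vec = gain_shape_split([3.0, 4.0])
x_hat = gain_shape_join(G, vec)
```

In a real coder, G would go to a scalar gain quantizer and vec to a shape quantizer such as a PVQ.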
A valid entry in the selected structured vector quantizer is first searched for using knowledge of the structure (e.g. L1 (absolute amplitude) normalization or L2 (energy) normalization). After a valid vector has been found, an index (or codeword) that represents that specific vector must be created efficiently and then transmitted to the receiver. The index creation (also known as indexing or enumeration) uses the properties of the selected structure to create a unique index (codeword) for the found vector in the structured VQ.
On the receiving side, the decoder needs to efficiently decompose the index into the same vector that was determined on the encoder side. This decomposition can be made very low in computational complexity by using a large table lookup, but at the cost of huge stored Read-Only Memory (ROM) tables. Alternatively, the decomposition (also known as de-indexing) can be designed to use knowledge of the selected structure, and potentially also the numerical operations available on the target hardware, to decompose the index algorithmically into the unique vector in an efficient way.
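Indexing and algorithmic de-indexing can be illustrated with a minimal counting-based enumeration sketch for PVQ vectors (integer vectors of dimension l whose absolute values sum to k). It assumes a hypothetical ordering of the values of each component (0, +1, -1, +2, -2, ...) and uses the recursion N(l,k) = N(l-1,k) + N(l-1,k-1) + N(l,k-1) for the codebook size. The names `pvq_count`, `pvq_index` and `pvq_deindex` are illustrative; this is a generic enumeration, not the indexing scheme of any particular codec. Python's arbitrary-precision integers also hide the fixed word-length limits that a hardware implementation would face.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def pvq_count(l, k):
    """N(l, k): number of integer vectors of dimension l with absolute sum k."""
    if k == 0:
        return 1
    if l == 0:
        return 0
    return pvq_count(l - 1, k) + pvq_count(l - 1, k - 1) + pvq_count(l, k - 1)


def _candidates(k):
    """Hypothetical value order for one component: 0, +1, -1, ..., +k, -k."""
    yield 0
    for m in range(1, k + 1):
        yield m
        yield -m


def pvq_index(y):
    """Encoder side: map a PVQ vector to a unique index in [0, N(l, k))."""
    k = sum(abs(v) for v in y)
    index = 0
    for pos, v in enumerate(y):
        rest = len(y) - pos - 1
        for c in _candidates(k):
            if c == v:
                break
            # Skip every vector whose current component precedes v in the order.
            index += pvq_count(rest, k - abs(c))
        k -= abs(v)
    return index


def pvq_deindex(index, l, k):
    """Decoder side: recover the PVQ vector from its index without tables."""
    y = []
    for pos in range(l):
        rest = l - pos - 1
        for c in _candidates(k):
            size = pvq_count(rest, k - abs(c))
            if index < size:
                y.append(c)
                k -= abs(c)
                break
            index -= size
    return y
```

Only the counts N(l, k) are needed on both sides, which is the kind of structure-aware decomposition described above; they can be computed on the fly or kept in a small table.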
A well-designed structured VQ has a good balance between encoder search complexity, encoder indexing complexity and decoder de-indexing complexity, in terms of Million Operations Per Second (MOPS), in terms of required Program ROM and dynamic Random Access Memory (RAM), and in terms of Table ROM.
Many audio codecs, such as CELT, IETF/Opus-Audio and ITU-T G.719, use an envelope and shape VQ and an envelope mixed gain-shape VQ to encode the spectral coefficients of the target audio signal in the Modified Discrete Cosine Transform (MDCT) domain. CELT/IETF OPUS-Audio use a PVQ, while G.719 uses a slightly extended RE8 Algebraic Vector Quantizer for R=1 bit/dimension coding and a very low complexity D8 lattice quantizer for VQ rates higher than 1 bit/dimension. PVQ stands for Pyramid Vector Quantizer; it is a VQ that uses the L1-norm (Σabs(vector)) to enable a fast search. It has also been found that PVQ may provide quite efficient indexing. The PVQ has been around for some time; it was originally developed by Fischer in 1983-86.
PVQ-quantizers have also been used for encoding time-domain and Linear Prediction (LP) residual-domain samples in speech coders, and for encoding frequency-domain Discrete Cosine Transform (DCT) coefficients. An advantage of the PVQ compared to other structured VQs is that it can naturally handle any vector dimension, while other structured VQs are often limited to dimensions that are multiples of, e.g., 4 or 8.
The IETF/OPUS codec in Audio mode employs a recursive PVQ-indexing and de-indexing scheme with a maximum PVQ-index (short codeword) size of 32 bits. If the target vector to be quantized requires more than 32 bits, the original target vector is recursively split in halves into lower dimensions until all sub-vectors fit into the 32-bit short codeword indexing domain. Each recursive binary dimension split adds the cost of a codeword that encodes the energy relation (the relative energies, which can be represented by a quantized angle) between the two split target sub-vectors. In OPUS-Audio, the structured PVQ-search is then performed on the resulting smaller-dimension target sub-vectors.
The original CELT codec (developed by Valin, Terriberry and Maxwell in 2009) employs a similar PVQ-indexing/de-indexing scheme (with a 32-bit codeword limit), but the binary dimension split in CELT is made in the indexing domain, after the search and after the initial PVQ-structured vector has been established. The integer PVQ-vector to be indexed is then recursively reduced, in the integer domain, to PVQ-vector sub-units of at most 32 bits. This is again achieved by adding a codeword for each split, this time for the integer relation between the "left" integer sub-vector and the "right" integer sub-vector, so that the decoder can know the L1-norm of each of the sub PVQ-vectors. The CELT post-search integer indexing split approach leads to a variable rate (variable total index size), which can be a disadvantage if the media transport requires fixed-rate encoding.
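A post-search integer split of this kind can be sketched as follows. The sketch assumes a simple midpoint split in which each split codeword carries the left sub-vector's pulse count k_left (one of k+1 possible values), which is enough for the decoder to know both sub-vector L1-norms; it is not CELT's actual code, and the function names are hypothetical. The limit `max_bits` is a parameter so that small vectors can be used to show the effect.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def pvq_count(l, k):
    """N(l, k): number of integer vectors of dimension l with absolute sum k."""
    if k == 0:
        return 1
    if l == 0:
        return 0
    return pvq_count(l - 1, k) + pvq_count(l - 1, k - 1) + pvq_count(l, k - 1)


def split_integer_pvq(y, max_bits=32):
    """Halve an integer PVQ vector until every part's index fits in max_bits.

    Returns (leaves, splits): leaves are (sub_vector, k_sub) pairs, and splits
    are (k_left, k + 1) pairs, i.e. the transmitted left pulse count and the
    size of its alphabet (k_left may be any value 0..k).
    """
    k = sum(abs(v) for v in y)
    if len(y) == 1 or pvq_count(len(y), k) <= 2 ** max_bits:
        return [(list(y), k)], []
    half = len(y) // 2
    left, right = y[:half], y[half:]
    k_left = sum(abs(v) for v in left)
    leaves_l, splits_l = split_integer_pvq(left, max_bits)
    leaves_r, splits_r = split_integer_pvq(right, max_bits)
    return leaves_l + leaves_r, [(k_left, k + 1)] + splits_l + splits_r


def total_bits(leaves, splits):
    """Nominal total index size: sub-vector codewords plus split codewords."""
    leaf_bits = sum(math.log2(pvq_count(len(sub), ks)) for sub, ks in leaves)
    split_bits = sum(math.log2(alphabet) for _, alphabet in splits)
    return leaf_bits + split_bits


# Same dimension (16) and pulse count (8), different pulse distributions.
a = split_integer_pvq([8] + [0] * 15, max_bits=16)
b = split_integer_pvq([1] * 8 + [0] * 8, max_bits=16)
# total_bits(*a) and total_bits(*b) differ, so the total rate is variable.
```

Because the sub-vector pulse counts depend on where the search placed the pulses, the sum of the sub-codeword sizes changes from vector to vector even for fixed l and k.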
In 1997 and 1998, Hung, Tsern and Meng investigated the error robustness of a few PVQ-indexing variations. They summarized the PVQ-enumeration (indexing) problem as follows (l is the vector dimension and k is the number of unit pulses):
“Enumeration assigns a unique index to all possible vectors in the PVQ-codebook, P(l, k), imparting a sorting order to the PVQ-codebook vectors.”
“Systematic sorting for enumeration is done through counting formulas for the number of vectors in the pyramid; this is a common concept to all pyramid enumeration techniques.”
“The number of vectors in the pyramid codebook P(l, k) is denoted N(l, k). This is related to a binary codeword index length which is ceil(log2(N(l,k))) bits. N(l,k) can be viewed as the number of ways l integer values in a vector can have an absolute sum of k.”
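The counting formula behind N(l, k) can be written in closed form by choosing the number i of non-zero positions, their signs, and a split of k into i positive parts: N(l, k) = Σ_{i=1}^{min(l,k)} 2^i · C(l, i) · C(k-1, i-1). The sketch below (hypothetical function names) evaluates this sum, computes the codeword length ceil(log2(N(l,k))) exactly with integer arithmetic, and cross-checks the count against an exhaustive search for small l and k.

```python
from itertools import product
from math import comb


def pvq_count(l, k):
    """N(l, k): choose i non-zero positions, their signs, and k split into i positive parts."""
    if k == 0:
        return 1
    return sum(2 ** i * comb(l, i) * comb(k - 1, i - 1) for i in range(1, min(l, k) + 1))


def index_bits(l, k):
    """Binary codeword length ceil(log2(N(l, k))), computed exactly with integers."""
    return (pvq_count(l, k) - 1).bit_length()


def brute_force_count(l, k):
    """Count integer vectors of dimension l with absolute sum k by exhaustive search."""
    return sum(1 for y in product(range(-k, k + 1), repeat=l) if sum(map(abs, y)) == k)
```

For example, `pvq_count(64, 5)` gives 286680704, so `index_bits(64, 5)` is 29 bits.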
Hung et al. studied the bit-error robustness of the PVQ-codewords for a couple of variations of PVQ-indexing/enumeration schemes, but they did not focus on making the PVQ-enumeration efficient and of sufficiently low complexity for an actual hardware implementation. The CELT and IETF/OPUS-Audio implementations of PVQ-indexing are strongly focused on providing an enumeration (both encoding and decoding) of the lowest possible complexity on 32-bit unsigned-integer-based hardware, but they disregard the sensitivity of the PVQ to bit errors. Also, in 1999 Ashley proposed a way to reduce the complexity of implementing the product code PVQ-enumeration by using a low-complexity deterministic approximation of the binomial combinatorial function used for the size calculation and the offset calculation. Ashley et al. call this technique Factorial Pulse Coding (FPC), and it has been employed in the ITU-T G.718 speech codec standard.
In CELT and IETF/OPUS-Audio, the PVQ-codeword length is not limited to the granularity of a single bit. The two codecs use a finer-granularity scheme with a resolution of one eighth (⅛) of a bit. This is achieved by using an arithmetic encoder/decoder as an intermediate step in the interface to the transport bit stream (CELT/OPUS-Audio use a range encoder/decoder as the arithmetic encoder/decoder), so that the number of bits used by the PVQ-codeword can be fractional. With a bit resolution BITRES=8 (eighths), the fractional PVQ codeword length becomes ceil(log2(N(l, k))*BITRES)/BITRES. For instance, if l=64, k=5 and BITRES=8, then NPVQ=N(l,k)=286680704, log2(NPVQ)=28.0948696, and ceil(log2(NPVQ)*BITRES)/BITRES=28.125 bits, compared with 29 bits for a whole-bit codeword. By using fractional bits there is much less truncation loss for many of the N(l, k) PVQ codeword sizes, and this increases the codec's efficiency, especially when the codec uses many calls/instances of a PVQ-quantizer.
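The fractional-length arithmetic can be checked with a few lines. This sketch only computes the nominal codeword length rounded up to a 1/BITRES-bit grid; it does not model the range coder that realizes fractional bits in CELT/Opus, and the function names are hypothetical.

```python
import math


def fractional_codeword_bits(n_pvq, bitres=8):
    """Codeword length ceil(log2(n_pvq) * bitres) / bitres, on a 1/bitres-bit grid."""
    return math.ceil(math.log2(n_pvq) * bitres) / bitres


def integer_codeword_bits(n_pvq):
    """Whole-bit codeword length ceil(log2(n_pvq)) for n_pvq >= 1."""
    return (n_pvq - 1).bit_length()


n_pvq = 286680704  # N(64, 5)
saving = integer_codeword_bits(n_pvq) - fractional_codeword_bits(n_pvq)
```

For N(64, 5) this gives 28.125 bits instead of 29, a saving of 0.875 bits for a single PVQ codeword; the saving accumulates over every PVQ instance in a frame.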
One general issue with structured vector quantization is to find a suitable overall compromise including the methods for efficient search, efficient codeword indexing and efficient codeword de-indexing.
Long index codewords (e.g. a 400-bit integer codeword) give a larger complexity overhead in the indexing and de-indexing calculations (and special software routines are needed for multiplying and dividing these large integers in the long codeword composition and decomposition).
Short index codewords can use efficient hardware operators (e.g. Single Instruction Multiple Data (SIMD) instructions in a 32-bit Digital Signal Processor (DSP)), but at the cost of requiring pre-splitting of the target VQ-vectors (as in IETF/OPUS-Audio) or post-search splitting of the integer PVQ-search result vector (as in the original CELT). These dimension-splitting methods add a transmission cost for the splitting-information codeword (splitting overhead), and the shorter the allowed index codewords are, the more splits are required and the higher the total splitting overhead becomes. For example, 16-bit short PVQ-codewords will result in many more splits than 32-bit short codewords, and thus in a higher splitting overhead.
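The effect of the codeword limit on the number of splits can be estimated with a worst-case count. The sketch below assumes that each halving keeps the larger half and that a sub-vector may still have to hold all k pulses; this is a hypothetical bound for illustration, not how either codec allocates pulses between halves.

```python
from math import comb


def pvq_count(l, k):
    """N(l, k) via the closed-form sum over the number i of non-zero positions."""
    if k == 0:
        return 1
    return sum(2 ** i * comb(l, i) * comb(k - 1, i - 1) for i in range(1, min(l, k) + 1))


def worst_case_halvings(l, k, max_bits):
    """Dimension halvings needed until a sub-vector holding all k pulses fits in max_bits."""
    depth = 0
    while l > 1 and pvq_count(l, k) > 2 ** max_bits:
        l = (l + 1) // 2  # follow the larger half if l is odd
        depth += 1
    return depth
```

For l=64 and k=5, N(64, 5) already fits in 32 bits, so no split is needed, whereas a 16-bit limit needs three halvings (64, 32, 16, 8). A full binary split tree of depth d carries 2^d - 1 split codewords, so in this worst case up to 7 split codewords are added.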
The PVQ (Pyramid Vector Quantizer) readily allows a very efficient search through L1-normalization. Typically, a target vector normalized to an absolute sum of k is created, followed by truncation (or rounding) of its values, and then a limited number of corrective iterations are run to reach the target L1-norm k of the PVQ-vector (PVQ-vec).
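The search steps above can be sketched as follows: scale the magnitudes to an absolute sum of k, truncate toward zero, then add unit pulses one at a time until the L1-norm equals k. As an assumption, the corrective step uses a simple largest-truncation-residual heuristic rather than a distortion-based criterion, and the zero-vector fallback is hypothetical; the function name `pvq_search` is illustrative.

```python
def pvq_search(x, k):
    """Approximate PVQ search: L1-normalize to k, truncate, then correct.

    Returns an integer vector y with sum(abs(y)) == k and the signs of x.
    """
    l1 = sum(abs(v) for v in x)
    if l1 == 0.0:
        # Hypothetical fallback for an all-zero target: put every pulse in position 0.
        return [k] + [0] * (len(x) - 1)
    # Scale magnitudes so that they sum (approximately) to k.
    scaled = [abs(v) * k / l1 for v in x]
    # Truncation toward zero guarantees sum(mag) <= k.
    mag = [int(s) for s in scaled]
    # Corrective iterations: add one unit pulse where the truncation
    # residual is largest, until the L1-norm reaches k.
    while sum(mag) < k:
        i = max(range(len(x)), key=lambda j: scaled[j] - mag[j])
        mag[i] += 1
    # Restore the signs of the target vector.
    return [m if v >= 0 else -m for m, v in zip(mag, x)]
```

Since truncation loses less than one pulse per position, the correction loop runs fewer than len(x) times in exact arithmetic, which keeps the search cost low.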
The problem with the previously mentioned CELT/OPUS prior-art short codeword indexing schemes is that they are limited to a 32-bit integer range (unsigned 32-bit integers), and further that they cannot be efficiently implemented on a DSP architecture that only supports fast instructions for signed 32-bit integers.