Audio coding systems are used to encode an audio signal into an encoded signal that is suitable for transmission or storage, and then subsequently receive or retrieve the encoded signal and decode it to obtain a version of the original audio signal for playback. Perceptual audio coding systems attempt to encode an audio signal into an encoded signal that has lower information capacity requirements than the original audio signal, and then subsequently decode the encoded signal to provide an output that is perceptually indistinguishable from the original audio signal. One example of a perceptual audio coding technique is described in Bosi et al., “ISO/IEC MPEG-2 Advanced Audio Coding,” J. AES, vol. 45, no. 10, October 1997, pp. 789–814, which is referred to as Advanced Audio Coding (AAC).
Perceptual coding techniques like AAC apply an analysis filterbank to an audio signal to obtain digital signal components that typically have a high level of accuracy on the order of 16–24 bits and are arranged in frequency subbands. The subband widths typically vary and are usually commensurate with widths of the so-called critical bands of the human auditory system. The information capacity requirements of the signal are reduced by quantizing the subband-signal components to a much lower level of accuracy. In addition, the quantized components may be encoded by an entropy coding process such as Huffman coding. Quantization injects noise into the quantized signals, but perceptual audio coding systems use psychoacoustic models in an attempt to control the amplitude of quantization noise so that it is masked or rendered inaudible by spectral components in the signal. An inexact replica of the subband signal components is obtained from the encoded signal by complementary entropy decoding and dequantization.
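The quantize/dequantize round trip described above can be sketched as follows. This is a minimal illustration of uniform quantization, not the AAC quantizer itself; the component values and step size are arbitrary choices for the example.

```python
import numpy as np

def uniform_quantize(x, step):
    """Map high-accuracy component values to integer indices."""
    return np.round(x / step).astype(int)

def uniform_dequantize(q, step):
    """Reconstruct an inexact replica from the quantized indices."""
    return q * step

# Hypothetical subband components from an analysis filterbank
components = np.array([0.812, -0.304, 0.056, -0.017])

step = 0.1  # coarser step -> fewer bits, but more quantization noise
q = uniform_quantize(components, step)
replica = uniform_dequantize(q, step)
noise = components - replica  # magnitude bounded by step / 2
```

The injected noise is what the psychoacoustic model must keep below the masking threshold of the spectral components in each subband.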
The goal in many conventional perceptual coding systems is to quantize the subband signal components and apply an entropy coding process to the quantized signal components in a manner that is optimum or as near optimum as is practical. Both quantization and entropy coding are usually designed to operate with as much mathematical efficiency as possible.
The design of an optimum or nearly optimum quantizer depends on statistical characteristics of the signal component values to be quantized. In a perceptual coding system that uses a transform to implement the analysis filterbank, the signal component values are derived from frequency-domain transform coefficients that are grouped into frequency subbands and then normalized or scaled relative to the largest magnitude component in each subband. One example of scaling is a process known as block companding. The number of coefficients that are grouped into each subband typically increases with subband frequency so that the subband widths approximate the critical bandwidths of the human auditory system. Psychoacoustic models and bit allocation processes determine the amount of scaling for each subband signal. Grouping and scaling alter the statistical characteristics of the signal component values to be quantized; therefore, quantization efficiency is generally optimized for the characteristics of the grouped and scaled signal components.
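Block companding, as one example of the scaling step, can be sketched as follows. The subband values below are hypothetical, and a real system would transmit the scale factor (often in quantized logarithmic form) as side information; this sketch only shows the normalization itself.

```python
import numpy as np

def block_compand(subband):
    """Scale a subband relative to its largest-magnitude component,
    so all scaled values lie in [-1, 1]."""
    scale = np.max(np.abs(subband))
    if scale == 0.0:
        return subband, 0.0
    return subband / scale, scale

# Hypothetical transform coefficients grouped into one subband
subband = np.array([0.02, -0.5, 0.125, 0.01])

scaled, scale = block_compand(subband)
# `scale` is sent as side information; the decoder undoes the scaling:
restored = scaled * scale
```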
In typical perceptual coding systems like the AAC system mentioned above, the wider subbands tend to have a few dominant subband-signal components with relatively large magnitudes and many more lesser signal components with significantly smaller magnitudes. A uniform quantizer does not quantize such a distribution of values with high efficiency. Quantizer efficiency can be improved by quantizing the smaller signal components with greater accuracy and the larger signal components with less accuracy. This is often accomplished by using a compressing quantizer such as a μ-law or A-law quantizer. A compressing quantizer may be implemented by a compressor followed by a uniform quantizer, or by a non-uniform quantizer that is equivalent to the two-step process. An expanding dequantizer is used to reverse the effects of the compressing quantizer; it provides an expansion that is essentially the inverse of the compression applied by the compressing quantizer.
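The two-step implementation of a compressing quantizer and its expanding dequantizer can be sketched with standard μ-law companding. The parameter value μ = 255 and the input values are assumptions for illustration, not values prescribed by any particular coding system.

```python
import numpy as np

MU = 255.0  # mu-law parameter; an assumed value for this sketch

def compress(x):
    """mu-law compressor: maps values in [-1, 1] so that small
    magnitudes occupy a larger share of the quantizer's input range."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def expand(y):
    """Expansion that is the inverse of the compressor."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

def compressing_quantize(x, step):
    """Compressing quantizer as a compressor followed by a
    uniform quantizer."""
    return np.round(compress(x) / step).astype(int)

def expanding_dequantize(q, step):
    """Expanding dequantizer: uniform dequantization, then expansion."""
    return expand(q * step)

# One dominant component and smaller components, as in a wide subband
x = np.array([0.9, 0.05, -0.003])
step = 1.0 / 32  # roughly a 6-bit uniform quantizer over [-1, 1]
q = compressing_quantize(x, step)
y = expanding_dequantize(q, step)
```

The effect is that the small component is reconstructed with much finer absolute accuracy than the dominant one, which matches the distribution of values described above.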
A compressing quantizer generally provides beneficial results in perceptual audio coding systems that represent all signal components with a level of quantization accuracy that is substantially equal to or greater than the accuracy specified by a psychoacoustic model as being necessary to mask quantization noise. Compression generally improves quantizing efficiency by redistributing the signal component values more uniformly within the input range of the quantizer.
Very low bit-rate (VLBR) audio coding systems generally cannot represent all signal components with sufficient quantization accuracy to mask the quantization noise. Some VLBR coding systems attempt to play back an output signal having a high level of perceived quality by transmitting or recording a baseband signal having only a portion of the input signal's bandwidth, and regenerating missing portions of the signal bandwidth during playback by copying spectral components from the baseband signal. This technique is sometimes referred to as “spectral translation” or “spectral regeneration”. The inventors have observed that compressing quantizers generally do not provide beneficial results when used in VLBR coding systems such as those that use spectral regeneration.
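The copying step of spectral regeneration can be sketched as below. This is a deliberately naive illustration: practical systems also shape the spectral envelope of the regenerated components, which this sketch omits, and the baseband values are hypothetical.

```python
import numpy as np

def regenerate_spectrum(baseband, total_bins):
    """Fill missing high-frequency bins by copying (translating)
    spectral components from the decoded baseband signal."""
    spectrum = np.zeros(total_bins)
    n = len(baseband)
    spectrum[:n] = baseband
    for k in range(n, total_bins):
        spectrum[k] = baseband[k % n]  # copy from the baseband
    return spectrum

# Decoded low-frequency portion of the signal (hypothetical values)
baseband = np.array([1.0, 0.5, 0.25, 0.125])

full = regenerate_spectrum(baseband, 8)
```

Only the baseband is transmitted; the upper half of `full` is synthesized at the decoder, which is what reduces the bit rate.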
The design of optimum or nearly optimum encoders such as those used in typical audio coding systems depends on statistical characteristics of the values to be encoded. In typical systems, groups of quantized signal components are encoded by a Huffman coding process that uses one or more code books to generate variable-length codes representing the quantized signal components. The shortest codes are used to represent those quantized values that are expected to occur most frequently. Each code is expressed by an integer number of bits.
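The properties just described can be seen in a small Huffman code book construction. This is a generic sketch, not any standard's actual code books; the sample values assume a distribution, common in quantized subbands, where zeros dominate.

```python
import heapq
from collections import Counter

def huffman_code_book(symbols):
    """Build a Huffman code book: the most frequent quantized values
    receive the shortest codes; every code is a whole number of bits."""
    freq = Counter(symbols)
    # Heap entries carry a unique tiebreaker so lists are never compared
    heap = [(n, i, [s]) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    codes = {s: "" for s in freq}
    while len(heap) > 1:
        n1, _, group1 = heapq.heappop(heap)
        n2, i, group2 = heapq.heappop(heap)
        for s in group1:           # prepend a bit to every symbol in
            codes[s] = "0" + codes[s]  # each merged subtree
        for s in group2:
            codes[s] = "1" + codes[s]
        heapq.heappush(heap, (n1 + n2, i, group1 + group2))
    return codes

# Quantized subband values: zeros dominate, large values are rare
values = [0, 0, 0, 0, 0, 1, 1, -1, 3]
book = huffman_code_book(values)
```

In the resulting book the frequent value 0 gets a 1-bit code while the rare values get 3-bit codes, and every code length is an integer number of bits, which is the rounding constraint the following paragraphs turn to.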
Huffman coding often provides good results in audio coding systems that can represent all signal components with sufficient quantization accuracy to mask the quantization noise. The inventors have observed, however, that Huffman coding has serious limitations that make it unsuitable for use in many VLBR coding systems. These limitations are explained below.