Audio coding systems encode an audio signal into an encoded signal that is suitable for transmission or storage, and subsequently receive or retrieve the encoded signal and decode it to obtain a version of the original audio signal for playback. Perceptual audio coding systems attempt to encode an audio signal into an encoded signal that has lower information capacity requirements than the original audio signal, and subsequently decode the encoded signal to provide an output that is perceptually indistinguishable from the original audio signal. One example of a perceptual audio coding system is described in the Advanced Television Systems Committee (ATSC) A52 document (1994), which is referred to as Dolby AC-3. Another example is described in Bosi et al., “ISO/IEC MPEG-2 Advanced Audio Coding,” J. AES, vol. 45, no. 10, October 1997, pp. 789-814, which is referred to as Advanced Audio Coding (AAC). These two coding systems, as well as many other perceptual coding systems, apply an analysis filterbank to an audio signal to obtain spectral components that are arranged in groups or frequency bands. The band widths typically vary and are usually commensurate with the widths of the so-called critical bands of the human auditory system.
Perceptual coding systems can be used to reduce the information capacity requirements of an audio signal while preserving a subjective or perceived measure of audio quality so that an encoded representation of the audio signal can be conveyed through a communication channel using less bandwidth or stored on a recording medium using less space. Information capacity requirements are reduced by quantizing the spectral components. Quantization injects noise into the quantized signal, but perceptual audio coding systems generally use psychoacoustic models in an attempt to control the amplitude of quantization noise so that it is masked or rendered inaudible by spectral components in the signal.
The spectral components within a given band are often quantized to the same quantizing resolution, and a psychoacoustic model is used to determine the coarsest quantizing resolution (that is, the largest minimum quantizing level), or equivalently the smallest signal-to-noise ratio (SNR), that is possible without injecting an audible level of quantization noise. This technique works fairly well for narrow bands but does not work as well for wider bands when information capacity requirements constrain the coding system to use a relatively coarse quantizing resolution. The larger-valued spectral components in a wide band are usually quantized to a non-zero value having the desired resolution, but smaller-valued spectral components in the band are quantized to zero if they have a magnitude that is less than the minimum quantizing level. The number of spectral components in a band that are quantized to zero generally increases as the band width increases, as the difference between the largest and smallest spectral component values within the band increases, and as the minimum quantizing level increases.
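The dependence of the number of zeroed components on the minimum quantizing level can be illustrated with a minimal sketch. It assumes a uniform truncating quantizer with one step size shared by the whole band; the function names, the component magnitudes, and the step sizes are hypothetical and are not taken from any particular coding system.

```python
def quantize_band(components, step):
    """Quantize every component in the band with the same step size.

    Assumption: magnitudes are truncated toward zero, so any component
    whose magnitude is below `step` becomes zero.
    """
    return [(1 if c >= 0 else -1) * (abs(c) // step) for c in components]


def count_qtz(components, step):
    """Count the components that the quantizer maps to zero."""
    return sum(1 for q in quantize_band(components, step) if q == 0)


# Hypothetical band with a large spread between its largest and smallest values.
band = [900, 20, 10, 300, 70, 450, 4, 120]

for step in (8, 64, 256):
    print(f"step {step:3d}: {count_qtz(band, step)} of {len(band)} components quantized to zero")
```

With these illustrative values, raising the step size from 8 to 64 to 256 raises the number of zeroed components from one to three to five, which is the trend described above.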
Unfortunately, the existence of many quantized-to-zero (QTZ) spectral components in an encoded signal can degrade the perceived quality of the audio signal even if the resulting quantization noise is kept low enough to be deemed inaudible or psychoacoustically masked by spectral components in the signal. This degradation has at least three causes. The first cause is that the quantization noise may not be inaudible because the level of psychoacoustic masking is less than what is predicted by the psychoacoustic model used to determine the quantizing resolution. The second cause is that the creation of many QTZ spectral components can audibly reduce the energy or power of the decoded audio signal as compared to the energy or power of the original audio signal. The third cause is relevant to coding processes that use distortion-cancellation filterbanks such as the Quadrature Mirror Filter (QMF) or a particular modified Discrete Cosine Transform (DCT) and modified Inverse Discrete Cosine Transform (IDCT) known as Time-Domain Aliasing Cancellation (TDAC) transforms, which are described in Princen et al., “Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation,” ICASSP 1987 Conf. Proc., May 1987, pp. 2161-64.
Coding systems that use distortion-cancellation filterbanks such as the QMF or the TDAC transforms use an analysis filterbank in the encoding process that introduces distortion or spurious components into the encoded signal, but use a synthesis filterbank in the decoding process that can, in theory at least, cancel the distortion. In practice, however, the ability of the synthesis filterbank to cancel the distortion can be impaired significantly if the values of one or more spectral components are changed significantly in the encoding process. For this reason, QTZ spectral components may degrade the perceived quality of a decoded audio signal even if the quantization noise is inaudible because changes in spectral component values may impair the ability of the synthesis filterbank to cancel distortion introduced by the analysis filterbank.
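The sensitivity of a distortion-cancellation filterbank to changed spectral components can be illustrated with a small sketch. It assumes one common formulation of the TDAC transform pair (an MDCT with a sine window, 50 percent overlap, and a 2/N inverse scaling); the block size, the test signal, the zeroing threshold, and all function names are hypothetical. The error it reports after zeroing mixes the direct coefficient error with any uncancelled aliasing and does not separate the two.

```python
import math


def sine_window(n_half):
    """Sine window of length 2*n_half; satisfies w[n]^2 + w[n + n_half]^2 = 1."""
    size = 2 * n_half
    return [math.sin(math.pi * (n + 0.5) / size) for n in range(size)]


def mdct(block, window):
    """Analysis: 2N windowed samples -> N coefficients (aliasing is introduced here)."""
    n_half = len(block) // 2
    n0 = 0.5 + n_half / 2
    return [sum(block[n] * window[n] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
                for n in range(2 * n_half))
            for k in range(n_half)]


def imdct(coeffs, window):
    """Synthesis: N coefficients -> 2N windowed samples ready for overlap-add."""
    n_half = len(coeffs)
    n0 = 0.5 + n_half / 2
    return [window[n] * (2.0 / n_half) *
            sum(coeffs[k] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
                for k in range(n_half))
            for n in range(2 * n_half)]


def code_and_decode(signal, n_half, zero_below=0.0):
    """Analyze overlapping blocks, zero small coefficients, synthesize, overlap-add."""
    window = sine_window(n_half)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        coeffs = mdct(signal[start:start + 2 * n_half], window)
        coeffs = [c if abs(c) >= zero_below else 0.0 for c in coeffs]
        for n, value in enumerate(imdct(coeffs, window)):
            out[start + n] += value
    return out


def max_interior_error(signal, decoded, n_half):
    """Largest error over samples covered by two overlapping blocks."""
    return max(abs(a - b) for a, b in zip(signal[n_half:-n_half], decoded[n_half:-n_half]))


N_HALF = 8
signal = [math.sin(2 * math.pi * 0.11 * n) + 0.05 * math.sin(2 * math.pi * 0.37 * n)
          for n in range(8 * N_HALF)]

exact_error = max_interior_error(signal, code_and_decode(signal, N_HALF), N_HALF)
lossy_error = max_interior_error(signal, code_and_decode(signal, N_HALF, zero_below=0.5), N_HALF)
print("max error, coefficients unchanged:", exact_error)
print("max error, small coefficients zeroed:", lossy_error)
```

When the coefficients pass through unchanged, the overlap-add of adjacent blocks cancels the aliasing and the interior of the signal is recovered to within floating-point rounding. Once small coefficients are forced to zero, the reconstruction is no longer exact.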
Techniques used in known coding systems have provided partial solutions to these problems. Dolby AC-3 and AAC transform coding systems, for example, have some ability to generate an output signal from an encoded signal that retains the signal level of the original audio signal by substituting noise for certain QTZ spectral components in the decoder. In both of these systems, the encoder provides in the encoded signal an indication of power for a frequency band and the decoder uses this indication of power to substitute an appropriate level of noise for the QTZ spectral components in the frequency band. A Dolby AC-3 encoder provides a coarse estimate of the short-term power spectrum that can be used to generate an appropriate level of noise. When all spectral components in a band are set to zero, the decoder fills the band with noise having approximately the same power as that indicated in the coarse estimate of the short-term power spectrum. The AAC coding system uses a technique called Perceptual Noise Substitution (PNS) that explicitly transmits the power for a given band. The decoder uses this information to add noise to match this power. Both systems add noise only in those bands that have no non-zero spectral components.
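The decoder-side behavior described above can be sketched as follows. This is an illustration only, not the Dolby AC-3 or AAC bitstream syntax or decoding procedure: it assumes the indication of power is available as a mean-square value per component, uses uniformly distributed noise, and the function names are hypothetical.

```python
import math
import random


def band_power(components):
    """Mean-square value of the components in a band."""
    return sum(c * c for c in components) / len(components)


def fill_empty_band(dequantized, indicated_power, rng):
    """Substitute noise only when every component in the band is zero.

    Assumption: `indicated_power` is the mean-square level conveyed by the
    encoder for this band. Bands with any non-zero component are returned
    unchanged, mirroring the limitation noted in the text.
    """
    if any(c != 0 for c in dequantized):
        return list(dequantized)
    noise = [rng.uniform(-1.0, 1.0) for _ in dequantized]
    gain = math.sqrt(indicated_power / band_power(noise))
    return [gain * v for v in noise]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

empty_band = [0, 0, 0, 0, 0, 0, 0, 0]
mixed_band = [160, 0, 0, 0, 0, 0, 64, 224]

filled = fill_empty_band(empty_band, 100.0, rng)
untouched = fill_empty_band(mixed_band, 100.0, rng)

print("power of noise-filled band:", band_power(filled))
print("mixed band after decoding:", untouched)
```

The all-zero band is restored to the indicated power, while the mixed band keeps its zeros because the substitution is applied only to bands with no non-zero spectral components.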
Unfortunately, these systems do not help preserve power levels in bands that contain a mixture of QTZ and non-zero spectral components. Table 1 shows a hypothetical band of spectral components for an original audio signal, a 3-bit quantized representation of each spectral component that is assembled into an encoded signal, and the corresponding spectral components obtained by a decoder from the encoded signal. The quantized band in the encoded signal has a combination of QTZ and non-zero spectral components.
TABLE 1

Original Signal    Quantized     Dequantized
Components         Components    Components
10101010           101           10100000
00000100           000           00000000
00000010           000           00000000
00000001           000           00000000
00011111           000           00000000
00010101           000           00000000
00001111           000           00000000
01010101           010           01000000
11110000           111           11100000
The first column of the table shows a set of unsigned binary numbers representing spectral components in the original audio signal that are grouped into a single band. The second column shows a representation of the spectral components quantized to three bits. For this example, the portion of each spectral component below the 3-bit resolution has been removed by truncation. The quantized spectral components are transmitted to the decoder and subsequently dequantized by appending zero bits to restore the original spectral component length. The dequantized spectral components are shown in the third column. Because a majority of the spectral components have been quantized to zero, the band of dequantized spectral components contains less energy than the band of original spectral components and that energy is concentrated in a few non-zero spectral components. This reduction in energy can degrade the perceived quality of the decoded signal as explained above.
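The truncation and dequantization steps described for Table 1 can be reproduced with a short sketch. The 8-bit values are those of the first column of Table 1. The function names are hypothetical, and the use of the sum of squared component values as the measure of energy is an assumption.

```python
WORD_BITS = 8   # length of each original spectral component in Table 1
KEPT_BITS = 3   # quantizing resolution used in the example

ORIGINAL = ["10101010", "00000100", "00000010", "00000001", "00011111",
            "00010101", "00001111", "01010101", "11110000"]


def quantize(value, word_bits=WORD_BITS, kept_bits=KEPT_BITS):
    """Truncate: drop the bits below the kept resolution."""
    return value >> (word_bits - kept_bits)


def dequantize(code, word_bits=WORD_BITS, kept_bits=KEPT_BITS):
    """Append zero bits to restore the original component length."""
    return code << (word_bits - kept_bits)


def energy(values):
    """Sum of squared component values (assumed energy measure)."""
    return sum(v * v for v in values)


original = [int(bits, 2) for bits in ORIGINAL]
codes = [quantize(v) for v in original]
decoded = [dequantize(c) for c in codes]

for v, c, d in zip(original, codes, decoded):
    print(f"{v:08b}  {c:03b}  {d:08b}")

print("components quantized to zero:", codes.count(0), "of", len(codes))
print("fraction of energy retained:", energy(decoded) / energy(original))
```

For these values, six of the nine components are quantized to zero and the dequantized band retains roughly 84 percent of the original sum of squares, all of it carried by the three non-zero components.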