Audio coding systems are used to encode an audio signal into an encoded signal that is suitable for transmission or storage, and then subsequently receive or retrieve the encoded signal and decode it to obtain a version of the original audio signal for playback. Perceptual audio coding systems attempt to encode an audio signal into an encoded signal that has lower information capacity requirements than the original audio signal, and then subsequently decode the encoded signal to provide an output that is perceptually indistinguishable from the original audio signal. One example of a perceptual audio coding system is described in the Advanced Television Systems Committee (ATSC) A/52A document entitled “Revision A to Digital Audio Compression (AC-3) Standard” published Aug. 20, 2001, which is referred to as Dolby Digital. Another example is described in Bosi et al., “ISO/IEC MPEG-2 Advanced Audio Coding,” J. AES, vol. 45, no. 10, October 1997, pp. 789-814, which is referred to as Advanced Audio Coding (AAC). In these two coding systems, as well as in many other perceptual coding systems, a split-band transmitter applies an analysis filterbank to an audio signal to obtain spectral components that are arranged in groups or frequency bands, and encodes the spectral components according to psychoacoustic principles to generate an encoded signal. The bandwidths typically vary and are usually commensurate with the widths of the so-called critical bands of the human auditory system. A complementary split-band receiver receives and decodes the encoded signal to recover spectral components and applies a synthesis filterbank to the decoded spectral components to obtain a replica of the original audio signal.
Perceptual coding systems can be used to reduce the information capacity requirements of an audio signal while preserving a subjective or perceived measure of audio quality so that an encoded representation of the audio signal can be conveyed through a communication channel using less bandwidth or stored on a recording medium using less space. Information capacity requirements are reduced by quantizing the spectral components. Quantization injects noise into the quantized signal, but perceptual audio coding systems generally use psychoacoustic models in an attempt to control the amplitude of quantization noise so that it is masked or rendered inaudible by spectral components in the signal.
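The trade-off between quantizing step size and injected noise can be illustrated with a minimal sketch. It assumes a plain uniform quantizer, and the spectral values and step sizes are hypothetical; it is not the quantizer of any particular coding system, and it omits the psychoacoustic model that would choose the step size per band.

```python
def quantize(components, step):
    """Uniform quantizer: snap each spectral component to the nearest multiple of step."""
    return [step * round(c / step) for c in components]

def noise_power(original, quantized):
    """Mean squared quantization error."""
    return sum((o - q) ** 2 for o, q in zip(original, quantized)) / len(original)

# Hypothetical spectral components, chosen only for illustration.
spectrum = [0.91, -0.42, 0.137, 0.058, -0.021, 0.009]

fine = quantize(spectrum, 0.01)     # small step: more levels, more bits, less noise
coarse = quantize(spectrum, 0.25)   # large step: fewer levels, fewer bits, more noise

fine_noise = noise_power(spectrum, fine)
coarse_noise = noise_power(spectrum, coarse)
```

With the coarse step, the three smallest components quantize to zero. A larger step lowers the information capacity requirement but raises the noise, which is why a perceptual coder tries to keep that noise below the masking threshold.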
Traditional perceptual coding techniques work reasonably well in audio coding systems that are allowed to transmit or record encoded signals having medium to high bit rates, but these techniques by themselves do not provide very good audio quality when the encoded signals are constrained to low bit rates. Other techniques have been used in conjunction with perceptual coding techniques in an attempt to provide high quality signals at very low bit rates.
One technique called “High-Frequency Regeneration” (HFR) is described in U.S. patent application publication number 2003-0187663 A1, entitled “Broadband Frequency Translation for High Frequency Regeneration” by Truman, et al., published Oct. 2, 2003, which is incorporated herein by reference in its entirety. In an audio coding system that uses HFR, a transmitter excludes high-frequency components from the encoded signal and a receiver regenerates or synthesizes noise-like substitute components for the missing high-frequency components. The resulting signal provided at the output of the receiver generally is not perceptually identical to the original signal provided at the input to the transmitter, but sophisticated regeneration techniques can provide an output signal that is a fairly good approximation of the original input signal, with a much higher perceived quality than would otherwise be possible at low bit rates. In this context, high quality usually means a wide bandwidth and a low level of perceived noise.
Another synthesis technique called “Spectral Hole Filling” (SHF) is described in U.S. patent application publication number 2003-0233234 A1 entitled “Improved Audio Coding System Using Spectral Hole Filling” by Truman, et al., published Dec. 18, 2003, which is incorporated herein by reference in its entirety. According to this technique, a transmitter quantizes and encodes spectral components of an input signal in such a manner that bands of spectral components are omitted from the encoded signal. The bands of missing spectral components are referred to as spectral holes. A receiver synthesizes spectral components to fill the spectral holes. The SHF technique generally does not provide an output signal that is perceptually identical to the original input signal but it can improve the perceived quality of the output signal in systems that are constrained to operate with low bit rate encoded signals.
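The general idea of filling spectral holes at the receiver can be sketched as follows. This is only an illustration under stated assumptions: the function name `fill_spectral_holes`, the uniform noise source, and the fixed `noise_level` are hypothetical, and the cited publication may choose and shape the synthesized components quite differently.

```python
import random

def fill_spectral_holes(components, noise_level, rng):
    """Replace zero-valued (omitted) components with low-level noise.

    noise_level is a hypothetical bound on the synthesized amplitude;
    a real system would derive it from information about the signal.
    """
    return [c if c != 0 else rng.uniform(-noise_level, noise_level)
            for c in components]

rng = random.Random(0)                       # seeded for repeatability
decoded = [1.0, -0.5, 0.25, 0.0, 0.0, 0.0]   # last three components form a spectral hole
filled = fill_spectral_holes(decoded, noise_level=0.05, rng=rng)
```

Components that survived quantization pass through unchanged; only the hole receives synthesized content.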
Techniques like HFR and SHF can provide an advantage in many situations but they do not work well in all situations. One situation that is particularly troublesome arises when an audio signal having a rapidly changing amplitude is encoded by a system that uses block transforms to implement the analysis and synthesis filterbanks. In this situation, audible noise-like components can be smeared across a period of time that corresponds to a transform block.
One technique that can be used to reduce the audible effects of time-smeared noise is to decrease the block length of the analysis and synthesis transforms for intervals of the input signal that are highly non-stationary. This technique works well in audio coding systems that are allowed to transmit or record encoded signals having medium to high bit rates, but it does not work as well in lower bit rate systems because the use of shorter blocks reduces the coding gain achieved by the transform.
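The time smearing and the effect of a shorter block length can be illustrated with a small sketch. It assumes an orthonormal DCT as the block transform and a uniform quantizer; the signal, block lengths, and step size are hypothetical, and real coders typically use overlapped transforms rather than the non-overlapping blocks used here.

```python
import math

def dct(block):
    """Orthonormal DCT-II of one block."""
    n = len(block)
    return [math.sqrt((1 if k == 0 else 2) / n)
            * sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(block))
            for k in range(n)]

def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * c
                * math.cos(math.pi * (i + 0.5) * k / n) for k, c in enumerate(coeffs))
            for i in range(n)]

def code_blockwise(signal, block_len, step):
    """Transform, quantize, and inverse-transform each block in turn."""
    out = []
    for start in range(0, len(signal), block_len):
        coeffs = dct(signal[start:start + block_len])
        out.extend(idct([step * round(c / step) for c in coeffs]))
    return out

def first_noisy_sample(original, decoded, tol=1e-9):
    """Index of the first sample whose coding error exceeds tol."""
    for i, (o, d) in enumerate(zip(original, decoded)):
        if abs(o - d) > tol:
            return i
    return len(original)

ONSET = 26                                             # silence, then a sudden onset
signal = [0.0] * ONSET + [1.0] * (32 - ONSET)

long_first = first_noisy_sample(signal, code_blockwise(signal, 32, 0.2))
short_first = first_noisy_sample(signal, code_blockwise(signal, 8, 0.2))
```

With one 32-sample block, quantization noise appears in the silence well before the onset. With 8-sample blocks, the three all-silent blocks are reconstructed exactly, so noise cannot start before sample 24, the beginning of the block that contains the onset. The sketch does not show the cost of the shorter blocks, which is the reduced coding gain.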
In another technique, a transmitter modifies the input signal so that rapid changes in amplitude are removed or reduced prior to application of the analysis transform. The receiver reverses the effects of the modifications after application of the synthesis transform. Unfortunately, this technique has two drawbacks: it obscures the true spectral characteristics of the input signal, thereby distorting information needed for effective perceptual coding, and it requires the transmitter to use part of the transmitted signal to convey parameters that the receiver needs to reverse the effects of the modifications.
In a third technique known as temporal noise shaping, a transmitter applies a prediction filter to the spectral components obtained from the analysis filterbank and conveys the prediction errors and the prediction filter coefficients in the transmitted signal. The receiver applies an inverse prediction filter to the prediction errors to recover the spectral components. This technique is undesirable in low bit rate systems because of the signal overhead needed to convey the prediction filter coefficients.
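The prediction and inverse-prediction steps can be sketched with a first-order predictor that runs across frequency. This is a simplified illustration, not the temporal noise shaping tool of any standard: the filter order, the least-squares fit, and all function names are assumptions, and the quantization of the prediction errors is omitted.

```python
def fit_predictor(spectrum):
    """Least-squares first-order coefficient a, predicting X[k] from a * X[k-1]."""
    num = sum(spectrum[k] * spectrum[k - 1] for k in range(1, len(spectrum)))
    den = sum(spectrum[k - 1] ** 2 for k in range(1, len(spectrum)))
    return num / den if den else 0.0

def prediction_errors(spectrum, a):
    """Transmitter side: keep X[0], then e[k] = X[k] - a * X[k-1]."""
    return [spectrum[0]] + [spectrum[k] - a * spectrum[k - 1]
                            for k in range(1, len(spectrum))]

def reconstruct(errors, a):
    """Receiver side: inverse filter, X[k] = e[k] + a * X[k-1]."""
    out = [errors[0]]
    for e in errors[1:]:
        out.append(e + a * out[-1])
    return out

# Hypothetical spectral components with strong correlation across frequency.
spectrum = [1.0, 0.9, 0.7, 0.6, 0.4, 0.3]

a = fit_predictor(spectrum)                 # must be conveyed as side information
errors = prediction_errors(spectrum, a)     # conveyed in place of the components
recovered = reconstruct(errors, a)
```

The prediction errors carry much less energy than the components they replace, but the coefficient `a` (several coefficients for a higher-order filter) has to be sent with every block, which is the overhead noted above.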