The present invention generally relates to methods and systems for encoding and decoding a multi-channel digital audio signal. More particularly, the present invention relates to a low bit rate digital audio coding system that significantly reduces the bit rate of multi-channel audio signals for efficient transmission or storage while achieving transparent audio signal reproduction, i.e., the reproduced audio signal at the decoder side cannot be distinguished from the original signal even by expert listeners.
A multi-channel digital audio coding system usually consists of the following components: a time-frequency analysis filter bank which generates a frequency representation, called subband samples or subband signals, of input PCM (Pulse Code Modulation) samples; a psychoacoustic model which calculates, based on perceptual properties of the human ear, a masking threshold below which quantization noise is unlikely to be audible; a global bit allocator which allocates bit resources to each group of subband samples so that the resulting quantization noise power is below the masking threshold; a plurality of quantizers which quantize subband samples according to the bits allocated; a plurality of entropy coders which reduce statistical redundancy in the quantization indexes; and finally a multiplexer which packs entropy codes of the quantization indexes and other side information into a whole bit stream.
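The role of the global bit allocator described above can be illustrated with a minimal sketch. It assumes, as a simplification not taken from any particular codec, that each added bit of a uniform quantizer lowers the quantization noise by about 6.02 dB, and that per-band signal levels and masking thresholds are already available in dB. The function name `allocate_bits` and the `max_bits` cap are hypothetical.

```python
import math

# Assumption: each extra quantizer bit buys about 6.02 dB of signal-to-noise ratio.
DB_PER_BIT = 20 * math.log10(2)


def allocate_bits(signal_db, mask_db, max_bits=16):
    """Return a bit count per band so the estimated noise sits at or below the mask.

    signal_db: signal level of each band in dB (hypothetical input).
    mask_db:   masking threshold of each band in dB (hypothetical input).
    """
    bits = []
    for level, mask in zip(signal_db, mask_db):
        smr = level - mask  # signal-to-mask ratio in dB
        # A band already below its masking threshold needs no bits at all.
        needed = 0 if smr <= 0 else math.ceil(smr / DB_PER_BIT)
        bits.append(min(needed, max_bits))
    return bits
```

For instance, a band 40 dB above its mask would receive 7 bits under this assumption, while a band below its mask would receive none. Practical allocators also have to respect a total bit budget, which this sketch does not model.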
For example, Dolby AC-3 maps input PCM samples into the frequency domain using a high frequency resolution MDCT (modified discrete cosine transform) filter bank whose window size is switchable. Stationary signals are analyzed with a 512-point window, while transient signals are analyzed with a 256-point window. Subband signals from the MDCT are represented in exponent/mantissa form and are subsequently quantized. A forward-backward adaptive psychoacoustic model is deployed to optimize quantization and to reduce the bits required to encode bit allocation information. Entropy coding is not used, in order to reduce decoder complexity. Finally, quantization indexes and other side information are multiplexed into a whole AC-3 bit stream. The frequency resolution of the adaptive MDCT as configured in AC-3 is not well matched to the input signal characteristics, so its compression performance is very limited. The absence of entropy coding is another factor that limits its compression performance.
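The relationship between window size and frequency resolution in an MDCT filter bank can be illustrated with a minimal, direct-form sketch. It is not the AC-3 transform: a sine window is assumed here for simplicity, the transform is computed by brute force rather than by a fast algorithm, and the function names are hypothetical. A window of 2N samples yields N coefficients, so a 512-point window gives 256 coefficients and a 256-point window gives 128.

```python
import math


def sine_window(length):
    """Sine window (an assumption); satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Windowed MDCT: 2N time samples in, N coefficients out (direct O(N^2) form)."""
    two_n = len(block)
    n_half = two_n // 2
    w = sine_window(two_n)
    n0 = 0.5 + n_half / 2  # phase offset that produces time-domain alias cancellation
    return [
        sum(w[n] * block[n] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_half)
    ]


def imdct(coeffs):
    """Windowed inverse MDCT: N coefficients in, 2N aliased time samples out."""
    n_half = len(coeffs)
    two_n = 2 * n_half
    w = sine_window(two_n)
    n0 = 0.5 + n_half / 2
    return [
        w[n] * (2.0 / n_half) * sum(
            coeffs[k] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
            for k in range(n_half))
        for n in range(two_n)
    ]
```

Each inverse-transformed block contains aliasing on its own; adding the second half of one block to the first half of the next block, where consecutive blocks overlap by N samples, cancels the aliasing and recovers the input in the overlapped region. The shorter window localizes a transient in time at the cost of half as many frequency bins.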
MPEG 1 & 2 Layer III (MP3) uses a 32-band polyphase filter bank in which each subband filter is followed by an adaptive MDCT that switches between 6 and 18 points. A sophisticated psychoacoustic model is used to guide its bit allocation and nonuniform scalar quantization. Huffman coding is used to code the quantization indexes and much of the other side information. The poor frequency isolation of the hybrid filter bank significantly limits its compression performance, and its algorithm complexity is high.
DTS Coherent Acoustics deploys a 32-band polyphase filter bank to obtain a low resolution frequency representation of the input signal. In order to make up for this poor frequency resolution, ADPCM (Adaptive Differential Pulse Code Modulation) is optionally deployed in each subband. Uniform scalar quantization is applied either to the subband samples directly or to the prediction residue if ADPCM produces a favorable coding gain. Vector quantization may optionally be applied to high frequency subbands. Huffman coding may optionally be applied to scalar quantization indexes and other side information. Since the combination of a polyphase filter bank and ADPCM cannot provide good time and frequency resolution, its compression performance is low.
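The decision between quantizing subband samples directly and quantizing the prediction residue can be illustrated with a minimal sketch. It is not the DTS algorithm: a first-order, open-loop least-squares predictor is assumed for simplicity, and the function names and the threshold value of 2.0 (roughly 3 dB) are hypothetical choices standing in for whatever margin is needed to pay for the predictor side information.

```python
def prediction_gain(samples):
    """Estimate first-order prediction gain for one subband.

    Returns (gain, coeff, residual), where gain is the ratio of signal energy
    to residual energy and coeff is the least-squares predictor coefficient.
    """
    energy = sum(x * x for x in samples)
    prev_energy = sum(x * x for x in samples[:-1])
    cross = sum(cur * prev for prev, cur in zip(samples[:-1], samples[1:]))
    coeff = cross / prev_energy if prev_energy > 0 else 0.0
    # The first sample has no predecessor, so it is passed through unchanged.
    residual = [samples[0]] + [
        cur - coeff * prev for prev, cur in zip(samples[:-1], samples[1:])
    ]
    res_energy = sum(r * r for r in residual)
    gain = energy / res_energy if res_energy > 0 else float("inf")
    return gain, coeff, residual


def choose_mode(samples, threshold=2.0):
    """Pick "adpcm" only when the estimated gain exceeds the (assumed) threshold."""
    gain, _, _ = prediction_gain(samples)
    return "adpcm" if gain > threshold else "direct"
```

A smoothly varying subband signal gives a large gain and would be coded through the predictor, whereas a signal with no sample-to-sample correlation gives a gain near one and would be quantized directly.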
MPEG 2 AAC and MPEG 4 AAC deploy an adaptive MDCT filter bank whose window size can switch between 256 and 2048. A masking threshold generated by a psychoacoustic model is used to guide its nonuniform scalar quantization and bit allocation. Huffman coding is used to encode the quantization indexes and much of the other side information. Many other tools, such as TNS (temporal noise shaping), gain control (a hybrid filter bank similar to that of MP3), and spectral prediction (linear prediction within a subband), are employed to further enhance its compression performance at the expense of significantly increased algorithm complexity.
Accordingly, there is a continuing need for a low bit rate audio coding system which significantly reduces the bit rate of multi-channel audio signals for efficient transmission or storage, while achieving transparent audio signal reproduction. The present invention fulfills this need and provides other related advantages.