The present invention relates to digital signal processing, and more particularly to transform processing in audio/visual coding.
Processing of digital audio and video/image signals typically includes both transformation of the signals into frequency domains and reliance upon redundancies or perceptual (for audio, psychoacoustic) effects in order to reduce the number of bits used to encode the signals. Indeed, video coding standards such as MPEG-1, MPEG-2, MPEG-4, and H.263 use the hybrid video coding technique of block motion compensation plus transform coding. Block motion compensation is used to remove temporal redundancy between successive images (frames) by block predictions, whereas transform coding is used to remove spatial redundancy within each frame or within the prediction errors. FIGS. 2a-2b illustrate H.264/AVC video coding functions which include a deblocking filter within the motion compensation loop to limit artifacts created at block edges.
Thus a video frame can be encoded as its motion vectors plus corresponding prediction error blocks which are transformed, quantized, and entropy encoded. The transform of a block converts the pixel values of the block from the spatial domain into a frequency domain for quantization; this takes advantage of the decorrelation and energy compaction properties of transforms such as the two-dimensional discrete cosine transform (DCT) or an integer transform approximating a DCT. For example, in MPEG and H.263, 8×8 blocks of DCT coefficients are quantized, scanned into a one-dimensional sequence, and coded by using variable length coding (VLC). H.264 uses an integer approximation to a 4×4 DCT. Note that, in accurate nomenclature, the DCT is a type-II DCT and the inverse DCT is a type-III DCT.
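The relation between the type-II DCT, the type-III DCT, and energy compaction can be illustrated with a minimal sketch. The function names dct2, dct3, and dct2_block are hypothetical, the orthonormal scaling is an assumed convention, and the direct summation is chosen for readability rather than speed; it is not the fast or integer transform of any particular standard.

```python
import math

def dct2(x):
    """Orthonormal type-II DCT of a 1-D sequence (direct O(n^2) summation)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct3(c):
    """Orthonormal type-III DCT; with this scaling it inverts dct2."""
    n = len(c)
    out = []
    for i in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(s)
    return out

def dct2_block(block):
    """Separable 2-D type-II DCT: transform each row, then each column."""
    rows = [dct2(r) for r in block]
    cols = [dct2(list(col)) for col in zip(*rows)]
    # cols is indexed [horizontal freq][vertical freq]; transpose back.
    return [list(r) for r in zip(*cols)]

# A flat 8x8 block compacts into the single DC coefficient.
flat = [[10.0] * 8 for _ in range(8)]
coeffs = dct2_block(flat)
```

With this scaling, every coefficient of the flat block other than coeffs[0][0] is zero up to rounding error, which is the energy compaction that makes the subsequent quantization and scan effective.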
Similarly, MPEG-1 audio coding standards such as Layers I, II, and III (MP3) apply an analysis filter bank to incoming digital audio samples to transform the signals into 32 frequency subbands and then, within each of the subbands, quantize the frequency-domain signal based upon psychoacoustic processing; see FIG. 2c. FIG. 2d shows the decoding, including inverse quantization and a synthesis filter bank.
The MPEG-2 advanced audio coding (AAC) extends MP3 to larger sample windows and higher resolution in the frequency domain.
The MPEG-4 general audio coding (GA) standard enhances AAC with further capabilities such as spectral band replication (SBR) which extends low frequency bands to higher frequencies in a decoder and thereby reduces the number of bits required for encoding. FIGS. 3a-3b show functional blocks of an encoder and a decoder, respectively, which include PNS (perceptual noise substitution), TNS (temporal noise shaping), TwinVQ (transform-domain weighted interleaved vector quantization), M/S (conversion of spectra pairs from Mid/Side to Left/Right), and BSAC (bit sliced arithmetic coding).
The SBR block (FIG. 3c) has its own filter banks for analysis filtering into 32 frequency bands and, after high frequency band generation, synthesis filtering of 64 frequency bands back to the time domain. Modified discrete cosine transform (MDCT) and inverse MDCT (IMDCT) are used for the analysis and synthesis filter banks for low-power SBR. U.S. Pat. No. 6,680,972 describes SBR.
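How an MDCT and IMDCT pair can act as an analysis and synthesis filter bank may be sketched as follows. This is a generic textbook formulation and not the SBR filter bank of the standard: the function names, the sine window, the 2/N scaling of the inverse, and the direct summation are all assumptions made for illustration.

```python
import math

def sine_window(length):
    """Sine window; satisfies w[i]^2 + w[i + length/2]^2 = 1."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]

def mdct(frame):
    """MDCT: 2N windowed input samples -> N coefficients (direct form)."""
    n = len(frame) // 2
    return [sum(frame[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(2 * n))
            for k in range(n)]

def imdct(coeffs):
    """IMDCT: N coefficients -> 2N time-aliased samples (assumed 2/N scaling)."""
    n = len(coeffs)
    return [(2.0 / n) * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                            for k in range(n))
            for i in range(2 * n)]

def analysis_synthesis(signal, n):
    """Window, MDCT, IMDCT, window again, and overlap-add with hop size n."""
    w = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        frame = [signal[start + i] * w[i] for i in range(2 * n)]
        rec = imdct(mdct(frame))
        for i in range(2 * n):
            out[start + i] += rec[i] * w[i]
    return out

signal = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
rebuilt = analysis_synthesis(signal, 4)
```

Each frame on its own is aliased in time, but the aliasing terms of adjacent frames cancel in the overlap-add, so samples covered by two frames (here indices 4 through 27) are recovered up to rounding error. The first and last half-frames are covered only once and are not reconstructed in this sketch.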
Pan, A Tutorial on MPEG/Audio Compression, 2 IEEE Multimedia 60 (1995) describes the MPEG/audio Layers I, II, and III coding. Konstantinides, Fast Subband Filtering in MPEG Audio Coding, 1 IEEE Signal Processing Letters 26 (1994) and Chan et al., Fast Implementation of MPEG Audio Coder Using Recursive Formula with Fast Discrete Cosine Transforms, 4 IEEE Transactions on Speech and Audio Processing 144 (1996) both disclose reduced computational complexity implementations of the filter banks in MPEG audio coding.
However, the computational complexity of the transforms used is a problem for low-power devices.