Transform coding is a compression technique used in many audio, image and video compression systems. Uncompressed digital images and video are typically represented or captured as samples of picture elements or colors at locations in an image or video frame arranged in a two-dimensional (2D) grid. This is conventionally referred to as a spatial-domain representation of the image or video. For example, a typical format for a rectangular-shaped image consists of three two-dimensional arrays of 8-bit color samples. Each sample is a number representing the value of a color component at a spatial location in a grid, where each color component represents an amplitude along an axis within a color space, such as RGB or YUV, among others. An individual sample in one of these arrays may be referred to as a pixel. (In other common usage, the term pixel refers to an n-tuple of n spatially co-located color component samples, for example a 3-tuple grouping of the R, G, and B color component values for a given spatial location; here, however, the term refers to a scalar-valued sample.) Various image and video systems may use different color, spatial and time resolutions of sampling. Similarly, digital audio is typically represented as a time-sampled audio signal stream. For example, a typical audio format consists of a stream of 16-bit amplitude samples representing audio signal amplitudes at regularly spaced time instants.
Uncompressed digital audio, image and video signals can consume considerable storage and transmission capacity. Transform coding can be used with other encoding techniques to reduce the quantity of data needed for representing such digital audio, images and video, for example, by transforming the spatial-domain (or time-domain) representation of the signal into a frequency-domain (or other like transform domain) representation, so as to enable a subsequent reduction in the quantity of data needed to represent the signal. The reduction in the quantity of data is typically accomplished by the application of a process known as quantization or by the selective discarding of certain frequency components of the transform-domain representation (or a combination of the two), followed by application of entropy encoding techniques such as adaptive Huffman encoding or adaptive arithmetic encoding. The quantization process may be applied selectively, based on the estimated degree of perceptual sensitivity of the individual frequency components or based on other criteria. For a given bit rate of output, appropriate application of transform coding generally produces much less perceptible degradation of the digital signal, as compared to reducing the color sample fidelity or spatial resolution of images or video directly in the spatial domain, or of audio in the time domain.
More specifically, a typical block transform-based coding technology divides the uncompressed pixels of the digital image into fixed-size two-dimensional blocks (X1, ..., Xn). A linear transform that performs spatial-frequency analysis is applied to a given block, which converts the spatial-domain samples within the block to a set of frequency (or transform) coefficients generally representing the strength of the digital signal in corresponding frequency bands over the block interval. For compression, the transform coefficients may be quantized (i.e., reduced in precision, such as by dropping least significant bits of the coefficient values or otherwise mapping values in a higher precision number set to a lower precision), and also entropy or variable-length coded into a compressed data stream. At decoding, the transform coefficients will be inverse-quantized and inverse-transformed back into the spatial domain to nearly reconstruct the original color/spatial sampled image/video signal (reconstructed blocks X̂1, ..., X̂n).
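The transform, quantize, inverse-quantize, inverse-transform chain described above can be sketched as follows. This is a minimal illustration, not any particular codec: it assumes a one-dimensional 8-sample block instead of a 2D block, an orthonormal DCT-II as the linear transform, and a single uniform quantization step. The sample values, the step size, and all function names are hypothetical, and entropy coding is omitted.

```python
import math

N = 8  # block length (assumed for illustration)

def dct_matrix(n):
    """Rows of an orthonormal DCT-II basis for length-n blocks."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

C = dct_matrix(N)

def forward(block):
    """Spatial-domain samples -> transform coefficients."""
    return [sum(C[k][i] * block[i] for i in range(N)) for k in range(N)]

def inverse(coeffs):
    """Transform coefficients -> spatial-domain samples.
    The basis is orthonormal, so the inverse is the transpose."""
    return [sum(C[k][i] * coeffs[k] for k in range(N)) for i in range(N)]

def quantize(coeffs, step):
    """Map each coefficient to an integer level (precision is reduced here)."""
    return [int(round(c / step)) for c in coeffs]

def dequantize(levels, step):
    """Inverse quantization: scale integer levels back to coefficient values."""
    return [q * step for q in levels]

def code_block(block, step):
    """Encode then decode one block; returns (levels, reconstructed block)."""
    levels = quantize(forward(block), step)
    return levels, inverse(dequantize(levels, step))

block = [52, 55, 61, 66, 70, 61, 64, 73]      # hypothetical 8-bit samples
levels, recon = code_block(block, step=10.0)  # hypothetical step size
max_err = max(abs(a - b) for a, b in zip(block, recon))
```

The integer `levels` are what an entropy coder would consume. Because the transform here is orthonormal and each coefficient is off by at most half a step after rounding, `max_err` cannot exceed `(step / 2) * sqrt(N)`; a smaller step gives a closer reconstruction at the cost of larger levels.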
The ability to exploit the correlation of samples in a block and thus maximize compression capability is a major requirement in transform design. In many block transform-based coding applications, the transform should be reversible to support both lossy and lossless compression, depending on the quantization operation applied in the transform domain. With no quantization applied, for example, encoding that utilizes a reversible transform can enable the exact reproduction of the input data upon application of the corresponding decoding. However, the requirement of reversibility in these applications constrains the choice of transforms upon which the coding technology can be designed. The implementation complexity of a transform is another important design constraint. Thus, transform designs are often chosen so that the application of the forward and inverse transforms involves only multiplications by small integers and other simple mathematical operations such as additions, subtractions, and shift operations (to implement multiplication or division by a power of 2 such as 4, 8, 16, 32, etc.), so that fast integer implementations with minimal dynamic range expansion can be obtained.
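As a small illustration of a reversible transform built only from additions, subtractions and shifts, the sketch below uses the two-point integer "S-transform" (an integer average/difference pair). It is chosen here only as a familiar example and is not taken from the text above; the function names are hypothetical.

```python
def forward_pair(a, b):
    """Integer average/difference of two samples using only subtract, add, shift."""
    d = a - b          # difference; needs one more bit than the inputs
    s = b + (d >> 1)   # equals floor((a + b) / 2); stays in the input range
    return s, d

def inverse_pair(s, d):
    """Exact inverse: undo the two steps in reverse order."""
    b = s - (d >> 1)
    a = d + b
    return a, b

# Exhaustive round-trip check over all pairs of 8-bit sample values.
lossless = all(inverse_pair(*forward_pair(a, b)) == (a, b)
               for a in range(256) for b in range(256))
```

For example, `forward_pair(7, 4)` returns `(5, 3)` and `inverse_pair(5, 3)` returns `(7, 4)`. The rounding introduced by the shift is the same in both directions, which is why the pair is exactly invertible even though the average is truncated. The average stays within the input range while the difference grows by one bit, a simple instance of limited dynamic range expansion.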
Many image and video compression standards, such as JPEG (ITU-T T.81|ISO/IEC 10918-1) and MPEG-2 (ITU-T H.262|ISO/IEC 13818-2), among others, utilize transforms based on the Discrete Cosine Transform (DCT). The DCT is known to have favorable energy compaction properties but also has disadvantages in many implementations. The DCT is described by N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete Cosine Transform,” IEEE Transactions on Computers, C-23 (January 1974), pp. 90-93.
When compressing a still image (or an intra coded frame in a video sequence), many common standards such as JPEG and MPEG-2 partition the arrays representing the image into 8×8 blocks of samples and apply a block transform to each such image block. The transform coefficients in a given block in these designs are influenced only by the sample values within the block region. In image and video coding, quantization of samples in these independently-constructed blocks can result in discontinuities at block boundaries, and thus produce visually annoying artifacts known as blocking artifacts or blocking effects. Similarly for audio data, when non-overlapping blocks are independently transform coded, quantization errors will produce discontinuities in the signal at the block boundaries upon reconstruction of the audio signal at the decoder. For audio, a periodic clicking effect may be heard.
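The boundary discontinuity can be reproduced with a small experiment. The sketch below assumes a one-dimensional smooth ramp split into two independent 8-sample blocks, an orthonormal DCT-II, and a deliberately coarse uniform quantizer; the signal, the step size of 64, and the function names are hypothetical choices made to exaggerate the effect.

```python
import math

N = 8  # block length, matching the 8-sample block edge mentioned above

def dct_matrix(n):
    """Rows of an orthonormal DCT-II basis for length-n blocks."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

C = dct_matrix(N)

def code_block(block, step):
    """Transform, quantize, dequantize and inverse-transform one block,
    using only the samples inside that block."""
    coeffs = [sum(C[k][i] * block[i] for i in range(N)) for k in range(N)]
    levels = [int(round(c / step)) for c in coeffs]
    return [sum(C[k][i] * levels[k] * step for k in range(N)) for i in range(N)]

# Smooth ramp: neighboring samples always differ by exactly 4.
signal = [4.0 * i for i in range(2 * N)]

# Each block is coded independently of its neighbor.
recon = code_block(signal[:N], 64.0) + code_block(signal[N:], 64.0)

jumps = [abs(recon[i + 1] - recon[i]) for i in range(2 * N - 1)]
boundary_jump = jumps[N - 1]                 # between last sample of block 0 and first of block 1
interior_jumps = jumps[:N - 1] + jumps[N:]   # all other neighbor differences
```

With this coarse step every non-DC coefficient rounds to zero, so each reconstructed block is flat and all of the signal's variation is pushed into a single jump at the block boundary. That jump is several times larger than the original step of 4 between neighbors, which is the one-dimensional analogue of a visible block edge.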
Techniques used to mitigate blocking artifacts include deblocking filters, which smooth the signal values across inter-block edge boundaries. These techniques have drawbacks; for instance, deblocking can require significant computational resources.
Another approach is to reduce blocking effects by using a lapped transform as described in H. Malvar, “Signal Processing with Lapped Transforms,” Artech House, Norwood Mass., 1992. In general, a lapped transform is a transform having an input region that spans, besides the samples in the current block, some adjacent samples in neighboring blocks. Likewise, on the reconstruction side, the inverse lapped transform influences some decoded samples in neighboring blocks as well as samples of the current block. Thus, the inverse transform can preserve continuity across block boundaries even in the presence of quantization, consequently leading to a reduction of blocking effects. Another advantage of a lapped transform is that it can exploit cross-block correlation, which yields greater compression capability. In some lapped transform implementations, overlapping blocks of samples are processed in forward and inverse transforms. In other implementations, overlap processing is separated from transform processing; for encoding, overlap processing is performed across block boundaries prior to a forward transform that is performed on non-overlapping blocks, and for decoding, inverse transforms are performed for non-overlapping blocks, and then overlap processing is performed across block boundaries.
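To make the first style of implementation concrete (overlapping blocks processed directly by the forward and inverse transforms), the sketch below uses the modified discrete cosine transform (MDCT) with a sine window, one well-known lapped transform. It is offered only as an illustration and is not necessarily the transform the text has in mind. The block size `N = 4`, the zero padding at the signal ends, and all names are assumptions, and the signal length is assumed to be a multiple of `N`.

```python
import math

N = 4  # new samples per block; each transform input spans 2N samples

# Sine window; it satisfies w[n]^2 + w[n + N]^2 = 1, needed for exact overlap-add.
WINDOW = [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def _basis(n, k):
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def mdct(frame):
    """Forward lapped transform: 2N overlapping samples -> N coefficients."""
    return [sum(WINDOW[n] * frame[n] * _basis(n, k) for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs):
    """Inverse lapped transform: N coefficients -> 2N windowed samples
    that still contain aliasing until overlap-added with the neighbors."""
    return [WINDOW[n] * (2.0 / N) * sum(coeffs[k] * _basis(n, k) for k in range(N))
            for n in range(2 * N)]

def analyze(signal):
    """Slide a 2N-sample frame over the signal in hops of N samples."""
    padded = [0.0] * N + list(signal) + [0.0] * N
    return [mdct(padded[s:s + 2 * N]) for s in range(0, len(padded) - N, N)]

def synthesize(blocks):
    """Overlap-add the inverse transforms; each output sample is the sum of
    contributions from two adjacent blocks."""
    out = [0.0] * (N * (len(blocks) + 1))
    for b, coeffs in enumerate(blocks):
        for n, value in enumerate(imdct(coeffs)):
            out[b * N + n] += value
    return out[N:-N]

signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # hypothetical samples
blocks = analyze(signal)
recon = synthesize(blocks)
max_err = max(abs(a - b) for a, b in zip(signal, recon))
```

Each block still produces only `N` coefficients for `N` new samples, so the overlap adds no data, yet every coefficient depends on samples from two adjacent blocks. Without quantization the overlap-add cancels the aliasing and `recon` matches `signal` up to floating-point error. With quantization, the error of each block is spread smoothly into its neighbors by the window instead of ending abruptly at the block edge.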
For the case of 2D data, in general, a lapped 2D transform is a function of the current block, together with selected elements of the blocks to the left of, above, to the right of, and below the current block, and possibly of the blocks to the above-left, above-right, below-left and below-right of the current block. The number of samples in neighboring blocks that are used to compute the lapped transform for the current block is referred to as the amount of overlap or support.
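The notion of support can be made concrete with a small index calculation. The sketch below assumes square blocks, the same overlap (in samples) on all four sides, a support region that includes the diagonal neighbors, and image dimensions that are multiples of the block size; at image edges the region is simply clipped. All names and the example sizes are hypothetical.

```python
def support_region(block_row, block_col, block_size, overlap, height, width):
    """Return (top, left, bottom, right) of the input region for one block,
    as half-open sample index ranges clipped to the image."""
    top = max(block_row * block_size - overlap, 0)
    left = max(block_col * block_size - overlap, 0)
    bottom = min((block_row + 1) * block_size + overlap, height)
    right = min((block_col + 1) * block_size + overlap, width)
    return top, left, bottom, right

def neighbor_sample_count(block_row, block_col, block_size, overlap, height, width):
    """Number of samples drawn from neighboring blocks for this block."""
    top, left, bottom, right = support_region(
        block_row, block_col, block_size, overlap, height, width)
    return (bottom - top) * (right - left) - block_size * block_size

# 32x32 image, 8x8 blocks, 2 samples of overlap on each side (assumed values).
interior = neighbor_sample_count(1, 1, 8, 2, 32, 32)  # block with neighbors on all sides
corner = neighbor_sample_count(0, 0, 8, 2, 32, 32)    # top-left block, clipped at two edges
```

Under these assumptions an interior 8×8 block reads a 12×12 region, so 80 of its input samples come from neighboring blocks, while the top-left corner block reads a 10×10 region and borrows 36. A real codec would also need a rule for the missing samples at image edges, which this sketch leaves out.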