Lapped Transforms
The lapped transform is a powerful signal processing technique that is used in data compression. See, e.g., H. S. Malvar, Signal Processing with Lapped Transforms. Boston, Mass.: Artech House, 1992. However, to date, efficient lapped transforms with linear phase have neither been formulated nor been applied for lossless (reversible) compression of data.
As discussed in more detail below, it is known that a lapped transform can be formulated as a pre-filter followed by a data transform (and its inverse as the inverse data transform followed by a post-filter). See, e.g., H. S. Malvar, “A pre- and post-filtering technique for the reduction of blocking effects,” in Proc. Picture Coding Symposium, Stockholm, Sweden, June 1987; and T. D. Tran, J. Liang, and C. Tu, “Lapped Transform via Time-Domain Pre- and Post-Filtering,” IEEE Trans. on Signal Processing, vol. 51, no. 6, June 2003. A lossless data transform can be used in this formulation to achieve a good measure of reversibility. Until now, it has been believed that only a restricted variety of pre- and post-filters could be chosen for reversibility, and this restricted set is very limited in its compression (rate vs. distortion, or R-D) performance. A recent article (W. Dai and T. Tran, “Regularity-constrained pre- and post-filtering for block DCT-based systems,” IEEE Trans. on Signal Processing, vol. 51, pp. 2568-2581, October 2003) presented a construction in which most elements are reversible and which has good compression properties.
In audio compression, several constructions for reversible lapped transforms have been introduced. See, e.g., R. Geiger, J. Herre, J. Koller, and K. Brandenburg, “IntMDCT—A link between perceptual and lossless audio coding,” in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Orlando, Fla., May 2002; and J. Li, “Reversible FFT and MDCT via matrix lifting,” in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Montreal, Canada, May 2004. However, these constructions are applicable only to the modulated lapped transform (MLT), also known as the modified discrete cosine transform (MDCT), whose basis functions are orthogonal and are not symmetric (that is, the basis functions are not linear phase). These transforms are not applicable to data compression applications where linear phase (symmetric) functions are required, such as in digital picture compression.
For picture (image) compression, one of the best-performing transforms in terms of R-D performance is the lapped biorthogonal transform (LBT). See H. S. Malvar, “Biorthogonal and nonuniform lapped transforms for transform coding with reduced blocking and ringing artifacts,” IEEE Trans. on Signal Processing, vol. 46, pp. 1043-1053, April 1998. Unlike the MLT, the LBT basis functions are symmetric and are not exactly orthogonal (in the LBT, the analysis basis functions are orthogonal to the synthesis basis functions, hence the term biorthogonal). LBTs have been used successfully in image compression applications, but they have not yet been used in lossless image compression, because integer-reversible constructions were not known.
Overview of Block Transform-Based Coding
Transform coding is a compression technique used in many audio, image and video compression systems. Uncompressed digital image and video is typically represented or captured as samples of picture elements or colors at locations in an image or video frame arranged in a two-dimensional (2D) grid. This is referred to as a spatial-domain representation of the image or video. For example, a typical format for images consists of a stream of 24-bit color picture element samples arranged as a grid. Each sample is a number representing color components at a pixel location in the grid within a color space, such as RGB or YIQ, among others. Different image and video systems may use different color, spatial and time resolutions of sampling. Similarly, digital audio is typically represented as a time-sampled audio signal stream. For example, a typical audio format consists of a stream of 16-bit amplitude samples of an audio signal taken at regular time intervals.
Uncompressed digital audio, image and video signals can consume considerable storage and transmission capacity. Transform coding reduces the size of digital audio, images and video by transforming the spatial-domain representation of the signal into a frequency-domain (or other like transform domain) representation, and then reducing resolution of certain generally less perceptible frequency components of the transform-domain representation. This generally produces much less perceptible degradation of the digital signal compared to reducing color or spatial resolution of images or video in the spatial domain, or of audio in the time domain.
More specifically, a typical block transform-based codec 100 shown in FIG. 1 divides the uncompressed digital image's pixels into fixed-size two-dimensional blocks (X1, . . . Xn), each block possibly overlapping with other blocks. A linear transform 120-121 that does spatial-frequency analysis is applied to each block, which converts the spaced samples within the block to a set of frequency (or transform) coefficients generally representing the strength of the digital signal in corresponding frequency bands over the block interval. For compression, the transform coefficients may be selectively quantized 130 (i.e., reduced in resolution, such as by dropping least significant bits of the coefficient values or otherwise mapping values in a higher resolution number set to a lower resolution), and also entropy or variable-length coded 130 into a compressed data stream. At decoding, the transform coefficients are inverse transformed 170-171 to nearly reconstruct the original color/spatial sampled image/video signal (reconstructed blocks X̂1, . . . X̂n).
The block transform 120-121 can be defined as a mathematical operation on a vector x of size N. Most often, the operation is a linear multiplication, producing the transform domain output y=Mx, M being the transform matrix. When the input data is arbitrarily long, it is segmented into N-sized vectors and a block transform is applied to each segment. For the purpose of data compression, reversible block transforms are chosen. In other words, the matrix M is invertible. In multiple dimensions (e.g., for image and video), block transforms are typically implemented as separable operations. The matrix multiplication is applied separably along each dimension of the data (i.e., both rows and columns).
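The segmentation and separable application described above can be sketched as follows. This is a minimal illustration only: it assumes an orthonormal DCT-II as the matrix M (so that the inverse is simply the transpose), and the helper names dct_matrix, block_transform_1d and block_transform_2d are hypothetical, not taken from any codec.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (an assumed, illustrative M)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def mat_vec(m, x):
    """Compute the matrix-vector product y = M x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def block_transform_1d(signal, m):
    """Segment the signal into N-sized vectors and compute y = M x for each."""
    n = len(m)
    assert len(signal) % n == 0, "signal length must be a multiple of N"
    out = []
    for start in range(0, len(signal), n):
        out.extend(mat_vec(m, signal[start:start + n]))
    return out

def block_transform_2d(block, m):
    """Separable 2D transform: apply M along every row, then every column."""
    rows_done = [mat_vec(m, row) for row in block]
    cols_done = [mat_vec(m, col) for col in transpose(rows_done)]
    return transpose(cols_done)

M = dct_matrix(4)
coeffs = block_transform_1d([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], M)
# Because this particular M is orthonormal, its inverse is its transpose.
restored = block_transform_1d(coeffs, transpose(M))
```

In floating point the round trip is exact only up to rounding error, which is one reason a lossless codec needs an integer-reversible transform rather than a plain matrix multiplication.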
For compression, the transform coefficients (components of vector y) may be selectively quantized (i.e., reduced in resolution, such as by dropping least significant bits of the coefficient values or otherwise mapping values in a higher resolution number set to a lower resolution), and also entropy or variable-length coded into a compressed data stream.
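As a rough illustration of reducing resolution by dropping least significant bits, the sketch below quantizes integer coefficients with a right shift and reconstructs them with a left shift. The function names and the shift parameter are hypothetical, and entropy coding is omitted.

```python
def quantize(coeffs, shift):
    """Drop the `shift` least significant bits of each integer coefficient."""
    # Python's >> is an arithmetic shift, so negative values round toward
    # negative infinity.
    return [c >> shift for c in coeffs]

def dequantize(levels, shift):
    """Map quantized levels back to the original coefficient scale."""
    return [q << shift for q in levels]

coeffs = [103, -7, 12, 0]
levels = quantize(coeffs, 2)        # [25, -2, 3, 0]
approx = dequantize(levels, 2)      # [100, -8, 12, 0]
```

Each reconstructed value differs from the original by less than 2**shift, and a shift of 0 leaves the coefficients unchanged.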
At decoding, the inverse of these operations (dequantization/entropy decoding 160 and inverse block transform 170-171) is applied on the decoder 150 side, as shown in FIG. 1. While reconstructing the data, the inverse matrix M⁻¹ (inverse transform 170-171) is applied as a multiplier to the transform domain data. When applied to the transform domain data, the inverse transform nearly reconstructs the original time-domain or spatial-domain digital media.
In many block transform-based coding applications, the transform is desirably reversible to support both lossy and lossless compression depending on the quantization factor. With no quantization (generally represented as a quantization factor of 1) for example, a codec utilizing a reversible transform can exactly reproduce the input data at decoding. However, the requirement of reversibility in these applications constrains the choice of transforms upon which the codec can be designed.
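To make the notion of an exactly reversible transform concrete, the following sketch shows a two-point integer transform built from lifting steps (often called the S-transform). It is offered only as a well-known example of integer reversibility, under the assumption that inputs are Python integers; it is not the transform of any particular codec discussed here.

```python
def forward_pair(a, b):
    """Map two integers to an integer (average, difference) pair via lifting."""
    d = a - b              # difference
    s = b + (d >> 1)       # equals floor((a + b) / 2)
    return s, d

def inverse_pair(s, d):
    """Undo forward_pair exactly by reversing the lifting steps."""
    b = s - (d >> 1)       # same rounded term as in the forward step
    a = d + b
    return a, b

s, d = forward_pair(7, 4)          # (5, 3)
assert inverse_pair(s, d) == (7, 4)
```

The rounding in the forward step is not lost: the inverse recomputes the same rounded quantity from d and subtracts it, so with no quantization the input is recovered bit for bit.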
Many image and video compression systems, such as MPEG and Windows Media, among others, utilize transforms based on the Discrete Cosine Transform (DCT). The DCT is known to have favorable energy compaction properties that result in near-optimal data compression. In these compression systems, the inverse DCT (IDCT) is employed in the reconstruction loops in both the encoder and the decoder of the compression system for reconstructing individual image blocks. The DCT is described by N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete Cosine Transform,” IEEE Transactions on Computers, C-23 (January 1974), pp. 90-93. An exemplary implementation of the IDCT is described in “IEEE Standard Specification for the Implementations of 8×8 Inverse Discrete Cosine Transform,” IEEE Std. 1180-1990, Dec. 6, 1990.
While compressing a still image (or an intra coded frame in a video sequence), most common standards such as MPEG-2, MPEG-4 and Windows Media partition the image into square tiles and apply a block transform to each image tile. The transform coefficients in a given partition (commonly known as block) are influenced only by the raw data components within the block. Irreversible or lossy operations on the encoder side such as quantization cause artifacts to appear in the decoded image. These artifacts are independent across blocks and produce a visually annoying effect known as the blocking effect. Likewise for audio data, when non-overlapping blocks are independently transform coded, quantization errors will produce discontinuities in the signal at the block boundaries upon reconstruction of the audio signal at the decoder. For audio, a periodic clicking effect is heard.
Several techniques are used to combat the blocking effect. The most popular among these are the deblocking filter, which smooths inter-block edge boundaries, and spatial extrapolation, which encodes differences between the raw input data and a prediction from neighboring block edges. These techniques are not without their flaws. For instance, the deblocking filter approach is “open loop,” i.e., the forward transform process does not take into account the fact that deblocking is going to be performed prior to reconstruction on the decoder side. In addition, both of these techniques are computationally expensive.
In order to minimize the blocking effect, cross block correlations can be exploited. One way of achieving cross block correlation is by using a lapped transform as described in H. Malvar, “Signal Processing with Lapped Transforms,” Artech House, Norwood Mass., 1992. A lapped transform is a transform whose input spans, besides the data elements in the current block, a few adjacent elements in neighboring blocks. Likewise, on the reconstruction side the inverse transform influences all data points in the current block as well as a few data points in neighboring blocks.
For the case of 2-dimensional (2D) data, the lapped 2D transform is a function of the current block, together with select elements of blocks to the left, top, right, bottom and possibly top-left, top-right, bottom-left and bottom-right. The number of data points in neighboring blocks that are used to compute the current transform is referred to as the overlap.
Overview of the Spatial Domain Lapped Transform
The lapped transform can be implemented in the transform domain, as a step that merges transform domain quantities after a conventional block transform. Alternatively, it can be implemented in the spatial domain by a pre-processing stage that is applied to pixels within the range of overlap. These two implementations are mathematically related and therefore equivalent.
FIG. 2 shows an example of a conventional spatial-domain lapped transform. In the example shown, the overlap is 2 pixels, and two pixels each from the two adjacent blocks shown are pre-processed in pre-processing stage 210. Two pre-processed outputs are sent to each of the blocks for block transform-based coding by codec 100 as in FIG. 1. An inverse of the pre-processing stage is applied at post-processing stage 220 after decoding. With a judicious choice of pre-processing and block transform, a wide range of lapped transforms can be realized.
A key advantage of the spatial domain realization of the lapped transform is that an existing block transform-based codec can be retrofitted with a pre- and post-processing stage to derive the benefits of the lapped transform, i.e., reduced blocking effect and better compression, using an existing codec framework. Pre-processing 210 and post-processing 220 can be represented as a matrix multiplication as shown in FIG. 3. Conventionally, the pre-processing and post-processing matrices are inverses of each other, i.e., the pre-processing matrix (Pf) and the inverse or post-processing matrix (Pi) multiplied together equal the identity matrix I.
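The inverse relationship between Pf and Pi can be checked numerically. The sketch below builds a hypothetical 4x4 pre-filter for the four samples that straddle a block boundary (two from each block), using a butterfly, scaling, butterfly structure with arbitrarily chosen scale factors. These matrices are assumptions made for illustration and are not the matrices of FIG. 3. Exact rational arithmetic is used so that the product can be compared with the identity without rounding error.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

# Butterfly B = [[I, J], [J, -I]], where J reverses order; B * B = 2 * I.
B = [[1, 0, 0, 1],
     [0, 1, 1, 0],
     [0, 1, -1, 0],
     [1, 0, 0, -1]]

def make_filter(scales):
    """Return (1/2) * B * diag(1, 1, scales[0], scales[1]) * B."""
    d = identity(4)
    d[2][2], d[3][3] = Fraction(scales[0]), Fraction(scales[1])
    half = Fraction(1, 2)
    return [[half * v for v in row] for row in mat_mul(mat_mul(B, d), B)]

def apply_filter(p, samples):
    """Apply a filter matrix to a list of samples."""
    return [sum(c * x for c, x in zip(row, samples)) for row in p]

SCALES = (Fraction(3, 2), Fraction(5, 4))       # hypothetical scale factors
Pf = make_filter(SCALES)                         # pre-processing matrix
Pi = make_filter(tuple(1 / s for s in SCALES))   # post-processing matrix

assert mat_mul(Pf, Pi) == identity(4)

boundary = [10, 12, 20, 22]   # last two samples of one block, first two of the next
assert apply_filter(Pi, apply_filter(Pf, boundary)) == boundary
```

Note that this pair is invertible only in exact arithmetic: the filtered samples are in general fractions, so rounding them to integers would break the round trip.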