Transform coding is a compression technique used in many audio, image and video compression systems. Uncompressed digital images and video are typically represented or captured as samples of picture elements or colors at locations in an image or video frame arranged in a two-dimensional grid. For example, a typical format for images consists of a stream of 24-bit color picture element samples arranged as a grid. Each sample is a number representing color components at a pixel location in the grid within a color space, such as RGB or YIQ, among others. Different image and video systems may sample at different color, spatial and temporal resolutions.
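As a concrete illustration of a 24-bit color sample, the sketch below packs three 8-bit RGB components into one integer and lays samples out as a grid. The layout (8 bits per component, R in the high byte, row-major grid) and the helper names `pack_rgb24`/`unpack_rgb24` are hypothetical choices for illustration, not a format defined in the text.

```python
def pack_rgb24(r: int, g: int, b: int) -> int:
    """Pack three 8-bit components into one 24-bit sample (hypothetical layout: R high)."""
    for comp in (r, g, b):
        if not 0 <= comp <= 255:
            raise ValueError("each component must fit in 8 bits")
    return (r << 16) | (g << 8) | b


def unpack_rgb24(sample: int) -> tuple[int, int, int]:
    """Split a 24-bit sample back into its (R, G, B) components."""
    return (sample >> 16) & 0xFF, (sample >> 8) & 0xFF, sample & 0xFF


# A 2x3 frame stored row by row as a grid of 24-bit samples.
frame = [
    [pack_rgb24(255, 0, 0), pack_rgb24(0, 255, 0), pack_rgb24(0, 0, 255)],
    [pack_rgb24(0, 0, 0), pack_rgb24(128, 128, 128), pack_rgb24(255, 255, 255)],
]
print(hex(frame[0][0]))           # 0xff0000
print(unpack_rgb24(frame[1][1]))  # (128, 128, 128)
```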
Uncompressed digital image and video signals can consume considerable storage and transmission capacity. Transform coding reduces the size of digital images and video by transforming the spatial-domain representation of the signal into a frequency-domain (or other like transform domain) representation, and then reducing resolution of certain generally less perceptible frequency components of the transform-domain representation. This generally produces much less perceptible degradation of the digital signal compared to reducing color or spatial resolution of images or video in the spatial domain.
More specifically, a typical transform coding technique divides the uncompressed digital image's pixels into fixed-size two-dimensional blocks, each block possibly overlapping with other blocks. A linear transform that performs spatial-frequency analysis is applied to each block, converting the samples within the block to a set of frequency (or transform) coefficients that generally represent the strength of the digital signal in corresponding frequency bands over the block interval. For compression, the transform coefficients may be selectively quantized (i.e., reduced in resolution, such as by dropping least significant bits of the coefficient values or otherwise mapping values in a higher resolution number set to a lower resolution), and also entropy or variable-length coded into a compressed data stream. At decoding, the dequantized transform coefficients are inverse transformed to approximately reconstruct the original color/spatially sampled image or video signal.
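The encode/decode path just described can be sketched on a single row of 8 samples. This is a minimal illustration, assuming a floating-point orthonormal DCT-II as the block transform and a uniform quantizer with step 10 (a power-of-two step would correspond to dropping least significant bits); the sample values and step are made up, and entropy coding is omitted.

```python
import math

N = 8  # samples per block row (one row of an 8x8 block, for brevity)


def _scale(k):
    """Orthonormal DCT normalization factor for coefficient k."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def forward_dct(x):
    """Orthonormal DCT-II: spatial samples -> transform coefficients."""
    return [_scale(k) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                            for n in range(N))
            for k in range(N)]


def inverse_dct(X):
    """Orthonormal DCT-III, the exact inverse of forward_dct()."""
    return [sum(_scale(k) * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
            for n in range(N)]


def quantize(X, step):
    """Uniform scalar quantizer: coefficient -> integer level (the lossy step)."""
    return [round(c / step) for c in X]


def dequantize(levels, step):
    """Map integer levels back to approximate coefficient values."""
    return [q * step for q in levels]


STEP = 10
row = [52, 55, 61, 66, 70, 61, 64, 73]        # example 8-bit luma samples
levels = quantize(forward_dct(row), STEP)      # integers an entropy coder would see
recon = inverse_dct(dequantize(levels, STEP))  # decoder-side approximation
print(levels)
print([round(v) for v in recon])
```

Because the transform is orthonormal, each coefficient's quantization error of at most STEP/2 bounds the spatial reconstruction error, which is why the result is only an approximate reconstruction.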
Many image and video compression systems, such as MPEG and Windows Media, among others, utilize transforms based on the Discrete Cosine Transform (DCT). The DCT is known to have favorable energy compaction properties that result in near-optimal data compression. In these compression systems, the inverse DCT (IDCT) is employed in the reconstruction loops in both the encoder and the decoder of the compression system for reconstructing individual image blocks. An exemplary implementation of the IDCT is described in “IEEE Standard Specification for the Implementations of 8×8 Inverse Discrete Cosine Transform,” IEEE Std. 1180-1990, Dec. 6, 1990.
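Energy compaction can be seen numerically. The sketch below, assuming a floating-point orthonormal DCT-II and an illustrative linear ramp as input, measures what fraction of the signal's energy lands in the first two of eight coefficients; for smooth, highly correlated inputs like this one, nearly all of it does, so the remaining coefficients can be quantized coarsely.

```python
import math


def dct(x):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n_pts = len(x)
    coeffs = []
    for k in range(n_pts):
        scale = math.sqrt((1 if k == 0 else 2) / n_pts)
        coeffs.append(scale * sum(v * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts))
                                  for n, v in enumerate(x)))
    return coeffs


def energy_fraction(coeffs, m):
    """Fraction of total energy held by the first m coefficients."""
    total = sum(c * c for c in coeffs)
    return sum(c * c for c in coeffs[:m]) / total


ramp = [10, 20, 30, 40, 50, 60, 70, 80]  # smooth, highly correlated signal
coeffs = dct(ramp)
print(f"{energy_fraction(coeffs, 2):.4f}")  # share of energy in 2 of 8 coefficients
```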
A drawback of the IDCT as defined in IEEE Std. 1180-1990 is that calculation of the transform involves matrix multiplication of 64-bit floating point numbers, which is computationally expensive. This can limit performance of the image or video compression system, particularly in streaming media and similar playback applications, where the IDCT is performed on large amounts of compressed data in real time or under other time constraints.
The Windows Media Video 9 codec (WMV9) standard, which has been proposed for standardization through the Society of Motion Picture and Television Engineers (SMPTE) C24 Technical Committee as Video Codec 9 (VC-9), defines four types of two-dimensional data transforms: 8×8, 8×4, 4×8 and 4×4. These VC-9 transforms have energy compaction properties similar to those of the DCT, but are implemented with matrix multiplication operations on integers for computational efficiency. The matrix implementations of the WMV9/VC-9 transforms are described more fully in U.S. Pat. No. 7,242,713, issued Jul. 10, 2007 (the disclosure of which is incorporated herein by reference). The WMV9 specification calls for bit-exact implementations of the inverse transforms.
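To make the integer matrices concrete, the sketch below uses the 4-point and 8-point bases as they are commonly cited for WMV9/VC-1 (SMPTE 421M); treat the exact values as an assumption to be checked against the referenced patent or specification. It verifies one DCT-like property, that distinct rows are mutually orthogonal using only integer arithmetic, and shows the direct 64-multiply product. The normative bit-exact inverse also applies rounding offsets and shifts, which are not modeled here.

```python
# Integer bases as commonly cited for WMV9/VC-1 (assumption; verify against the spec).
T4 = [
    [17, 17, 17, 17],
    [22, 10, -10, -22],
    [17, -17, -17, 17],
    [10, -22, 22, -10],
]

T8 = [
    [12, 12, 12, 12, 12, 12, 12, 12],
    [16, 15, 9, 4, -4, -9, -15, -16],
    [16, 6, -6, -16, -16, -6, 6, 16],
    [15, -4, -16, -9, 9, 16, 4, -15],
    [12, -12, -12, 12, 12, -12, -12, 12],
    [9, -16, 4, 15, -15, -4, 16, -9],
    [6, -16, 16, -6, -6, 16, -16, 6],
    [4, -9, 15, -16, 16, -15, 9, -4],
]


def gram(m):
    """Integer dot products between every pair of basis rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]


def mat_vec(m, x):
    """Direct integer matrix-vector product: len(m) * len(x) multiplies."""
    return [sum(c * v for c, v in zip(row, x)) for row in m]


for name, m in (("T4", T4), ("T8", T8)):
    g = gram(m)
    off_diag = [g[i][j] for i in range(len(m)) for j in range(len(m)) if i != j]
    print(name, "orthogonal rows:", all(v == 0 for v in off_diag),
          "squared row norms:", [g[i][i] for i in range(len(m))])

print(mat_vec(T8, [1, 2, 3, 4, 5, 6, 7, 8]))  # 64 integer multiplies
```

The squared row norms come out close but not identical (for example 1152, 1156 and 1168 for the 8-point basis), so these integer bases are orthogonal but not exactly orthonormal.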
Fast implementations of linear transforms have a long history. One well-known example of fast transforms is the Fast Fourier Transform (FFT), described in J. W. Cooley and J. W. Tukey, “An Algorithm For The Machine Calculation Of Complex Fourier Series,” Math. Computation, vol. 19, pp. 297-301, 1965. The FFT realizes an N-point Fourier transform using O(N log N) operations, rather than the O(N^2) operations of direct evaluation. It is the inherent symmetry of the Fourier transform definition that allows for this simplification. Similar fast implementations have been shown to exist for the Discrete Cosine Transform (DCT), by W. Chen, C. H. Smith and S. C. Fralick, “A Fast Computational Algorithm For The Discrete Cosine Transform,” IEEE Trans. Commun., vol. 25, pp. 1004-1009, September 1977; and H. Malvar, “Fast Computation Of The Discrete Cosine Transform And The Discrete Hartley Transform,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 1484-1485, October 1987.
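The symmetry that the FFT exploits can be shown with a textbook recursive radix-2 Cooley-Tukey FFT, checked against a direct evaluation of the DFT definition. This is a minimal floating-point sketch for power-of-two lengths only; the input signal is arbitrary.

```python
import cmath


def dft(x):
    """Direct DFT from the definition: N outputs x N terms = O(N^2) multiplies."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT, O(N log N); len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # N/2-point DFT of even-indexed samples
    odd = fft(x[1::2])   # N/2-point DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Symmetry: exp(-2*pi*i*(k + N/2)/N) = -exp(-2*pi*i*k/N), so one twiddle
        # multiply serves two outputs (a sum/difference butterfly).
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]
print(max(abs(a - b) for a, b in zip(fft(signal), dft(signal))))  # tiny rounding difference
```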
Fast transforms decompose the matrix multiplication definition of the transform into a series of steps involving the “butterfly” operation. The butterfly is a weighted data exchange between two variables, which may be spatial-domain, frequency-domain or intermediate variables. For example, the butterfly operation corresponding to the matrix multiplication
$$y = \begin{pmatrix} c & s \\ -s & c \end{pmatrix} x$$

is shown in FIG. 3. This corresponds to a rotation of the original two-dimensional vector x about the origin, with a possible scaling factor. The scaling factor is unity if $c^2 + s^2 = 1$. A butterfly operation with real-valued inputs can be implemented with only three real-valued multiplies. In general, the matrix need not correspond to a pure rotation; scaling and shear are possible with no additional complexity.
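One standard way to reach three multiplies, sketched below, shares the product t = c(x1 + x2) between both outputs and precomputes the constants s - c and c + s once per butterfly. The factorization and the example values of c and s are illustrative choices; other three-multiply factorizations exist.

```python
import math


def butterfly_direct(x1, x2, c, s):
    """y = [[c, s], [-s, c]] x computed directly: four multiplies, two adds."""
    return c * x1 + s * x2, -s * x1 + c * x2


def make_butterfly3(c, s):
    """Return a butterfly for fixed (c, s) costing three multiplies, three adds.

    The constants s - c and c + s are precomputed once, so they do not count
    against the per-butterfly cost.
    """
    s_minus_c = s - c
    c_plus_s = c + s

    def butterfly(x1, x2):
        t = c * (x1 + x2)        # multiply 1, shared by both outputs
        y1 = t + s_minus_c * x2  # multiply 2: equals c*x1 + s*x2
        y2 = t - c_plus_s * x1   # multiply 3: equals -s*x1 + c*x2
        return y1, y2

    return butterfly


# Integer example: c = 3, s = 4 is a rotation scaled by sqrt(3^2 + 4^2) = 5.
print(make_butterfly3(3, 4)(7, -2), butterfly_direct(7, -2, 3, 4))  # (13, -34) (13, -34)

# Pure rotation by 30 degrees: c^2 + s^2 = 1, so vector length is preserved.
rot = make_butterfly3(math.cos(math.pi / 6), math.sin(math.pi / 6))
y1, y2 = rot(1.0, 0.0)
print(math.hypot(y1, y2))
```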
The four-point WMV9/VC-9 transform permits a fast implementation via a straightforward application of the butterfly operation, as just described.
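For the 4-point case, a first stage of sum/difference butterflies on (x0, x3) and (x1, x2) followed by a second butterfly stage reproduces the matrix product with 6 multiplies instead of 16. The sketch assumes the forward 4-point basis commonly cited for WMV9/VC-1 and models only the plain matrix product, not the rounding and shifts of the bit-exact inverse (which would use the transpose).

```python
# Forward 4-point basis as commonly cited for WMV9/VC-1 (assumption).
T4 = [
    [17, 17, 17, 17],
    [22, 10, -10, -22],
    [17, -17, -17, 17],
    [10, -22, 22, -10],
]


def t4_direct(x):
    """Matrix-vector product: 16 integer multiplies, 12 adds."""
    return [sum(c * v for c, v in zip(row, x)) for row in T4]


def t4_fast(x):
    """Same result via butterflies: 6 integer multiplies, 8 adds/subtracts."""
    # Stage 1: sum/difference butterflies pairing x0/x3 and x1/x2.
    s03, d03 = x[0] + x[3], x[0] - x[3]
    s12, d12 = x[1] + x[2], x[1] - x[2]
    # Stage 2: outputs 0 and 2 from a butterfly on the sums; outputs 1 and 3
    # from a butterfly on the differences with c = 22, s = 10 (output 3 negated).
    return [17 * (s03 + s12), 22 * d03 + 10 * d12,
            17 * (s03 - s12), 10 * d03 - 22 * d12]


x = [5, -3, 8, 1]
print(t4_direct(x), t4_fast(x))
```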
As discussed above, the 8-point DCT is known to have a fast transform implementation. However, it does not translate easily to the 8-point WMV9/VC-9 transform. Although the WMV9/VC-9 transform is similar to a DCT, its integer implementation and the requirement of bit-exactness make a direct mapping from any known fast implementation impossible.
As described in U.S. Pat. No. 7,242,713, issued Jul. 10, 2007, the 8-point WMV9/VC-9 transform can be implemented by operations using a pair of even and odd matrices. It is known that the even basis functions (i.e., basis functions 0, 2, 4 and 6) of the DCT can be trivially realized by a series of butterfly operations at the input followed by a four-point DCT. This known fast implementation of the DCT translates well to the even matrix for the 8-point WMV9/VC-9 transform.
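The even/odd split can be illustrated as follows. Even basis rows are symmetric and odd rows antisymmetric about the center, so input butterflies x[i] ± x[7-i] yield a 4-point sum vector feeding the even rows and a 4-point difference vector feeding the odd rows. The sketch assumes the 8-point basis commonly cited for WMV9/VC-1, and takes the "odd matrix" to be the left halves of rows 1, 3, 5 and 7, which may not match the ordering used in the referenced patent. The even part runs through a butterfly-based 4-point stage, while the odd part is left as a plain 4×4 product.

```python
# 8-point basis as commonly cited for WMV9/VC-1 (assumption).
T8 = [
    [12, 12, 12, 12, 12, 12, 12, 12],
    [16, 15, 9, 4, -4, -9, -15, -16],
    [16, 6, -6, -16, -16, -6, 6, 16],
    [15, -4, -16, -9, 9, 16, 4, -15],
    [12, -12, -12, 12, 12, -12, -12, 12],
    [9, -16, 4, 15, -15, -4, 16, -9],
    [6, -16, 16, -6, -6, 16, -16, 6],
    [4, -9, 15, -16, 16, -15, 9, -4],
]

# Hypothetical "odd matrix": left half of the odd-indexed rows 1, 3, 5, 7.
ODD = [T8[k][:4] for k in (1, 3, 5, 7)]


def t8_direct(x):
    """Full 8x8 integer matrix-vector product: 64 multiplies."""
    return [sum(c * v for c, v in zip(row, x)) for row in T8]


def t8_even_odd(x):
    """Same output via input butterflies plus even and odd 4-point stages."""
    s = [x[i] + x[7 - i] for i in range(4)]  # butterfly sums -> even rows
    d = [x[i] - x[7 - i] for i in range(4)]  # butterfly differences -> odd rows
    # Even part (rows 0, 2, 4, 6): 4-point transform of s via butterflies, 6 multiplies.
    s03, d03 = s[0] + s[3], s[0] - s[3]
    s12, d12 = s[1] + s[2], s[1] - s[2]
    even = [12 * (s03 + s12), 16 * d03 + 6 * d12,
            12 * (s03 - s12), 6 * d03 - 16 * d12]
    # Odd part (rows 1, 3, 5, 7): plain 4x4 product, 16 multiplies; no shortcut assumed.
    odd = [sum(c * v for c, v in zip(row, d)) for row in ODD]
    y = [0] * 8
    y[0::2] = even
    y[1::2] = odd
    return y


x = [3, -1, 4, 1, -5, 9, 2, -6]
print(t8_even_odd(x) == t8_direct(x))
```

This brings the cost from 64 multiplies to 22, with the odd 4×4 product now the dominant term.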
The known fast implementations, however, do not provide a way to derive a fast implementation of the odd matrix for the 8-point WMV9/VC-9 transform. Because of the integer implementation and bit-exactness requirement noted above, the analysis and synthesis of the odd basis functions of these transforms cannot be solved with reference to these known fast transform implementations.