1. Field of the Invention
The present invention relates to the field of data compression.
2. Background Art
Compression is a scheme for reducing the amount of information required to represent data. Data compression schemes are used, for example, to reduce the size of a data file so that it can be stored in a smaller memory space. Data compression may also be used to compress data prior to its transmission from one site to another, reducing the amount of time required to transmit the data. To access the compressed data, it is first decompressed into its original form. A compressor/decompressor (codec) is typically used to perform the compression and decompression of data. One measure of the performance or efficiency of a codec is its “compression ratio”. Compression ratio refers to the ratio of the number of bits of uncompressed data to the number of bits of compressed data. Compression ratios may be 2:1, 3:1, 4:1, etc.
Data compression may also be required when the input/output rate of a particular data receiver is less than the data rate of the transmitted data. This can occur when providing video data to computer systems. Video data of frame size 320×240 is provided at rates approaching 7 megabytes per second. This rate is greater than the rate of commonly used I/O subsystems of personal computers. Some representative rates of common I/O subsystems found on personal computers (PC) are:
Another measure of video codec compression ratio is the average compressed bits-per-pixel. This measure is useful in describing video compression because different conventions are used for calculating the size of uncompressed video, i.e., some use 24 bits-per-pixel RGB and others use 4:2:2 subsampled YUV (16 bits per pixel). The averaging accounts for potentially different strategies employed for frames in a sequence. The bandwidth requirement for a sequence of frames is calculated by multiplying the average compressed bits-per-pixel by the number of frames per second, and multiplying the resulting product by the number of pixels in each encoded frame.
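The bandwidth calculation can be illustrated with a minimal sketch. The function name and the sample figure of 2 bits per pixel are hypothetical and chosen only for illustration; the bandwidth in bits per second is taken as the product of the average compressed bits-per-pixel, the number of pixels per frame, and the frame rate.

```python
def bandwidth_bits_per_second(avg_bits_per_pixel, width, height, frames_per_second):
    """Bandwidth = average compressed bits/pixel x pixels/frame x frames/second."""
    pixels_per_frame = width * height
    return avg_bits_per_pixel * pixels_per_frame * frames_per_second


# Hypothetical example: 320x240 video at 30 frames per second,
# compressed to an average of 2 bits per pixel.
print(bandwidth_bits_per_second(2.0, 320, 240, 30))  # 4608000.0
```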
Nearly all video compression techniques are lossy, i.e., information is inevitably discarded in the compression process. A measure of quality is how noticeable this loss of information is to a human observer. However, there is no consistent, objective model of human perception that can be applied. A simple, concrete quality metric that is frequently used is the Mean-Squared-Error (MSE), which measures the error on a per-pixel basis from the uncompressed original.
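As an illustration of the MSE metric, the following sketch averages the squared per-pixel differences between an original and a decoded image. Representing each image as a flat list of sample values is an assumption made for brevity.

```python
def mean_squared_error(original, decoded):
    """Average of squared per-sample differences between two equal-length images."""
    if len(original) != len(decoded) or not original:
        raise ValueError("images must be non-empty and of equal size")
    return sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)


# Two of four samples differ, by 2 and by 3: (4 + 0 + 9 + 0) / 4
print(mean_squared_error([10, 20, 30, 40], [12, 20, 27, 40]))  # 3.25
```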
Most compression algorithms are computationally complex, which limits their application since very complex algorithms often require expensive hardware to assist in the compression. A useful number to measure computational complexity of software-based compression algorithms is MIPS per megapixels/sec, i.e., essentially instructions/pixel. For example, an algorithm just capable of compressing 320×240 pixels per frame at 30 frames per second on a 40 MIPS machine has a computational complexity of 40,000,000/(320×240×30)≅17 instructions/pixel.
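The arithmetic of this example can be reproduced with a short sketch; the function name is hypothetical.

```python
def instructions_per_pixel(mips, width, height, frames_per_second):
    """Instructions available per pixel: (MIPS x 10^6) / pixels processed per second."""
    pixels_per_second = width * height * frames_per_second
    return mips * 1_000_000 / pixels_per_second


# 40 MIPS machine compressing 320x240 frames at 30 frames per second.
print(round(instructions_per_pixel(40, 320, 240, 30)))  # 17
```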
Symmetry refers to the ratio of the computational complexity of compression to that of decompression. Codecs are frequently designed with a greater computational load on the compressor than the decompressor, i.e., they are asymmetric. While this may be a reasonable strategy for “create-once, play-many” video sequences, it limits the range of applications for the codecs. Asymmetric compression techniques are not suitable for teleconferencing, for example, since teleconferencing requires essentially real-time processing and substantially equivalent compression and decompression rates.
Block Transform Coding Example (JPEG)
In the prior art, a class of image compressors called Block Transform Coding (BTC) is used. This is a fundamentally symmetric image-compression technique that is used in the MPEG and JPEG compression algorithms. In BTC, an image is divided into small blocks, the blocks are transformed using an invertible, two-dimensional (2-D) mathematical transform, the transformed blocks are quantized, and the quantized result is losslessly compressed. This process forms the core of JPEG and MPEG compression, which use 8×8 blocks and a Discrete Cosine Transform (DCT) to perform the 2-D transform.
FIG. 1 is a diagram illustrating computational blocks of a prior art system for performing JPEG still-image compression. Input image 102 is provided to the color-space conversion and subsampling block 110. The output of the color-space conversion and subsampling block 110 is provided to block 112 for dividing each image plane into 8×8 blocks. The output of block 112 is provided to the Discrete Cosine Transform block 114. Block 114 provides DC terms 116 to quantization block 120, which quantizes the DC terms 116 using differential pulse code modulation (DPCM). Block 114 provides AC terms 118 to block 122, which scalar quantizes the AC terms 118 by frequency. The outputs of blocks 120 and 122 are provided to the Huffman block 124, which compresses the quantized values using variable length codes to provide output 126.
Digital images 102 are typically stored in an RGB format, where each pixel is represented as a tuple of red (R), green (G), and blue (B) samples. While RGB format is well suited to most digital color input and output devices, it is not particularly efficient for the human visual system or for natural scenes. For example, in natural scenes the R, G, and B components of colors are highly correlated because most natural colors are very close to shades of gray, where R=G=B (i.e., saturated colors are rare). In other words, with respect to information coding, the correlation between RGB signals means that there is redundant information stored in the R, G, and B channels. To account for this redundant information, color-space conversion and subsampling block 110 transforms the colors of input image 102 into a color space with an explicit brightness, or luminance, dimension prior to compression. More bits are typically used to precisely specify the brightness while relatively fewer bits are used to specify the chrominance.
Broadcast television (TV) uses YUV color space to better utilize the bandwidth of TVs. The YUV color space is essentially a rotation of the RGB basis vectors so that the luminance axis (Y) of YUV color space is aligned with the gray diagonal of RGB color space, which extends from RGB coordinates (0, 0, 0) to (1, 1, 1). The transformation for converting RGB color values to YUV space is expressed by Equation (1):

\[
\begin{bmatrix} Y \\ U \\ V \end{bmatrix}
=
\begin{bmatrix}
0.161 & 0.315 & 0.061 \\
-0.079 & -0.155 & 0.234 \\
0.330 & -0.227 & -0.053
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}.
\qquad (1)
\]
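The transformation of Equation (1) amounts to a 3×3 matrix-vector product per pixel. The following is a minimal sketch using the coefficients of Equation (1) as given; the function and constant names are hypothetical, and RGB samples are assumed to be normalized to the range 0 to 1.

```python
# Coefficients copied from Equation (1); rows produce Y, U, and V respectively.
RGB_TO_YUV = (
    (0.161, 0.315, 0.061),
    (-0.079, -0.155, 0.234),
    (0.330, -0.227, -0.053),
)


def rgb_to_yuv(r, g, b):
    """Apply the 3x3 matrix of Equation (1) to one RGB pixel."""
    return tuple(cr * r + cg * g + cb * b for cr, cg, cb in RGB_TO_YUV)


# A pure red pixel simply selects the first column of the matrix.
y, u, v = rgb_to_yuv(1.0, 0.0, 0.0)
print(y, u, v)
```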
Reduction of redundant information can be achieved using the YUV color-space representation obtained using Equation (1). The human eye is much less sensitive to spatial detail in the U and V channels than it is in the Y channel because receptors in the eye for brightness (Y) are more numerous than those for chrominance (U, V). Using this fact, the U and V components can be sampled at a lower resolution. In JPEG compression, the U and V components are frequently subsampled by a factor of 2 in both x- and y-directions. For example, four Y samples and one sample each of U and V are produced for each 2×2 block of an input image. For 8-bit samples per channel, this effectively produces a 2:1 compression factor. Thus, color-space conversion and subsampling block 110 converts an input image 102 from RGB color space to YUV color space using the transformation of Equation (1) and subsamples the input image 102 to reduce redundant information.
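The subsampling step and the resulting 2:1 factor can be sketched as follows. Averaging each 2×2 neighborhood is an assumption made here for illustration (the subsampling filter is not specified above), the function names are hypothetical, and plane dimensions are assumed to be even.

```python
def subsample_2x2(plane):
    """Halve a chroma plane in x and y by averaging each 2x2 neighborhood.

    Assumption: simple box averaging; other subsampling filters are possible.
    """
    height, width = len(plane), len(plane[0])
    return [
        [(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
         for x in range(0, width, 2)]
        for y in range(0, height, 2)
    ]


def samples_after_subsampling(width, height):
    """Full-resolution Y plane plus quarter-resolution U and V planes."""
    return width * height + 2 * (width // 2) * (height // 2)


print(subsample_2x2([[1, 3], [5, 7]]))            # [[4.0]]
# 3 samples per pixel before, 6 samples per 2x2 block after: a 2:1 factor.
print(3 * 2 * 2 / samples_after_subsampling(2, 2))  # 2.0
```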
Once block 110 converts the input image 102 to YUV color space and subsamples the U and V planes, the prior art JPEG system of FIG. 1 treats the resulting three image planes (Y, U, and V) independently and codes them as three separate 1-channel images. Subsampling of U and V values reduces the amount of computation performed here as well.
For each of the resulting YUV image planes, block 112 of FIG. 1 segments the image output by color-space conversion and subsampling block 110 into fixed-size tiles, or blocks. In JPEG compression, the image is divided into blocks of 8×8 pixels for a number of reasons. Many transforms have non-linear computational complexity that is alleviated by small block sizes. For example, the computational complexity of a Discrete Cosine Transform (DCT), described below, is O(n log(n)). Therefore, transforming small, fixed-size blocks allows the overall compression algorithm to remain approximately linear in image size. The relatively small blocks localize compression artifacts in an image, i.e., the artifacts from a block that is particularly difficult to compress do not ripple throughout the image. Finally, small, fixed block sizes facilitate easier, hardwired optimization.
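The segmentation of an image plane into fixed-size blocks can be sketched as follows. The function name is hypothetical, and the sketch assumes that the plane dimensions are exact multiples of the block size; padding of partial blocks is not addressed.

```python
def split_into_blocks(plane, size=8):
    """Split a 2-D plane (list of rows) into size x size blocks in row-major order."""
    height, width = len(plane), len(plane[0])
    if height % size or width % size:
        raise ValueError("sketch assumes dimensions are multiples of the block size")
    return [
        [row[x:x + size] for row in plane[y:y + size]]
        for y in range(0, height, size)
        for x in range(0, width, size)
    ]


# A 16x16 plane yields four 8x8 blocks.
plane = [[y * 16 + x for x in range(16)] for y in range(16)]
blocks = split_into_blocks(plane)
print(len(blocks))  # 4
```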
Once the image is segmented into 8×8 blocks, a spatial transform is performed on each block. In the prior art JPEG system of FIG. 1, block 114 performs a Discrete Cosine Transform on each block of the three image planes provided by block 112. The DCT of block 114 is lossless, resulting in 64 frequency values for each block. The first value produced by block 114 is a DC term 116 that is essentially the average YUV value of an 8×8 block. The remaining values are AC terms 118 that represent edges in the x- and y-directions. The transform “sorts” the block into detail components. Eight-by-eight blocks of an image plane that are relatively smooth have large values for the DC term 116 and lower frequency AC terms 118 and relatively little energy in the higher frequency AC terms 118. Blocks with strong vertical detail have considerable energy in the horizontal frequencies and comparatively little in the vertical.
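The behavior of the DC and AC terms can be illustrated with a direct (unoptimized) 2-D DCT of one 8×8 block. This is a sketch assuming the orthonormal DCT-II scaling, under which the DC term equals eight times the block average; practical codecs use fast factorizations rather than this quadruple loop.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Direct 2-D DCT-II of an N x N block with orthonormal scaling."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = (2.0 / N) * c(u) * c(v) * total
    return out


# A perfectly smooth block: all energy lands in the DC term, AC terms vanish.
flat = [[10.0] * N for _ in range(N)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))  # 80.0
```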
Once block 114 produces DC term 116 and AC terms 118, DPCM quantization block 120 and scalar quantization block 122 quantize the resulting frequency terms 116 and 118, respectively. The DC term 116 is processed separately. It is not quantized directly, but rather its difference from the DC term of the previous block is quantized by block 120 using Differential Pulse Code Modulation coding, or DPCM. In Block Transform Coding, differential pulse code modulation of the DC term 116 takes advantage of block-to-block color correlations and maintains higher precision for the DC term 116. The low frequencies of AC terms 118 are quantized finely by block 122, since much of the image energy is contained there, and the higher frequencies of AC terms 118 are quantized more coarsely by block 122 using scalar quantization.
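The two quantization paths can be sketched as follows. The function names and step sizes are hypothetical; the DPCM encoder shown predicts each DC term from the decoder's reconstruction of the previous one (a common closed-loop arrangement, assumed here so that quantization error does not accumulate), and the AC quantizer simply divides each term by a per-frequency step size.

```python
def dpcm_encode(dc_terms, step):
    """Quantize the difference between each DC term and the previous reconstruction."""
    codes = []
    predicted = 0.0  # what the decoder will have reconstructed so far
    for dc in dc_terms:
        code = round((dc - predicted) / step)
        codes.append(code)
        predicted += code * step
    return codes


def dpcm_decode(codes, step):
    """Rebuild DC terms by accumulating the dequantized differences."""
    value, out = 0.0, []
    for code in codes:
        value += code * step
        out.append(value)
    return out


def quantize_ac(ac_terms, steps):
    """Scalar-quantize AC terms; larger steps give coarser quantization."""
    return [round(a / s) for a, s in zip(ac_terms, steps)]


codes = dpcm_encode([80, 84, 83, 100], step=4)
print(codes)                                     # [20, 1, 0, 4]
print(dpcm_decode(codes, step=4))                # [80.0, 84.0, 84.0, 100.0]
print(quantize_ac([31.0, -7.0, 3.0], [4, 8, 16]))  # [8, -1, 0]
```

Small block-to-block DC differences yield small codes, while the coarse steps applied to higher frequencies drive many AC terms to zero.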
In JPEG, variable-length coding block 124 entropy codes DC term 116 and AC terms 118 after quantization by blocks 120 and 122, respectively. The quantized DCT coefficients 116 and 118 are losslessly compressed using a variable-length, Huffman-like code. The quantized DC term 116 is coded individually with a code that is short for small differences and longer for large differences between block values. The sixty-three AC terms 118 are coded into a continuous bitstream, scanned in zig-zag order, with special run-length codes referring to runs of zeros. The special treatment of zero-valued AC terms 118 is important because little of the image energy is located in the higher frequency terms of the DCT performed by block 114, and thus there is a high probability that many of the high frequency AC terms 118 are zero.
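The zig-zag scan and zero-run coding can be sketched as follows. This is a simplified illustration with hypothetical function names: it emits (zero run, value) pairs followed by an end-of-block marker, and omits the Huffman code assignment and the maximum run length used by actual JPEG coders.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag order.

    Positions are grouped by anti-diagonal; odd diagonals run top-right to
    bottom-left and even diagonals run bottom-left to top-right.
    """
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def run_length_ac(block):
    """Code the AC terms of a quantized block as (zero_run, value) pairs."""
    scanned = [block[r][c] for r, c in zigzag_order(len(block))][1:]  # skip DC
    pairs, run = [], 0
    for value in scanned:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # trailing zeros collapse into one end-of-block marker
    return pairs


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 50, 5, -2
print(run_length_ac(block))  # [(0, 5), (1, -2), 'EOB']
```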
The prior art JPEG compression has several disadvantages. While the JPEG technique provides high compression ratios for still images, it is not suitable for many real-time software-based video applications. JPEG is not capable of providing 320×240×24 fps (or approximately 1.8 megapixels per second, Mps) using generally available PCs due to the computational complexity. Because JPEG is a still-image standard, it cannot provide video rate compression with moderate compression using software. Instead, special hardware is required to provide JPEG compression at video rates that can support the above rate of 1.8 Mps. This is due to the computational complexity of performing a Discrete Cosine Transform on an 8×8 block. MPEG compression provides video compression. While MPEG has the same basic format as JPEG, it is an asymmetric compression method using special hardware that requires significantly greater compression time than decompression time, and is therefore unsuitable for providing real-time, symmetric video compression and decompression.
The present invention provides a method and apparatus for symmetrically compressing and decompressing video information in real time by coupling block and wavelet techniques. The present invention performs a wavelet transform on small blocks of an image and encodes the wavelet transformed blocks. The preferred embodiment of the present invention utilizes a block-oriented Haar wavelet transform on 2-by-2 pixel blocks and is useful in a wide variety of video coding applications.
In the compression pipeline, the image is divided into a plurality of blocks, where each block of pixels comprises 2k×2k pixels. In the preferred embodiment of the present invention, k is equal to one. The average color of each block of the plurality of blocks is computed. The present invention computes an average luminance of each block dependent on the average color of each block and a differential luminance of each pixel of the plurality of pixels of each block. A first plurality of frequency details of each block are determined by Haar transforming the differential luminance of each pixel of the plurality of pixels of each block. The first plurality of frequency details comprises an average term, a horizontal term, a vertical term, and a diagonal term. The present invention computes an average color difference between each block and the block that immediately precedes it, and then quantizes the average color difference and the first plurality of frequency details. The average color difference and the first plurality of frequency details are quantized using Lloyd-Max quantization, which is dependent on a variance and a number of reconstruction levels. In an alternate embodiment of the present invention, skip codes are generated when the quantized average color difference and the second plurality of frequency details of the block match those of the corresponding block in a previous frame. The quantized average color difference and a second plurality of frequency details are encoded using variable length codes; the second plurality of frequency details is less than or equal to the first plurality of frequency details. The second plurality of frequency details comprises the horizontal term and the vertical term. In the preferred embodiment of the present invention, the quantized average color and the second plurality of frequency details are encoded using Huffman coding.
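A Haar transform of a single 2×2 block into average, horizontal, vertical, and diagonal terms can be sketched as follows. The function names, the division by four, and the assignment of the labels "horizontal" and "vertical" to particular sign patterns are assumptions made for illustration; the normalization and naming convention are not specified above.

```python
def haar_2x2(block):
    """Haar transform of [[a, b], [c, d]] into (average, horizontal, vertical, diagonal).

    Assumption: each term is normalized by 4 so the inverse needs only adds.
    """
    (a, b), (c, d) = block
    average = (a + b + c + d) / 4
    horizontal = (a - b + c - d) / 4  # left column minus right column
    vertical = (a + b - c - d) / 4    # top row minus bottom row
    diagonal = (a - b - c + d) / 4
    return average, horizontal, vertical, diagonal


def inverse_haar_2x2(average, horizontal, vertical, diagonal=0.0):
    """Rebuild the 2x2 block; omitting the diagonal term gives an approximation."""
    return [
        [average + horizontal + vertical + diagonal,
         average - horizontal + vertical - diagonal],
        [average + horizontal - vertical - diagonal,
         average - horizontal - vertical + diagonal],
    ]


terms = haar_2x2([[10, 6], [4, 8]])
print(terms)                     # (7.0, 0.0, 1.0, 2.0)
print(inverse_haar_2x2(*terms))  # [[10.0, 6.0], [4.0, 8.0]]
```

Calling `inverse_haar_2x2` without the diagonal term corresponds to coding only the horizontal and vertical details, at the cost of an approximate reconstruction.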
The present invention employs lookup tables to decompress video information and to format output pixels. The output of the compression pipeline containing variable length codes is first decoded into fixed-length codes. The fixed-length codes are then decoded into five device-independent components that represent a 2×2 block using a first lookup table. The five components, hCode, vCode, and a set of three compVals (RGB, described below), are provided as indices to a second lookup table containing precomputed values of R, G, and B components. The R, G, and B components of the second lookup table include precomputed display-dependent formatting to produce the output image. In an alternate embodiment, skip codes contained in the output of the variable length decoder are decoded. Thus, the operations of reconstruction, inverse Haar transform, clamping, and dithering are reduced to a few table lookups. The operation count is only 5-6 operations per pixel.