For still image compression, the Joint Photographic Experts Group (JPEG) standard has been established by ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission). The performance of coders conforming to this standard generally degrades at low bit-rates, mainly because of the underlying block-based Discrete Cosine Transform (DCT) scheme.
A typical lossy image compression system (lossy signal/image encoder) is shown in FIG. 1 and comprises a source encoder 10, a quantizer 20, and an entropy encoder 30. Compression is accomplished by applying a linear transform to decorrelate the image data, quantizing the resulting transform coefficients, and entropy coding the quantized values.
For the source encoder 10, a variety of linear transforms have been developed, including the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT), and the Discrete Wavelet Transform (DWT).
The quantizer 20 reduces the number of bits needed to store the transformed coefficients by reducing the precision of those values. Because this is a many-to-one mapping, it is a lossy process and is the main source of compression in an encoder. Quantization performed on each individual coefficient is known as scalar quantization. Quantization performed on a group of coefficients together is known as vector quantization. Either uniform or non-uniform quantizers can be used, depending on the particular problem.
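By way of illustration only, uniform scalar quantization can be sketched as follows. The function names, the step size of 16, and the sample coefficient values are assumptions chosen for this example and are not taken from any particular encoder.

```python
def quantize(coeffs, step):
    """Uniform scalar quantization: map each coefficient to an integer bin index."""
    return [int(round(c / step)) for c in coeffs]


def dequantize(indices, step):
    """Reconstruct each coefficient as the centre of its bin."""
    return [q * step for q in indices]


# Hypothetical transform coefficients and step size (illustrative values only).
coeffs = [-415.4, 30.2, -61.2, 27.2, 56.1, -20.1, -2.4, 0.5]
step = 16

indices = quantize(coeffs, step)   # small integers, cheaper to store
recon = dequantize(indices, step)  # approximation of the original coefficients

for c, q, r in zip(coeffs, indices, recon):
    print(f"{c:8.1f} -> index {q:4d} -> {r:8.1f}")
```

In this sketch the coefficients 30.2 and 27.2 fall into the same bin and are reconstructed as the same value, which is the many-to-one mapping that makes the process lossy. The reconstruction error of each coefficient is at most half the step size.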
The entropy encoder 30 further compresses the quantized values losslessly to give better overall compression. It uses a model to accurately determine the probabilities for each quantized value and produces an appropriate code based on these probabilities so that the resultant output code stream will be smaller than the input stream. The most commonly used entropy encoders are the Huffman encoder and the arithmetic encoder, although for applications requiring fast execution, simple run-length encoding (RLE) has been used.
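A minimal sketch of Huffman coding applied to quantized values is given below for illustration. The helper names and the sample data are hypothetical, and the probability model is simply the observed symbol frequencies, which is an assumption of this example rather than a requirement of any encoder.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from observed frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)  # two least probable subtrees
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_decode(bits, code):
    """Invert the prefix code; no information is lost."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


# Hypothetical quantized values: mostly zeros, as is typical after quantization.
quantized = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, -1, -1, 2]
code = huffman_code(quantized)
bitstream = "".join(code[s] for s in quantized)
print(code, len(bitstream))
```

For this input the frequent value 0 receives a one-bit code, and the 16 values occupy 25 bits, compared with 32 bits for a fixed two-bit code over the four distinct symbols. Decoding the bit string recovers the quantized values exactly.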
The idea of compressing an image is not new. The discovery of the DCT in 1974 was an important achievement for the research community working on image compression. The DCT can be regarded as a discrete-time version of the Fourier-Cosine series. It is a close relative of the DFT, a technique for converting a signal into elementary frequency components. Consequently, the DCT can be computed with a Fast Fourier Transform (FFT)-like algorithm in O(n log n) operations. Unlike the DFT, the DCT is real-valued, and for typical image data it provides a better approximation of a signal with fewer coefficients.
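For illustration, a one-dimensional DCT and its inverse can be sketched directly from the cosine-series definition. This sketch assumes the orthonormal DCT-II scaling and evaluates the sums directly in O(n²) operations; it is not the fast O(n log n) algorithm mentioned above. The function names and the sample signal are hypothetical.

```python
import math


def _scale(k, n):
    """Orthonormal scaling factor for the k-th basis function."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(x):
    """Direct (O(n^2)) orthonormal DCT-II of a real sequence."""
    n = len(x)
    return [
        _scale(k, n)
        * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n))
        for i in range(n)
    ]


# A smooth, highly correlated sample signal (illustrative values only).
signal = [10, 12, 14, 16, 18, 20, 22, 24]
coeffs = dct_1d(signal)  # all coefficients are real numbers

energy = sum(c * c for c in coeffs)
low_energy = coeffs[0] ** 2 + coeffs[1] ** 2
print([round(c, 3) for c in coeffs])
print("fraction of energy in first two coefficients:", low_energy / energy)
```

For this smooth signal, more than 99% of the energy lies in the first two coefficients, and applying idct_1d to the coefficients recovers the signal to within floating-point rounding.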
FIGS. 2 and 3 show in more detail the components in a typical DCT-based encoder and decoder, respectively, for grayscale images. Color image compression can be approximately regarded as compression of multiple grayscale images, which are either compressed entirely one at a time, or are compressed by alternately interleaving 8×8 sample blocks from each in turn.
The DCT-based encoder 100 shown in FIG. 2 can be thought of as compressing a stream of 8×8 blocks of image samples 90. Each 8×8 block passes through each processing step/component and is output in compressed form into the data stream. The 8×8 blocks are provided to a forward DCT (FDCT) processor 105. Because adjacent image pixels are highly correlated, the FDCT processing lays the foundation for achieving data compression by concentrating most of the signal in the lower spatial frequencies. The resulting DCT coefficients are then passed to a quantizer 110 (similar to the quantizer 20 in FIG. 1), which uses a quantizer table 120. The results of the quantizer 110 are provided to an entropy encoder 115 (similar to the entropy encoder 30 in FIG. 1) which, in conjunction with a Huffman table 125, outputs the compressed image data.
For a typical 8×8 sample block from a typical source image, most of the spatial frequencies have zero or near-zero amplitude and need not be encoded. In principle, the DCT introduces no loss to the source image samples; it merely transforms them to a domain in which they can be more efficiently encoded.
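The forward DCT and quantization of a single 8×8 block can be sketched as follows. This is a simplified illustration: the quantizer table is a hypothetical one that merely grows with spatial frequency (it is not a table from the JPEG standard), the sample block is a synthetic smooth gradient, and entropy coding is omitted.

```python
import math

N = 8


def dct_matrix(n=N):
    """Orthonormal DCT-II basis; row k holds the k-th cosine basis vector."""
    return [
        [(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
         * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
         for i in range(n)]
        for k in range(n)
    ]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def fdct_8x8(block):
    """Separable 2-D forward DCT: transform the columns, then the rows."""
    c = dct_matrix()
    return matmul(matmul(c, block), transpose(c))


def quantize_block(coeffs, table):
    """Divide each coefficient by its table entry and round to an integer."""
    return [[int(round(coeffs[u][v] / table[u][v])) for v in range(N)]
            for u in range(N)]


# Hypothetical quantizer table: coarser steps at higher spatial frequencies.
QTABLE = [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]

# Synthetic smooth block standing in for highly correlated image samples.
block = [[100 + 2 * x + 2 * y for x in range(N)] for y in range(N)]

coeffs = fdct_8x8(block)
quantized = quantize_block(coeffs, QTABLE)
nonzero = sum(1 for row in quantized for q in row if q != 0)
print("non-zero quantized coefficients:", nonzero, "of", N * N)
```

For this synthetic block only three of the 64 quantized coefficients are non-zero, all at the lowest spatial frequencies. Real image blocks are less regular, but the same concentration of energy is what makes the subsequent entropy coding effective.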
The decoder 200 of FIG. 3 performs the inverse of the functions of the encoder 100 of FIG. 2. The compressed image data is provided to an entropy decoder 205, which provides its output to a dequantizer 210 and then to an inverse DCT (IDCT) processor 215. A quantizer table 220 and a Huffman table 225 are also used in the reconstruction of the image 299.
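The dequantization and inverse DCT stages can be sketched in the same spirit. The quantizer table and the three non-zero quantized values below are hypothetical, chosen to resemble what a smooth gradient block might produce; entropy decoding is assumed to have already taken place.

```python
import math

N = 8

# Hypothetical quantizer table; a decoder must use the same table as the encoder.
QTABLE = [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]


def dequantize_block(indices, table):
    """Multiply each quantized index by its table entry."""
    return [[indices[u][v] * table[u][v] for v in range(N)] for u in range(N)]


def _scale(k):
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def idct_8x8(coeffs):
    """Direct 2-D inverse DCT (orthonormal scaling), evaluated sample by sample."""
    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            s = 0.0
            for u in range(N):
                for v in range(N):
                    s += (_scale(u) * _scale(v) * coeffs[u][v]
                          * math.cos(math.pi * (2 * y + 1) * u / (2 * N))
                          * math.cos(math.pi * (2 * x + 1) * v / (2 * N)))
            out[y][x] = s
    return out


# Hypothetical entropy-decoded block: only three low-frequency values are non-zero.
quantized = [[0] * N for _ in range(N)]
quantized[0][0] = 114
quantized[0][1] = -3
quantized[1][0] = -3

recon = idct_8x8(dequantize_block(quantized, QTABLE))

# Compare with the smooth gradient assumed to be the original block.
original = [[100 + 2 * x + 2 * y for x in range(N)] for y in range(N)]
max_error = max(abs(recon[y][x] - original[y][x])
                for y in range(N) for x in range(N))
print("maximum reconstruction error:", round(max_error, 3))
```

Under these assumptions the reconstructed samples differ from the assumed original by less than two gray levels, even though only three values were retained. The difference is the loss introduced by quantization, not by the DCT itself.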
Current computer systems are being designed with increasingly sophisticated graphics systems. These systems often have extremely powerful programmable graphics processing units (GPUs) to perform sophisticated graphics functions. Currently, however, certain commonly used image/video coding and processing primitives are not well suited to implementation on GPUs. One such function is the DCT and its inverse, which still run on the central processing unit (CPU). The DCT is a computationally expensive operation. Moreover, when real-time multimedia applications are implemented on a general purpose computer, the CPU is usually heavily loaded, and in many cases the CPU alone cannot meet the real-time requirement. Oftentimes, the GPU is idle while the CPU is heavily loaded. It would be desirable to take advantage of the GPU's power in certain situations and applications.
The DCT and inverse DCT are operations that are used to separate an image into spectral sub-bands of differing importance with respect to the image's visual quality. It would be desirable to implement the DCT and inverse DCT on a GPU and make use of various GPU features such as parallel graphics pipelines, multi-channel capability, and multiple render targets to obtain significantly faster processing speeds than on a conventional CPU.
In view of the foregoing, there is a need for systems and methods that overcome the limitations and drawbacks of the prior art.