The development of digital data compression techniques for compressing visual information is very significant due to the high demand for numerous new visual applications. These new applications include, for example, television transmission including high definition television transmission, facsimile transmission, teleconferencing and video conferencing, digital broadcasting, digital storage and recording, multimedia PC, and videophones.
Generally, digital channel capacity is the most important parameter in a digital transmission system because it limits the amount of data that can be transmitted in a given time.
In many applications, the transmission process requires a very effective source encoding technique to overcome this limitation. Moreover, the major issue in video source encoding is usually the tradeoff between encoder cost and the amount of compression that is required for a given channel capacity. The encoder cost usually relates directly to the computational complexity of the encoder. Another significant issue is whether the degradation of the reconstructed signal can be tolerated for a particular application.
As described in U.S. Pat. No. 5,444,489, a common objective of all source encoding techniques is to reduce the bit rate of some underlying source signal for more efficient transmission and/or storage. The source signals of interest are usually in digital form. Examples of these are digitized speech samples, image pixels, and a sequence of images. Source encoding techniques can be classified as either lossless or lossy. In lossless encoding techniques, the reconstructed signal is an exact replica of the original signal, whereas in lossy encoding techniques, some distortion is introduced into the reconstructed signal, which distortion can be tolerated in many applications.
Almost all video source encoding techniques achieve compression by exploiting both the spatial and temporal redundancies (correlation) inherent in visual source signals. Numerous source encoding techniques have been developed over the last few decades for encoding both speech waveforms and image sequences. Consider, for example, W. K. Pratt, Digital Image Processing, N.Y.: John Wiley & Sons, 1978; N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, Englewood Cliffs, N.J.: Prentice-Hall, 1984; A. N. Netravali and B. G. Haskell, Digital Pictures: Representation and Compression, N.Y.: Plenum Press, 1988. Pulse code modulation (PCM), differential PCM (DPCM), delta modulation, predictive encoding, and various hybrid as well as adaptive versions of these techniques are very cost-effective encoding schemes at bit rates above one bit per sample, which is considered to be a medium-to-high quality data rate. However, a deficiency of all the foregoing techniques is that the encoding process operates only on individual samples of the source signal. According to the well-known Shannon rate-distortion theory described in T. Berger, Rate Distortion Theory, Englewood Cliffs, N.J.: Prentice-Hall, 1971, a better objective performance can always be achieved in principle by encoding vectors rather than scalars.
Scalar quantization involves basically two operations. First, the range of possible input values is partitioned into a finite collection of subranges. Second, for each subrange, a representative value is selected to be output when an input value is within the subrange.
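By way of illustration only, the two operations of scalar quantization can be sketched as follows. The sketch assumes a uniform partition of the input range and uses the midpoint of each subrange as its representative value; both choices, and the names make_uniform_quantizer and quantize, are assumptions made for this example rather than features of any cited technique.

```python
def make_uniform_quantizer(lo, hi, levels):
    """Build a scalar quantizer over [lo, hi) with `levels` equal subranges.

    Assumption for illustration: uniform subranges, midpoint representatives.
    """
    step = (hi - lo) / levels
    representatives = [lo + (i + 0.5) * step for i in range(levels)]

    def quantize(x):
        # Operation 1: locate the subrange containing x.
        # Inputs outside [lo, hi) are clamped to the first or last subrange.
        index = int((x - lo) / step)
        index = max(0, min(levels - 1, index))
        # Operation 2: output the representative value of that subrange.
        return index, representatives[index]

    return quantize


# Four subranges over the 8-bit pixel range, i.e. two bits per sample.
quantize = make_uniform_quantizer(0.0, 256.0, 4)
index, value = quantize(100.0)
```

Only the subrange index needs to be transmitted; a receiver holding the same table of representatives recovers the output value from the index.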
Vector quantization (VQ) allows the same two operations to take place in multi-dimensional vector space. The vector space is partitioned into subregions, each having a corresponding representative value, or code vector. Vector quantization was introduced in the late 1970s as a source encoding technique to encode source vectors instead of scalars. VQ is described in A. Gersho, "Asymptotically optimal block quantization," IEEE Trans. Information Theory, vol. 25, pp. 373-380, July 1979; Y. Linde, A. Buzo, and R. Gray, "An algorithm for vector quantization design," IEEE Trans. Commun., vol. 28, pp. 84-95, January, 1980; R. M. Gray, J. C. Kieffer, and Y. Linde, "Locally optimal quantizer design," Information and Control, vol. 45, pp. 178-198, 1980. An advantage of the VQ approach is that it can be combined with many hybrid and adaptive schemes to improve the overall encoding performance. Further, VQ-oriented encoding schemes are simple to implement and generally achieve higher compression than scalar quantization techniques. The receiver structure of VQ consists of a statistically generated codebook containing code vectors.
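By way of illustration only, a vector quantizer of the kind described above can be sketched as a nearest-code-vector search over a codebook. The squared-error distance, the centroid-based refinement loop in train_codebook, and all function names are assumptions made for this example; the sketch is not presented as the design procedure of any cited reference.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vector, codebook):
    """Return the index of the code vector nearest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def decode(index, codebook):
    """The receiver simply looks the code vector up in its codebook."""
    return codebook[index]


def train_codebook(training_vectors, initial_codebook, iterations=10):
    """Refine a codebook from training data (hypothetical helper).

    Each pass assigns every training vector to its nearest code vector,
    then moves each code vector to the centroid of its assigned vectors.
    """
    codebook = [list(c) for c in initial_codebook]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in training_vectors:
            cells[encode(v, codebook)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # leave a code vector unchanged if nothing maps to it
                codebook[i] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


# Two-dimensional vectors, e.g. pairs of adjacent pixel values.
training = [(0, 0), (0, 2), (10, 10), (10, 12)]
codebook = train_codebook(training, [(0, 0), (10, 10)])
index = encode((1, 1), codebook)      # only this index is transmitted
reconstructed = decode(index, codebook)
```

Because the codebook is derived from training data, an input whose statistics differ from the training set may be mapped to a poorly matching code vector.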
Most VQ-oriented encoding techniques, however, operate at a fixed rate/distortion tradeoff and thus provide very limited flexibility for practical implementation. Another practical limitation of VQ is that its performance depends on the particular image being encoded, especially at low bit rates. This quantization mismatch can degrade the performance substantially if the statistics of the image being encoded are not similar to those from which the VQ codebook was generated.
Two other conventional block encoding techniques are transform encoding (e.g., discrete cosine transform (DCT) encoding) and subband encoding. In transform encoding, the image is decomposed into a set of nonoverlapping contiguous blocks and a linear transformation is evaluated for each block. Transform encoding is described in the following publications: W. K. Pratt, Digital Image Processing, N.Y.: John Wiley & Sons, 1978; N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, Englewood Cliffs, N.J.: Prentice-Hall, 1984; R. C. Gonzalez and P. Wintz, Digital Image Processing, Reading, Mass.: Addison-Wesley, 2nd ed., 1987. In transform encoding, transform coefficients are generated for each block, and these coefficients can be encoded by a number of conventional encoding techniques, including vector quantization. See N. M. Nasrabadi and R. A. King, "Image coding using vector quantization: a review," IEEE Trans. Commun., vol. 36, pp. 957-971, August 1986. The transform coefficients in general are much less correlated than the original image pixels. This feature offers the possibility of modeling their statistics with well-defined distribution functions. Furthermore, the image is considered to be more compact in the transform domain because not all coefficients are required to reconstruct the image with very good quality. Transform encoding is also considered to be a robust technique when compared to VQ because the transformation is fixed for all classes of images.
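By way of illustration only, the block decomposition and per-block transformation described above can be sketched with an orthonormal two-dimensional DCT. The 4×4 block size, the plain-list matrix helpers, and all function names are assumptions made for this example, and the image dimensions are assumed to be multiples of the block size.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix: row k is the k-th basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def dct2(block):
    """Forward 2-D transform of a square block: C * block * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2-D transform: C^T * coeffs * C (C is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def split_blocks(image, n):
    """Decompose an image into nonoverlapping contiguous n x n blocks."""
    return [[row[x:x + n] for row in image[y:y + n]]
            for y in range(0, len(image), n)
            for x in range(0, len(image[0]), n)]


# A 4 x 8 image made of two flat regions, split into two 4 x 4 blocks.
image = [[10] * 4 + [200] * 4 for _ in range(4)]
blocks = split_blocks(image, 4)
coefficients = [dct2(b) for b in blocks]
restored = idct2(coefficients[0])
```

For a flat block, all of the energy falls into the single lowest-frequency coefficient, which is a simple instance of the compaction property noted above; the remaining coefficients are zero to within rounding and need not be transmitted.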
Although meritorious to an extent, the effectiveness of transform encoding is questionable. The effectiveness depends critically on how the bits are allocated in order to encode the individual transform coefficients. This bit rate allocation problem is documented in A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Mass.: Kluwer Academic, 1992. The bit rate allocation problem often results in a highly complex computational strategy, especially if it is adaptive, as suggested in N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, Englewood Cliffs, N.J.: Prentice-Hall, 1984. The numerous computations associated with the transformation and the bit rate allocation strategy can lead to a high-cost hardware implementation. Furthermore, most encoders using transform encoding operate on block sizes of at least 8×8 pixels in order to achieve reasonable encoding performance. These block sizes are very effective in encoding the low detail regions of the image, but can result in poor quality in the high detail regions, especially at low bit rates. In this regard, see R. Clarke, Transform Coding of Images, N.Y.: Academic, 1985. Thus, VQ is still known to be a better technique for encoding high detail image blocks.
Finally, in subband encoding the image is represented as a number of subband (band pass) images that have been subsampled at their Nyquist rate. In this regard, see M. Vetterli, "Multi-dimensional sub-band coding: some theory and algorithms," Signal Processing, vol. 6, pp. 97-112, April, 1984; J. W. Woods and S. D. O'Neil, "Subband coding of images," IEEE Trans. Acoust., Speech, Signal Processing, vol. 34, pp. 1278-1288, October, 1986. These subband images are then separately encoded at different bit rates. This approach resembles the human visual system. Subband encoding is a very effective technique for high quality encoding of images and video sequences, such as high definition TV. Subband encoding is also effective for progressive transmission in which different bands are used to decode signals at different rate/distortion operating points.
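By way of illustration only, a two-band split with subsampling, separate encoding of each band, and progressive decoding can be sketched on a single image row. The Haar filter pair, the uniform step sizes used to stand in for different bit rates, and all function names are assumptions made for this example; practical subband coders typically use longer filters and more bands.

```python
import math

SQRT2 = math.sqrt(2.0)


def analyze(signal):
    """Split an even-length signal into low and high bands.

    Each band is subsampled by two, so together the two bands hold
    as many samples as the input.
    """
    low = [(signal[i] + signal[i + 1]) / SQRT2
           for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / SQRT2
            for i in range(0, len(signal), 2)]
    return low, high


def synthesize(low, high):
    """Recombine the two bands; exact when the bands are unmodified."""
    out = []
    for l, h in zip(low, high):
        out.extend([(l + h) / SQRT2, (l - h) / SQRT2])
    return out


def quantize_band(band, step):
    """Uniform quantization; a larger step means fewer bits for the band."""
    return [step * round(v / step) for v in band]


row = [10, 12, 14, 16, 100, 102, 104, 106]
low, high = analyze(row)

# Separate encoding: fine steps for the low band, coarse for the high band.
decoded = synthesize(quantize_band(low, 1.0), quantize_band(high, 8.0))

# Progressive transmission: a first approximation from the low band alone.
preview = synthesize(low, [0.0] * len(low))
```

A receiver that has only the low band can already display the preview, and can refine it once the high band arrives.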
However, a primary disadvantage of subband encoding is that the computational complexity of the bit rate allocation and subband decomposition problems can lead to a high-cost hardware implementation. Furthermore, subband encoding is usually not very efficient in allocating bit rates to encode the subband images at low rates.
Hence, there is a heretofore unaddressed need in the art for a low bit rate source encoding system and method which are much simpler and less expensive to implement and which exhibit better computational efficiency.