1. Field of the Invention
This invention relates to the field of video imaging systems. More specifically, this invention relates to an improved method and apparatus for video encoding/decoding.
2. Description of the Related Art
Recent demands for full motion video in such applications as video mail, video telephony, video teleconferencing, image database browsing, multimedia, and other applications have, because of the storage required, made it necessary to introduce standards for video compression. One image of 35 mm slide quality resolution requires 50 megabytes of data to be represented in a computer system (this number is arrived at by multiplying the horizontal resolution by the vertical resolution by the number of bits needed to represent the full color range, or 4096 × 4096 × 8 × 3 [R+G+B] / 8 = 50,331,648 bytes). One frame of digitized NTSC (National Television System Committee) quality video comprising 720 × 480 pixels requires approximately one half megabyte of digital data to represent the image (720 × 480 × 1.5 bytes per pixel). In an NTSC system, which operates at approximately 30 frames per second, digitized NTSC-quality video will therefore generate approximately 15.552 megabytes of data per second. Without compression, assuming a storage capability of one gigabyte with an access rate of two megabytes per second, it is possible to:
a. store 65 seconds of live video on the disk and to play it back at 3 frames per second;
b. store 21 high quality still images taking 24 seconds to store or retrieve one such image.

1. Horizontal and Vertical Subsampling: Sampling only a limited number of pixels horizontally or vertically across an image. The required reduction in resolution provides for poor quality images.
2. Reduction in Number of Bits Per Pixel: The technique including the use of a Color Look Up Table is currently used successfully to reduce from 24 to 8 bits per pixel. A reduction of approximately 3:1 is the useful limit of this method.
3. Block Truncation Coding and Color Cell Methods: The block truncation coding (BTC) was developed by Bob Mitchell in the early 1980's targeted at low compression rate and high quality applications (Robert Mitchell, et al., Image Compression Using Block Truncation Coding, IEEE Trans., Comm., pp. 1335-1342, Vol. Com-27, No. 9, September 1979). In this scheme, the first order statistics (mean) and the second order statistics (variance) of each pixel block are extracted and transmitted. The image is reconstructed using these two quantities. An 8:1 compression ratio with 4 × 4 block sizes was demonstrated in (Graham Campbell, Two Bit/Pixel Full Color Encoding, pp. 215-223, Proceedings of SIGGRAPH '86, Vol. 20, No. 4, August 1986).
4. Vector Quantization (VQ): A simple VQ maps discrete k-dimensional vectors into a digital sequence for transmission or storage. Each vector (a block of 4 × 4 or 3 × 3 pixels) is compared to a number of templates in the code book, and the index of the best matched template is transmitted to the receiver. The receiver uses the index for table look-up to reconstruct the image. A simple VQ could provide about 20:1 compression with good quality. A more complex VQ scheme has been demonstrated to provide similar quality to the CCITT (International Consultative Committee for Telephony & Telegraphy) DCT (Discrete Cosine Transformation) scheme recommendation H.261 (T. Murakami, Scene Adaptive Vector Quantization for Image Coding, Globecom, 1988).
5. Predictive Techniques: The assumption on which this family of methods relies is that adjacent pixels are correlated. As a consequence, data reduction can be accomplished by predicting pixel values based on their neighbors. The difference between the predicted and the actual pixel value is then encoded. An extensive body of work exists on this technique and variations on it (O'Neil, J. B., Predictive Quantization Systems for Transmission of TV Signals, Bell System Technical Journal, pp. 689-721, May/June 1966).

1. digitize the image;
2. transform RGB to YUV;
3. remove temporal redundancy (through frame differencing and motion compensation);
4. remove spatial redundancy (through a discrete cosine transform); and
5. entropy encode the data (using Huffman coding).
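The vector quantization technique described in the list of simple techniques above can be sketched as follows. This is a minimal illustration, not the method of any cited reference: the function names, the squared-error matching criterion, and the four-entry code book of flattened 3 × 3 blocks are all assumptions made for the example.

```python
def nearest_index(vector, codebook):
    """Return the index of the template with the smallest squared error."""
    def squared_error(template):
        return sum((v - t) ** 2 for v, t in zip(vector, template))
    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))


def vq_encode(blocks, codebook):
    """Transmitter side: replace each block by the index of its best template."""
    return [nearest_index(block, codebook) for block in blocks]


def vq_decode(indices, codebook):
    """Receiver side: table look-up to reconstruct each block."""
    return [codebook[i] for i in indices]


# Hypothetical code book of flattened 3 x 3 templates (9 values each).
CODEBOOK = [
    [0] * 9,            # flat dark block
    [255] * 9,          # flat bright block
    [128] * 9,          # flat mid-gray block
    [0, 128, 255] * 3,  # left-to-right ramp
]

blocks = [[5, 120, 250] * 3, [130] * 9]
indices = vq_encode(blocks, CODEBOOK)
reconstructed = vq_decode(indices, CODEBOOK)
print(indices)
```

Only the indices are transmitted, so the cost per block depends on the number of templates in the code book rather than on the number of pixels in the block.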
Assuming that a fiber distributed data interface (FDDI) is available with a bandwidth of 200 megabits per second, 1.5 channels of live video can be accommodated, or 35 mm quality still images can be transmitted at the rate of one every two seconds. With currently available technology in CD-ROM, a likely distribution medium for products containing video, the current transfer rate is approximately 0.18 megabytes per second. 0.37 megabytes per second may be attained with CD-ROM in the near future.
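The storage and bandwidth arithmetic above can be reproduced with a few lines of code. The sketch below takes its inputs from the text; the text does not state whether a megabyte is 10^6 or 2^20 bytes, so the decimal convention used here is an assumption, and the derived figures land near, not necessarily on, the rounded values quoted.

```python
# Unit conventions are an assumption; the text does not define them.
MB = 10**6
GB = 10**9

still_bytes = 4096 * 4096 * 8 * 3 // 8   # 35 mm slide: 8 bits each for R, G, B
frame_bytes = 720 * 480 * 3 // 2         # NTSC-quality frame at 1.5 bytes per pixel
video_rate = frame_bytes * 30            # bytes per second at 30 frames per second

disk_bytes = 1 * GB                      # storage capability
disk_rate = 2 * MB                       # access rate in bytes per second
link_rate = 200 * MB // 8                # 200 megabit/s link in bytes per second

print("bytes per still image:   ", still_bytes)
print("bytes per video frame:   ", frame_bytes)
print("video megabytes per sec: ", video_rate / MB)
print("seconds of video on disk:", disk_bytes / video_rate)
print("playback frames per sec: ", disk_rate / frame_bytes)
print("still images on disk:    ", disk_bytes / still_bytes)
print("seconds per still image: ", still_bytes / disk_rate)
print("video channels on link:  ", link_rate / video_rate)
```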
For illustration, take the variable parameters to be the horizontal and vertical resolution and frame rate, and assume that 24 bits are used to represent each pixel. Let D represent the horizontal or vertical dimension and assume an aspect ratio of 4:3. The data rate in megabytes per second as a function of frame rate and image size is:
______________________________________
Image Size            Frame Rate per second
D            5      10      15      20      25      30
______________________________________
64        0.04    0.08    0.12    0.16    0.20    0.24
128       0.16    0.33    0.49    0.65    0.82    0.98
256       0.65    1.31    1.96    2.62    3.27    3.93
512       2.62    5.24    7.86   10.48   13.10   15.72
______________________________________
or formulated in a slightly different way, the number of minutes of storage on a 600 megabyte disk is:
______________________________________
Image Size            Frame Rate per second
D            5      10      15      20      25      30
______________________________________
64      244.20  122.10   81.40   61.06   48.84   40.70
128      61.05   30.52   20.35   12.25   12.21   10.17
256      15.26    7.63    5.08    3.81    3.05    2.54
512       3.81    1.90    1.27    0.95    0.76    0.63
______________________________________
It is obvious from data rate and storage considerations that data compaction is required in order for full motion video to be attained.
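The two calculations behind these tables can be sketched as below. The conventions are assumptions: D is taken as the horizontal dimension, the vertical dimension follows from the 4:3 aspect ratio, and a megabyte is 10^6 bytes. Because the text does not fix these conventions, the output is not expected to match the tabulated figures digit for digit.

```python
def data_rate_mb_per_s(d, fps, bits_per_pixel=24, aspect=(4, 3)):
    """Uncompressed data rate in megabytes (10**6 bytes) per second.

    Assumption: d is the horizontal dimension, so the vertical
    dimension is d * 3 / 4 for a 4:3 aspect ratio.
    """
    width = d
    height = d * aspect[1] / aspect[0]
    bytes_per_frame = width * height * bits_per_pixel / 8
    return bytes_per_frame * fps / 1e6


def minutes_on_disk(rate_mb_per_s, disk_mb=600):
    """Minutes of uncompressed video that fit on a disk of disk_mb megabytes."""
    return disk_mb / rate_mb_per_s / 60


for d in (64, 128, 256, 512):
    for fps in (5, 10, 15, 20, 25, 30):
        rate = data_rate_mb_per_s(d, fps)
        print(d, fps, round(rate, 2), round(minutes_on_disk(rate), 2))
```

Whatever the conventions, each doubling of D quadruples the data rate and divides the storage time by four, which is the pattern visible between successive rows of the tables.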
In light of these storage and rate problems, some form of video compression is required in order to reduce the storage needed, and to raise the effective throughput, for displaying full-motion video at a quality closely approximating NTSC. Photographic and, to an even greater degree, moving images generally contain much repetition, smooth motion, and redundant information. Stated in an equivalent way, areas of an image are often correlated with each other, as are sequences of images over time. Keeping these facts in mind, several techniques have been established which eliminate redundancy in video imaging in order to compress these images to a more manageable size which requires less storage and may be displayed at a fairly high rate. Some simple compression techniques include:
The compression ratio to be expected from each of these simple methods is between four and eight to one.
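As an illustration of how one of the simple methods arrives at a ratio in this range, the sketch below applies block truncation coding to a single 4 × 4 block of 8-bit samples. The text states only that the mean and variance of each block are transmitted; the two-level, moment-preserving reconstruction and the assumption that the mean and standard deviation are each stored in 8 bits are commonly described choices adopted here for the example, not details taken from the cited paper.

```python
import math


def btc_encode(block):
    """Return (mean, standard deviation, bitmap) for a flattened pixel block."""
    n = len(block)
    mean = sum(block) / n
    variance = sum((p - mean) ** 2 for p in block) / n
    bitmap = [1 if p >= mean else 0 for p in block]
    return mean, math.sqrt(variance), bitmap


def btc_decode(mean, std, bitmap):
    """Rebuild a two-level block that preserves the mean and variance."""
    n = len(bitmap)
    q = sum(bitmap)                      # pixels at or above the mean
    if q == n:                           # flat block: every pixel equals the mean
        return [mean] * n
    low = mean - std * math.sqrt(q / (n - q))
    high = mean + std * math.sqrt((n - q) / q)
    return [high if bit else low for bit in bitmap]


block = [10] * 8 + [20] * 8              # 4 x 4 block containing one sharp edge
mean, std, bitmap = btc_encode(block)
decoded = btc_decode(mean, std, bitmap)

original_bits = len(block) * 8           # 16 pixels at 8 bits each
coded_bits = len(bitmap) + 8 + 8         # 1 bit per pixel plus two 8-bit statistics
print(original_bits / coded_bits)        # 4.0
```

Under these assumptions a 4 × 4 block shrinks from 128 bits to 32 bits, a ratio of four to one.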
More complex techniques for video compression are also known in the art. The simpler techniques mentioned above achieve data compression of between four and eight to one. Achieving comparable quality at compression ratios from twenty to forty to one involves a superlinear increase in complexity. In this case, it is no longer appropriate to consider the compression process as a simple one-step procedure.
In general, lossless compression techniques attempt to whiten or decorrelate a source signal. Intuitively, this makes sense in that a decorrelated signal cannot be compressed further or represented more compactly. For compression ratios of greater than twenty to one, a lossy element must be introduced somewhere into the process. This is usually done through a temporal or spatial resolution reduction used in conjunction with a quantization process. The quantization may be either vector or scalar. The quantizer should be positioned so that a graceful degradation of perceived quality with an increasing compression ratio results.
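The interplay between decorrelation and quantization described above can be illustrated with a one-dimensional predictive coder. This is a minimal sketch under stated assumptions: the predictor is simply the previously reconstructed sample, the quantizer is a uniform scalar quantizer with a hypothetical step size, and the function names are illustrative.

```python
def dpcm_encode(samples, step):
    """Quantize the difference between each sample and its prediction."""
    codes = []
    prediction = 0
    for sample in samples:
        residual = sample - prediction
        code = round(residual / step)    # scalar quantizer; lossy when step > 1
        codes.append(code)
        prediction += code * step        # track what the decoder will rebuild
    return codes


def dpcm_decode(codes, step):
    """Accumulate the dequantized residuals to rebuild the samples."""
    samples = []
    prediction = 0
    for code in codes:
        prediction += code * step
        samples.append(prediction)
    return samples


samples = [100, 102, 105, 105, 104, 110]

lossless = dpcm_decode(dpcm_encode(samples, step=1), step=1)
lossy = dpcm_decode(dpcm_encode(samples, step=4), step=4)
print(dpcm_encode(samples, step=4))
print(lossy)
```

With a step of one the samples are recovered exactly. With a larger step the codes become smaller and more repetitive, and because the encoder predicts from the reconstructed values, the reconstruction error stays within half a step instead of accumulating.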
Many of the succeeding methods are complex, but may be broken into a series of simpler steps. The compression process can be viewed as a number of linear transformations followed by quantization. The quantization is in turn followed by a lossless encoding process. The transformations applied to the image are designed to reduce redundancy in a representational, spatial and temporal sense. Each transformation is described individually.
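The three-stage view described above (linear transformation, quantization, lossless encoding) can be sketched end to end for the representational step alone. The assumptions are labeled in the code: the RGB to YUV coefficients are the commonly used ones for 8-bit components, the quantizer step is hypothetical, and the `zlib` module from the Python standard library stands in for the lossless encoder; it is not the encoder of any particular scheme.

```python
import zlib


def rgb_to_yuv(r, g, b):
    """Linear transformation from RGB to luminance and color differences."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


def quantize(value, step):
    """Uniform scalar quantizer: the only lossy stage in this sketch."""
    return round(value / step)


def compress_pixels(pixels, step=8):
    """Transform, quantize, then losslessly encode a list of (r, g, b) pixels."""
    symbols = bytearray()
    for r, g, b in pixels:
        for component in rgb_to_yuv(r, g, b):
            # Store each quantized value as one two's-complement byte.
            symbols.append(quantize(component, step) & 0xFF)
    return zlib.compress(bytes(symbols))    # stand-in lossless encoder


pixels = [(128, 128, 128)] * 100            # flat gray region
blob = compress_pixels(pixels)
print(len(pixels) * 3, "bytes in,", len(blob), "bytes out")
```

For a gray pixel the two color-difference components are zero, so after the transformation most of the symbols are identical and the lossless stage has very little left to encode.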