Video is a critical part of many applications such as entertainment, communications and training. In the past, video data was represented exclusively in analog form, which restricted the creation, storage and distribution of video to systems dedicated only to analog video. It is now possible to process video digitally, which has greatly lowered the cost of using and creating video, and has allowed video to be created, stored and transmitted by general purpose digital computer equipment. This way of working with video is far less expensive and more convenient than analog means and has enabled many new ways of using and working with video.
Unfortunately, raw digital video has a data rate of approximately 100 million bits per second. Storing and transmitting data at this rate is difficult except with very expensive equipment and high performance digital communications links. A wide range of techniques has been developed to compress digital video in order to reduce this data rate to one that is far more manageable. Each compression technique has advantages for different applications.
One important compression technique is vector quantization ("VQ"). It is conventional to employ VQ to compress digital image data. See, for example, Robert M. Gray, "Vector Quantization," IEEE ASSP Magazine, Vol. 1, pp. 4-29 (April 1984), U.S. Pat. No. 5,194,950, issued Mar. 16, 1993 to Murakami et al., and U.S. Pat. No. 5,067,152, issued Nov. 19, 1991 to Kisor et al.
Vector quantization of image data is based upon formatting the image pixels into image vectors that are then encoded by finding the best match to a set of code vectors in a "code book." The code vectors are in the same format as the image vectors, but do not necessarily exactly match any of the image vectors. The best match for each image vector is the code vector producing the least distortion when used to code the image vector. The image vector is coded by using the index of the code vector in the code book which produces the best match. Since the code book index comprises fewer bits than the image vector, compression is achieved. The quality of the decoded image depends entirely on the quality of the code book. The smaller the code book, the fewer code vectors it can contain and as a result, the average distortion caused by the coding process increases. The benefits of the smaller code book are higher compression ratio because of the fewer bits needed to represent the code book indices and faster compression performance because of the reduced searching. Larger code books allow better compression quality, but at the cost of reduced compression ratio.
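The encoding and decoding steps described above can be illustrated with a minimal sketch. It assumes an exhaustive search of the code book and uses the sum of squared component differences as the distortion measure; the names `squared_distance`, `vq_encode` and `vq_decode`, and the tiny two-component code book, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distortion measure: sum of squared component differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_encode(image_vectors: Sequence[Vector], code_book: Sequence[Vector]) -> List[int]:
    """Replace each image vector by the index of its least-distortion code vector."""
    return [
        min(range(len(code_book)), key=lambda i: squared_distance(v, code_book[i]))
        for v in image_vectors
    ]


def vq_decode(indices: Sequence[int], code_book: Sequence[Vector]) -> List[Vector]:
    """Decoding is a table lookup: each index selects one code vector."""
    return [code_book[i] for i in indices]


# Hypothetical 4-entry code book of 2-component vectors (illustration only).
code_book = [(0.0, 0.0), (0.0, 10.0), (10.0, 0.0), (10.0, 10.0)]
image_vectors = [(1.0, 9.0), (8.0, 8.5), (0.5, 0.2)]

indices = vq_encode(image_vectors, code_book)   # [1, 3, 0]
decoded = vq_decode(indices, code_book)         # approximations, not exact copies
```

Each index here needs only two bits because the code book has four entries, while the decoded vectors differ slightly from the inputs. This is the trade-off between compression and distortion noted above.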
The contents of the code book may either be fixed or may vary over time. Some conventional implementations use fixed code books to avoid transmitting the code book with the compressed data stream, thereby increasing the compression. The disadvantage of this technique is reduced quality since the code books have to be chosen based on typical image data and cannot be changed for images that differ significantly from the average. Fixed code books also make it easier to construct fixed structures for compression that do not require exhaustive searches of the code book such as tree structures. This decreases the time needed to compress an image.
Variable code books require new code books and new code entries to be computed during compression, increasing compression time and requiring transmission of the new code books and code book entries with the compressed data (decreasing the compression). The advantage of varying the code books is that the quality of the decoded images is better because the code book dynamically adapts to the changing characteristics of the images being compressed.
It is well known to employ VQ techniques to encode and decode video data for video compression (and decompression) applications.
In performing VQ coding, the best match between an image vector and the code vectors of a code book is determined by computing the distance in N-space (where N is the number of components of the image vector) between the image vector and each code vector. The best match is the code vector at the minimum distance from the image vector. In preferred embodiments of the present invention (to be explained below), the distance is computed in the six-dimensional space of input image vectors and code book vectors having six components. In typical VQ implementations, the square of the distance is computed, since it serves as a mean square error (MSE) measure between the image vector and one of the code book vectors.
Consider an example in which input image vector $\mathbf{V}^I$ comprises six components of YUV color video data as follows: $\mathbf{V}^I = (Y_0^I, Y_1^I, Y_2^I, Y_3^I, U^I, V^I)$, where the first four components are Y (luminance) values, and the fifth and sixth components are U and V values, with each code book vector having form $\mathbf{V}^C = (Y_0^C, Y_1^C, Y_2^C, Y_3^C, U^C, V^C)$. The MSE or square of the distance between these two vectors is given by:

$$\mathrm{MSE} = (Y_0^I - Y_0^C)^2 + (Y_1^I - Y_1^C)^2 + (Y_2^I - Y_2^C)^2 + (Y_3^I - Y_3^C)^2 + (U^I - U^C)^2 + (V^I - V^C)^2.$$
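The squared-distance computation for six-component vectors can be written out directly. The following is a minimal sketch; the function name `squared_error` and the sample component values are hypothetical, and the components are assumed to be ordered as four Y values followed by U and V.

```python
from typing import Sequence


def squared_error(image_vec: Sequence[float], code_vec: Sequence[float]) -> float:
    """Square of the distance between two (Y0, Y1, Y2, Y3, U, V) vectors."""
    if len(image_vec) != 6 or len(code_vec) != 6:
        raise ValueError("expected six components: Y0, Y1, Y2, Y3, U, V")
    return sum((i - c) ** 2 for i, c in zip(image_vec, code_vec))


# Hypothetical sample values (illustration only).
image_vec = (100, 102, 98, 101, 128, 130)
code_vec = (101, 100, 98, 104, 126, 130)

# Component differences are -1, 2, 0, -3, 2, 0, so the sum of squares is 18.
error = squared_error(image_vec, code_vec)
```

Because only the ordering of distances matters when selecting the best match, no square root or division by the number of components is needed in this sketch.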
The most common form of VQ compression uses a single code book at the coder and the decoder. This works acceptably for modest compression ratios and medium quality decoded images. It does not work well when both a high compression ratio (for example, a rate of 1 bit per pixel) and high quality decoded images are simultaneously required. When using a single code book, increasing the quality requires a larger code book, but this causes a decrease in the compression ratio.
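The trade-off between code book size and compression ratio can be made concrete with a short calculation. The sketch below rests on assumptions not stated in the text: each six-component vector is taken to cover four pixels (one Y value per pixel, with shared U and V), each component is taken to be 8 bits, indices are fixed-length, and any cost of transmitting the code book itself is ignored. The function names are hypothetical.

```python
def index_bits(code_book_size: int) -> int:
    """Bits needed for a fixed-length index into a code book of the given size."""
    return (code_book_size - 1).bit_length()


def bits_per_pixel(code_book_size: int, pixels_per_vector: int = 4) -> float:
    """Coded bit rate, assuming one index per vector (assumed 4 pixels each)."""
    return index_bits(code_book_size) / pixels_per_vector


def compression_ratio(code_book_size: int, raw_bits_per_vector: int = 6 * 8) -> float:
    """Raw vector size (assumed 6 components x 8 bits) divided by index size."""
    return raw_bits_per_vector / index_bits(code_book_size)


for size in (16, 256, 4096):
    print(size, bits_per_pixel(size), compression_ratio(size))
# 16 entries   -> 1.0 bit/pixel, ratio 12.0
# 256 entries  -> 2.0 bits/pixel, ratio 6.0
# 4096 entries -> 3.0 bits/pixel, ratio 4.0
```

Under these assumptions, a rate of 1 bit per pixel leaves room for only 16 code vectors, which suggests why a single code book struggles to deliver high quality at that rate.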
FIG. 1 is a block diagram of a conventional system for implementing VQ compression (using a single code book) and decompression on video data. The input analog video signal is digitized by a standard video A/D circuit into a standard format (YUV 4:1:1) digital video stream. The Y component carries the luminance information of each pixel and the U and V components carry the chrominance information.
The YUV color space is a common color space for digital video, and is similar to the YIQ color space in which the VQ method of cited U.S. Pat. No. 5,067,152 is implemented. The "4:1:1" nomenclature means that there are four Y samples for each U and V sample. This format takes advantage of the eye's lesser sensitivity to color detail compared to luminance detail.
The digital video is scaled and filtered before compression in the FIG. 1 system. Because video (e.g., digital NTSC video) can represent images with a resolution as large as 720×482 pixels, it is often desirable to scale down each image to a smaller size to match the compressed data rate to the capabilities of the storage system or digital transmission method.
Still with reference to FIG. 1, the VQ encoder formats the pixels into image vectors and codes them using a code book. The image vectors can correspond to rectangular blocks of pixels. Some conventional encoders compare each block to be coded with the corresponding block from the last frame, and send a different code word if the blocks match within an acceptable threshold. The code indicating a match to the prior block can be of shorter length than the VQ code words, thereby saving data and improving the compression ratio. This simple form of interframe coding is described in U.S. Pat. No. 5,008,747 to Carr et al.
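The simple interframe scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the method of the cited patent: the marker `SKIP` stands in for the short "block matches prior frame" code, the match test uses squared distance against a hypothetical `threshold`, and each block is compared with the corresponding input block of the last frame.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]
SKIP = -1  # hypothetical marker for the short "same as prior block" code


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode_frame(
    current: Sequence[Vector],
    previous: Optional[Sequence[Vector]],
    code_book: Sequence[Vector],
    threshold: float,
) -> List[int]:
    """Emit SKIP for blocks close to the prior frame's block, else a VQ index."""
    codes = []
    for i, vec in enumerate(current):
        if previous is not None and squared_distance(vec, previous[i]) <= threshold:
            codes.append(SKIP)
        else:
            codes.append(
                min(range(len(code_book)),
                    key=lambda k: squared_distance(vec, code_book[k]))
            )
    return codes


# Hypothetical data (illustration only).
code_book = [(0, 0), (10, 10), (20, 20)]
frame0 = [(1, 1), (9, 11), (19, 21)]
frame1 = [(1, 2), (20, 19), (19, 21)]

codes0 = encode_frame(frame0, None, code_book, threshold=2.0)    # [0, 1, 2]
codes1 = encode_frame(frame1, frame0, code_book, threshold=2.0)  # [-1, 2, -1]
```

In the second frame only the middle block changed enough to require a full VQ code word; the other two are replaced by the shorter match code.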
The compressed data stream consists of VQ code words and, in systems that dynamically vary the code book, new code book entries. The compressed data can be transmitted over a digital communications link, stored on a conventional hard disk in a computer, or used to master a CD-ROM.
With reference to FIG. 1, the decoder receives the code words and looks up the code vectors in a code book. These code vectors are the decompressed image vectors. If the system uses interframe coding as in U.S. Pat. No. 5,008,747, the code for a block match causes the block from the prior frame to be reused. The decoded image vectors are then converted to RGB pixels by converting the YUV values in the image vectors into RGB values using a suitable color conversion method. Finally, the RGB pixels are stored in an RGB graphics frame buffer for display on a CRT monitor.
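The decoder steps described above (code book lookup, reuse of the prior block, and color conversion) can be sketched as follows. The text specifies only "a suitable color conversion method", so the coefficients below are an assumption: a common BT.601-style full-range conversion with U and V stored at an offset of 128. The `SKIP` marker and all function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[int, ...]
SKIP = -1  # hypothetical marker for the "reuse block from prior frame" code


def decode_frame(codes: Sequence[int], code_book: Sequence[Vector],
                 previous: Optional[Sequence[Vector]]) -> List[Vector]:
    """Look up each code word; a SKIP code reuses the prior frame's block."""
    out = []
    for i, code in enumerate(codes):
        if code == SKIP:
            if previous is None:
                raise ValueError("SKIP code received with no prior frame")
            out.append(previous[i])
        else:
            out.append(code_book[code])
    return out


def clamp8(x: float) -> int:
    return max(0, min(255, int(round(x))))


def yuv_to_rgb(y: float, u: float, v: float) -> Tuple[int, int, int]:
    """Assumed BT.601-style conversion; not taken from the source text."""
    d, e = u - 128, v - 128
    return (clamp8(y + 1.402 * e),
            clamp8(y - 0.344136 * d - 0.714136 * e),
            clamp8(y + 1.772 * d))


def vector_to_rgb_pixels(vec: Vector) -> List[Tuple[int, int, int]]:
    """Expand (Y0, Y1, Y2, Y3, U, V) into four RGB pixels sharing U and V."""
    y0, y1, y2, y3, u, v = vec
    return [yuv_to_rgb(y, u, v) for y in (y0, y1, y2, y3)]


# Hypothetical two-entry code book (illustration only).
code_book = [(16, 16, 16, 16, 128, 128), (200, 200, 200, 200, 128, 128)]
frame0 = decode_frame([0, 1], code_book, None)
frame1 = decode_frame([SKIP, 0], code_book, frame0)
pixels = vector_to_rgb_pixels(frame1[0])  # four gray pixels: (16, 16, 16)
```

The resulting RGB tuples are what would be written into the frame buffer; how the four Y values map to pixel positions within a block is left unspecified here.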
U.S. Pat. No. 5,067,152 teaches separating the image to be compressed into separate planes corresponding to separate components of a YIQ pixel. YIQ is a natural color space for digital video information where Y is the luminance information and I and Q carry the chrominance. It is common practice to subsample the chrominance information to reduce its resolution and therefore the amount of data in the image. U.S. Pat. No. 5,067,152 subsamples the chrominance channels by 2X in each dimension thereby reducing the chrominance data by a factor of four, uses a separate code book for each of the Y, I, Q planes, and devotes more code book entries to the Y code book than to the I and Q code books. The subsampling of the chrominance information and the smaller code books for the I and Q planes take advantage of the well known property that the human visual system is very sensitive to detail in brightness variation and relatively insensitive to detail in color information. In U.S. Pat. No. 5,067,152, the separation of image data into three types with separate code books is fixed (the Y image plane is always encoded with the Y code book, etc.) and there is no dynamic separation of each type of image data by small blocks based on local image statistics or properties.
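Subsampling a chrominance plane by 2X in each dimension can be illustrated with a short sketch. The cited patent's exact filter is not described here, so averaging each 2×2 block is an assumption made for this example; the function name `subsample_2x` and the sample plane are hypothetical, and the plane is assumed to have even width and height.

```python
from typing import List


def subsample_2x(plane: List[List[float]]) -> List[List[float]]:
    """Halve resolution in both dimensions by averaging each 2x2 block."""
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[r][c] + plane[r][c + 1] + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]


# Hypothetical 4x4 chrominance plane (illustration only).
chroma = [
    [10, 12, 20, 22],
    [14, 16, 24, 26],
    [30, 30, 40, 40],
    [30, 30, 40, 40],
]

small = subsample_2x(chroma)  # [[13.0, 23.0], [30.0, 40.0]]
samples_before = len(chroma) * len(chroma[0])   # 16
samples_after = len(small) * len(small[0])      # 4
```

The sample count drops from 16 to 4, matching the factor-of-four reduction in chrominance data described above.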
Another conventional method and apparatus for performing VQ encoding on an image data stream is described in U.S. Pat. No. 5,068,723, issued Nov. 26, 1991. The method described in this patent employs a code book having two sections (a "common" section and a "specific" section). A best match is identified between an input vector and the entries of the "common" code book section. If the match is sufficiently close, the input vector is replaced by the address of the appropriate common code book entry. Otherwise, a second matching operation is performed to identify a best match between the input vector and the entries of the "specific" code book section, and the input vector is replaced by the address of the appropriate specific code book entry (and a flag indicating that the address is for the specific, rather than the common, code book section). The specific and common code book sections are adaptively updated by transferring code vectors from one section to the other, depending on the frequency of their selection as a "best match" to the input image vectors.
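The two-section matching procedure described above can be sketched as follows. This is only an illustration of the search order and the flag: the adaptive transfer of code vectors between sections is omitted, the closeness test uses squared distance against a hypothetical `threshold`, and all names and values are assumptions made for the example.

```python
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_match(vec: Vector, code_vectors: Sequence[Vector]) -> Tuple[int, float]:
    """Return (address, distortion) of the closest entry in one section."""
    distances = [squared_distance(vec, c) for c in code_vectors]
    index = min(range(len(distances)), key=distances.__getitem__)
    return index, distances[index]


def encode_two_section(vec: Vector, common: Sequence[Vector],
                       specific: Sequence[Vector], threshold: float) -> Tuple[int, int]:
    """Return (flag, address); flag 0 means common section, 1 means specific."""
    address, distortion = best_match(vec, common)
    if distortion <= threshold:
        return 0, address
    address, _ = best_match(vec, specific)
    return 1, address


# Hypothetical code book sections (illustration only).
common = [(0, 0), (10, 10)]
specific = [(50, 50), (100, 0)]

a = encode_two_section((1, 1), common, specific, threshold=8)    # (0, 0)
b = encode_two_section((48, 53), common, specific, threshold=8)  # (1, 0)
c = encode_two_section((95, 3), common, specific, threshold=8)   # (1, 1)
```

The specific section is searched only when the common section fails the closeness test, so well-matched vectors avoid the second search.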
Another conventional method and apparatus for performing a VQ operation on a stream of image data is described in U.S. Pat. No. 5,194,950. This reference teaches application of several VQ methods to compress image data vectors of an interframe difference signal, where each of such image data vectors is the difference between two corresponding vectors of two successive frames of input image data. One such VQ method employs multiple code books as follows. First, a best match is identified between an input vector and the entries of a first code book. If the match is sufficiently close, the input vector is replaced by the address of the appropriate first code book entry. Otherwise, a second matching operation is performed to identify the best match between the input vector and the entries of a second code book, and the input vector is replaced by the address of the appropriate second code book entry. In another VQ method described in this reference, the input vectors are processed in a bank of band pass filters. Components of the input vectors in different frequency bands are separately processed, with different code books (i.e., different ones of code books 651 and code books 652 of FIG. 26) being used to compress different ones of the frequency band components. In several of the VQ methods described in this reference, a block discriminator unit is employed to compare successive blocks of input vector data. For each block determined to be sufficiently similar to a preceding block, the input vectors do not undergo usual VQ encoding (i.e., are not replaced by addresses or indices of code book entries), but are instead replaced by bits indicating that they are assigned the same values as corresponding vectors of the preceding block.