The background of the present invention is described herein in the context of pay television systems, such as cable television systems, that distribute program material to subscribers, but the invention is by no means limited thereto except as expressly set forth in the accompanying claims.
Cable television operators receive much of their program material from remote earth stations via a plurality of geosynchronous orbit satellites. Typically, the cable operator selects the program material to be made available to its subscribers by making arrangements with the satellite distributors of that program material. Each cable operator then distributes the selected program material to its subscribers, via a coaxial cable distribution system, from its "cable head-end" where the material is received from the satellite. Frequently, cable operators also provide their own local programming at the site of the head-end, and further include network broadcasts as well. In DBS (direct broadcast satellite) applications, each subscriber is capable of receiving a satellite down-link directly.
Typically, in both types of systems (cable and DBS), the program material (comprising both video and audio) is transmitted as analog signals. Conventional transmission techniques place substantial limitations on the maximum number of viewer channels that can be transmitted over any given transponder on a satellite: each channel requires a minimum bandwidth to avoid noticeable degradation, so the total number of channels that can be carried is limited by the bandwidth of each signal and by the bandwidth of the transponder. Similarly, the electrical properties of coaxial cable limit its bandwidth and therefore place substantial limitations on the number of channels that can be delivered to cable television subscribers using conventional transmission techniques.
There is an interest in the pay television industry (including both cable television and DBS) in increasing the number of channels that can be delivered to subscribers. However, to achieve this goal using conventional techniques would require more satellites and/or more transponders. There is also an interest in distributing HDTV (high definition television) signals to subscribers, but again, to achieve this goal using conventional techniques would require that some other programming be eliminated, that additional satellites be placed in orbit, or that more transponders be employed, since transmission of HDTV signals requires very high bandwidth. However, due to the limited number of locations in the geosynchronous orbit belt, placing more satellites in orbit is impractical, not to mention expensive. Additionally, there is a finite number of transponders that can be placed on each satellite; transponder space is at a premium, and rental is expensive. Insofar as cable transmission is concerned, conventional techniques allow expansion of the number of channels that can be transmitted, but only by expensive upgrading or rebuilding of the cable system.
Digital image transmission techniques have been investigated for overcoming this problem. Digital image transmission offers the advantage that digital data can be processed at both the transmission and reception ends to improve picture quality. However, the process of converting the program material from analog form to digital form results in data expansion. Thus, if the digitized program material were to be transmitted in raw digital form, the number of channels that could be transmitted over the satellite, or over the cable, would decrease, rather than increase.
Digital data compression techniques may be employed to maximize the amount of digital information that can be transmitted. Lossless compression techniques, such as Huffman encoding and LZW (Lempel, Ziv and Welch) encoding, offer, at best, compression ratios of 2.5 to 1 and do not sufficiently compensate for the amount of data expansion that occurs in converting data from analog to digital form.
A number of so-called "lossy" compression techniques have been investigated for digital image compression. DCT (discrete cosine transform) is one known method. Another method, which, until recently, has been used principally for speech compression, is vector quantization. Vector quantization has shown promise for offering high compression ratios, and high fidelity image reproduction. It has been demonstrated that, using vector quantization (hereinafter sometimes referred to as "VQ"), compression ratios as high as 25:1, and even as high as 50:1, can be realized without significant visually perceptible degradation in image reproduction.
Compression of video images by vector quantization involves dividing the pixels of each image frame into smaller blocks of pixels, or sub-images, and defining a "vector" from relevant data (such as intensity and/or color) reported by each pixel in the sub-image. The vector (sometimes called an "image vector") is really nothing more than a matrix of values (intensity and/or color) reported by each pixel in the sub-image. For example, a black and white image of a house might be defined by a 600×600 pixel image, and a 6×6 square patch of pixels, representing, for example, a shadow, or part of a roof line against a light background, might form the sub-image from which the vector is constructed. The vector itself might be defined by a plurality of gray scale values representing the intensity reported by each pixel. While a black and white image serves as an example here, vectors might also be formed from red, green, or blue levels from a color image, or from the Y, I and Q components of a color image, or from transform coefficients of an image signal.
Numerous methods exist for manipulating the block, or sub-image, to form a vector. R. M. Gray, "Vector Quantization", IEEE ASSP Mag., pp. 4-29 (April, 1984), describes formation of vectors for monochrome images. E. B. Hilbert, "Cluster Compression Algorithm: A Joint Clustering/Data Compression Concept", Jet Propulsion Laboratory, Pasadena, CA, Publ. 77-43, describes formation of vectors from the color components of pixels. A. Gersho and B. Ramamurthi, "Image Coding Using Vector Quantization", Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 428-431 (May, 1982), describes vector formation from the intensity values of spatially contiguous groups of pixels. All of the foregoing references are incorporated herein by reference.
By way of example, a television camera might generate an analog video signal in a raster scan format having 600 scan lines per frame. An analog to digital converter could then digitize the video signal at a sampling rate of 600 samples per scan line. Digital signal processing equipment could then store the digital samples, and group them into vectors.
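By way of illustration only, the grouping of digitized samples into image vectors might be sketched as follows. The function name `image_to_vectors`, the row-by-row ordering of values within a vector, and the synthetic test frame are assumptions made for this sketch and are not taken from the cited references.

```python
def image_to_vectors(image, block=6):
    """Split a 2-D list of pixel intensities into block x block sub-images,
    each flattened row by row into one image vector."""
    rows, cols = len(image), len(image[0])
    if rows % block or cols % block:
        raise ValueError("image dimensions must be multiples of the block size")
    vectors = []
    for top in range(0, rows, block):
        for left in range(0, cols, block):
            vectors.append([image[r][c]
                            for r in range(top, top + block)
                            for c in range(left, left + block)])
    return vectors

# Hypothetical 600 x 600 frame: dark top half, light bottom half.
frame = [[0 if r < 300 else 255] * 600 for r in range(600)]
vectors = image_to_vectors(frame)
print(len(vectors), len(vectors[0]))  # 10000 36
```

With 600 samples on each of 600 scan lines and 6×6 blocks, the frame yields 100 × 100 = 10,000 image vectors of 36 values each.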
Before quantizing an image, a vector quantizer stores a set of "codevectors" in a memory called a codebook. Codevectors are vectors which are chosen to be representative of commonly found image vectors. For example, one codevector might be a 6×6 pixel solid black patch. Another codevector might have all white pixels in the top three rows, and all black pixels in the bottom three rows. Yet another codevector might have a gradient made up of white pixels in the top row, black pixels in the bottom row, and four rows of pixels in between having shades of gray from light to dark. The quantizer stores a sufficient variety of codevectors in the codebook so that at least one closely matches each of the many image vectors that might be found in the full image. Typically, a codebook of representative codevectors is generated using an iterative clustering algorithm, such as described in S. P. Lloyd, "Least Squares Optimization in PCM", Bell Lab. Tech. Note, (1957) (also found in IEEE Trans. Inform. Theory, Vol. IT-28, pp. 129-137, March 1982); and, J. T. Tou and R. C. Gonzalez, "Pattern Recognition Principles", pp. 94-109, Addison-Wesley, Reading, MA (1974). Both of these references are incorporated herein by reference.
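By way of illustration only, one form of iterative clustering for codebook generation might be sketched as follows. This is a minimal sketch in the general style of a Lloyd iteration (nearest-codevector assignment followed by a centroid update); it is not asserted to reproduce the procedures of the cited references. The squared error measure, the fixed iteration count, the handling of empty clusters, and the tiny two-dimensional training set are all assumptions.

```python
def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd_codebook(training, initial, iterations=10):
    """Refine the `initial` codevectors by alternating nearest-codevector
    assignment and centroid update over the training vectors."""
    codebook = [list(c) for c in initial]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in training:
            nearest = min(range(len(codebook)),
                          key=lambda i: squared_error(v, codebook[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # a codevector with no members is left unchanged
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

# Hypothetical training vectors (two values each, for brevity).
training = [[0, 0], [2, 2], [10, 10], [12, 12]]
codebook = lloyd_codebook(training, initial=[[0, 0], [12, 12]])
print(codebook)  # [[1.0, 1.0], [11.0, 11.0]]
```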
Each codevector is assigned a unique identification code, sometimes called a label. In practice, the identification codes, or labels, are the memory addresses where the closest codevector to the image vector is found. (In the appended claims, the term "ID code" is sometimes employed to refer to these labels or addresses.) Compression is achieved by replacing each image vector by the label, or memory address, of the codevector in the codebook which most closely matches it.
By way of example, the codevector having the solid black patch described above might be assigned address #1. The codevector having the white pixels in the top half and black pixels in the bottom half might be assigned address #2, and so on for hundreds or thousands of codevectors. When quantizing a full image, a vector quantizer divides the full image frame into a series of image vectors. For each image vector, the vector quantizer identifies one closely matching codevector. The vector quantizer then generates a new signal made up of the series of labels, or memory addresses where the codevectors were found in the codebook. For the example of a full image of a house, the vector quantizer would divide the full image into numerous image vectors. The quantizer might then replace image vectors from shadowed areas with address #1 (the solid black patch), and it might replace the roof line image vectors with address #2 (white in the top half and black in the bottom half). Compression results because, typically, the length of the labels or addresses is much smaller than the size of the codevectors stored in memory. Typically, the addresses are transmitted by any conventional technique so that the image can be reconstructed at the receiver.
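By way of illustration only, the compression obtained by sending an address in place of an image vector can be estimated with simple arithmetic. The figures of 8 bits per pixel and a 4096-entry codebook in the sketch below are assumptions chosen for the example; neither appears in the foregoing description.

```python
def address_bits(codebook_size):
    """Number of bits needed for a fixed-length address into the codebook."""
    return max(1, (codebook_size - 1).bit_length())

def vq_compression_ratio(block_pixels, bits_per_pixel, codebook_size):
    """Ratio of raw bits in one image vector to bits in its address."""
    raw_bits = block_pixels * bits_per_pixel
    return raw_bits / address_bits(codebook_size)

# Hypothetical case: 6 x 6 blocks, 8 bits per pixel, 4096 codevectors.
print(address_bits(4096))                  # 12
print(vq_compression_ratio(36, 8, 4096))   # 24.0
```

Under these assumed figures, each 288-bit image vector is replaced by a 12-bit address.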
Reconstruction of the original full image at the receiver (or at least a very close approximation of the original image) may be accomplished by a device which has a codebook, identical to the codebook at the transmitter end, stored in a memory. Usually, the device that performs vector quantization and compression at the transmitter is called an encoder, and the device that performs decompression and image reproduction at the receiving end is called a decoder. The decoder reconstructs (at least an approximation of) the original image by retrieving from the codebook in the decoder the codevectors stored at each received address. Generally, the reconstructed image differs somewhat from the original image because codevectors do not usually precisely match the image vectors. The difference is called "distortion." Increasing the size of the codebook generally decreases the distortion.
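By way of illustration only, decoding by codebook lookup, and the resulting distortion, might be sketched as follows. The use of mean squared error as the distortion measure, and the small four-value vectors, are assumptions made for this sketch.

```python
def decode(addresses, codebook):
    """Reconstruct image vectors by retrieving the codevector stored at
    each received address."""
    return [codebook[a] for a in addresses]

def mean_squared_error(original, reconstructed):
    """Average squared difference per value between two lists of vectors."""
    total = sum((x - y) ** 2
                for v, w in zip(original, reconstructed)
                for x, y in zip(v, w))
    count = sum(len(v) for v in original)
    return total / count

# Hypothetical two-entry codebook shared by encoder and decoder.
codebook = [[0, 0, 0, 0], [255, 255, 0, 0]]
original = [[10, 0, 0, 0], [255, 255, 0, 0]]
received = [0, 1]

reconstructed = decode(received, codebook)
print(reconstructed)                                # [[0, 0, 0, 0], [255, 255, 0, 0]]
print(mean_squared_error(original, reconstructed))  # 12.5
```

The second vector is reproduced exactly, while the first differs from its codevector in one value, which accounts for the nonzero distortion.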
Many different techniques for searching a codebook to find the codevector that best matches the image vector have been proposed, but generally the methods can be classified as either a full search technique, or a branching (or tree) search technique. In a full search technique, the vector quantizer sequentially compares an input image vector to each and every codevector in the codebook. The vector quantizer computes a measure of distortion for each codevector and selects the one having the smallest distortion. The full search technique ensures selection of the best match, but involves the maximum number of computational steps. Thus, while distortion can be minimized using a full search technique, it is computationally expensive. Y. Linde, A. Buzo and R. Gray, "An Algorithm For Vector Quantizer Design", IEEE Transactions on Communications, Vol. COM-28, No. 1 (January 1980), incorporated herein by reference, describes the full search technique and the computational steps involved in such a search. The full search technique is sometimes called "full search vector quantization" or "full search VQ".
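By way of illustration only, a full search over a codebook might be sketched as follows. The squared error distortion measure and the three-entry codebook are assumptions made for this sketch; the cited reference should be consulted for the actual computational steps.

```python
def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def full_search(vector, codebook):
    """Compare the image vector to every codevector and return the address
    and distortion of the closest one."""
    best_address, best_distortion = None, float("inf")
    for address, codevector in enumerate(codebook):
        distortion = squared_error(vector, codevector)
        if distortion < best_distortion:
            best_address, best_distortion = address, distortion
    return best_address, best_distortion

# Hypothetical codebook of three four-value codevectors.
codebook = [[0, 0, 0, 0], [255, 255, 0, 0], [255, 255, 255, 255]]
print(full_search([250, 240, 10, 0], codebook))  # (1, 350)
```

The number of distortion computations equals the number of codevectors, which is why the search becomes expensive as the codebook grows.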
The tree search technique reduces the number of codevectors that must be evaluated (and thus reduces search time), but generally does not guarantee that the minimum distortion vector will be selected. A tree search technique can be considered as one that searches a sequence of small codebooks, instead of one large codebook. The codebook structure can be depicted as a tree, and each search and decision corresponds to advancing one level or stage in the tree, starting from the root of the tree. A detailed description of the tree search technique may be found in R. M. Gray and H. Abut, "Full Search and Tree Searched Vector Quantization of Speech Waveforms," Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 593-96 (May 1982), and R. M. Gray and Y. Linde, "Vector Quantization and Predictive Quantizers For Gauss Markov Sources", IEEE Trans. Comm., Vol. COM-30, pp. 381-389 (February 1982), both of which are incorporated herein by reference. The tree search technique is sometimes referred to as "tree-search vector quantization", "tree-search VQ" and "TSVQ." Tree-search VQ has found favor for compressing dynamic images, since it is computationally faster. However, tree-search VQ does not guarantee selection of the optimum vector, and therefore requires a larger codebook to achieve the same distortion as full search VQ.
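By way of illustration only, a tree search might be sketched as follows. The `Node` class, the binary tree layout, the squared error measure, and the codevector values are assumptions made for this sketch. The values were chosen so that the search selects a codevector other than the minimum distortion codevector that a search over all leaves would find.

```python
def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class Node:
    def __init__(self, codevector, children=()):
        self.codevector = codevector
        self.children = list(children)

def tree_search(vector, root):
    """Descend from the root, at each level choosing the child whose
    codevector is closest; return the branch choices and the leaf reached."""
    node, path = root, []
    while node.children:
        index = min(range(len(node.children)),
                    key=lambda i: squared_error(vector, node.children[i].codevector))
        path.append(index)
        node = node.children[index]
    return path, node.codevector

# Hypothetical two-level tree with four leaf codevectors.
leaves_a = [Node([0, 0]), Node([6, 6])]
leaves_b = [Node([9, 9]), Node([11, 11])]
root = Node(None, [Node([3, 3], leaves_a), Node([10, 10], leaves_b)])

print(tree_search([7, 7], root))  # ([1, 0], [9, 9])

# An exhaustive comparison against all four leaves, for contrast.
best = min(leaves_a + leaves_b, key=lambda n: squared_error([7, 7], n.codevector))
print(best.codevector)            # [6, 6]
```

At the first level the input is closer to [10, 10] than to [3, 3], so the search commits to the second branch and never evaluates [6, 6], even though that leaf has the smaller distortion.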
The process of vector quantizing data can be either "fixed rate" or "variable rate." Fixed rate VQ occurs when all of the transmitted address data has the same length, and a vector address is transmitted for all vectors in the image. Generally speaking, variable rate VQ offers the advantage that the average rate at which VQ data is transmitted is less than the rate that would be experienced if transmission of fixed rate VQ data were employed for the same image at the same distortion level. In the context of pay television systems, this advantage can be significant, since it can represent a much greater increase in the number of channels that can be carried over existing media (such as satellite and cable) than would be realized if fixed rate VQ were employed.
Several techniques are available for implementing variable rate VQ. In one technique, the quantity of compressed data generated by an image depends on the image content. For example, a variable rate VQ system might employ two different vector sizes. A large vector size might be used to describe simple parts of the image, and a small vector size might be used to describe complex parts of the image. The amount of compressed data generated depends on the complexity of the image. Sung Ho and A. Gersho, "Variable Rate Multi-Stage Vector Quantization For Image Coding", University of California, Santa Barbara (1988) (Available as IEEE Publ. No. CH 2561-9 88 0000-1156) teach one such technique. This reference is incorporated herein by reference. A disadvantage of this type of variable rate VQ is that the decoder is always more complex than a fixed rate VQ decoder since the decoder requires a video buffer store to reconstruct the image, whereas a fixed rate VQ decoder does not.
Another variable rate VQ scheme is described in E. A. Riskin, "Variable Rate Vector Quantization of Images", Ph. D. Dissertation--Stanford University, pp. 51 et seq. (May, 1990), incorporated herein by reference. Riskin employs an "unbalanced" tree structured codebook. An "unbalanced" tree structure is simply an incomplete tree; in other words, some branches of the tree may extend to further levels of the tree than other branches. As is common in tree search VQ, Riskin's codebook is searched by advancing from level to level along selected branches. Encoding will occur at different levels of the tree (in part due to the unbalanced structure of the tree), thereby achieving variable rate VQ, since the address length is a direct function of the level from which a codevector is selected. One disadvantage of this system is that encoding is not adaptive in any sense, and therefore the Riskin system does not perform variable rate VQ in an optimal fashion.
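By way of illustration only, the dependence of address length on tree level in an unbalanced tree might be sketched as follows. This sketch does not reproduce Riskin's codebook design; the tuple-based tree layout, the one-bit-per-level addressing, and the sample vectors are assumptions.

```python
def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Each node is (codevector, children); a leaf has an empty children tuple.
# The left branch ends at level 1; the right branch extends to level 2.
TREE = (None, (
    ([0, 0], ()),
    ([10, 10], (([8, 8], ()), ([12, 12], ()))),
))

def encode(vector, tree):
    """Return the branch choices (one bit per level) leading to a leaf."""
    bits = ""
    _, children = tree
    while children:
        index = min(range(len(children)),
                    key=lambda i: squared_error(vector, children[i][0]))
        bits += str(index)
        _, children = children[index]
    return bits

# Hypothetical input: three vectors near [0, 0] and one near [12, 12].
frame = [[1, 1], [0, 2], [1, 0], [11, 12]]
addresses = [encode(v, TREE) for v in frame]
print(addresses)                                  # ['0', '0', '0', '11']
print(sum(map(len, addresses)) / len(addresses))  # 1.25
```

A fixed-length address for the same three leaves would need 2 bits per vector, whereas the variable-length addresses here average 1.25 bits. Because every internal node in this sketch has exactly two children, no address is a prefix of another, so a decoder can separate consecutive addresses without delimiters.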
Copending U.S. patent application Ser. No. 794,516 entitled "Image Compression Method and Apparatus Employing Distortion Adaptive Tree Search Vector Quantization" describes one method for achieving high transmission rates through use of a variable rate VQ scheme that employs a distortion measure to determine the level of the tree from which codevectors will be selected for each input image vector. In general, in the invention disclosed in this application, simple parts of the image can be adequately reproduced by a short address indicating a vector near the root of the tree, while more complex parts of the image may require a vector at a lower level (i.e., closer to the bottom) of the tree, requiring a longer codebook address.
In television images in particular, as well as other video images such as movies, etc., it has been observed that a rather high degree of image redundancy may exist from one image frame to the next. Moreover, it has been observed that, to the extent that subsequent image frames are not fully redundant, many portions of subsequent image frames may nonetheless be redundant. Still further, within an image frame, a large degree of coherence, and thus redundancy, may be found between adjacent portions of the frame. Thus, even lower transmission rates may be accomplished by deleting the redundant or nearly redundant vector quantized data from data to be transmitted to the decoder. In other words, there is no need to send vector quantized data for image vectors that are identical, or substantially similar to, image vectors for which vector quantization data has previously been transmitted, since the decoder may simply copy the relevant reproduced image vectors. The present invention adapts this recognition to a distortion adaptive tree search vector quantization method to substantially decrease the transmission rates that can be realized without perceptible distortion or degradation in the quality of reproduced images.
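By way of illustration only, the two ideas discussed above (selecting a tree level according to a distortion measure, and omitting data for image vectors that substantially repeat previously reproduced vectors) might be sketched together as follows. This is a simplified sketch and is not the method of the copending application or of the appended claims. The dictionary-based tree, the single shared threshold, the squared error measure, and the "COPY" symbol are all hypothetical.

```python
def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def node(codevector, *children):
    return {"codevector": codevector, "children": list(children)}

def adaptive_descend(vector, root, threshold):
    """Descend the tree, stopping at the first level whose selected
    codevector is within `threshold` of the input, or at a leaf."""
    current, path = root, []
    while current["children"]:
        kids = current["children"]
        index = min(range(len(kids)),
                    key=lambda i: squared_error(vector, kids[i]["codevector"]))
        path.append(index)
        current = kids[index]
        if squared_error(vector, current["codevector"]) <= threshold:
            break
    return path, current["codevector"]

def encode_frame(vectors, previous, root, threshold):
    """`previous` holds the vectors the decoder reproduced for the prior
    frame (or None); near-repeats are signalled instead of re-encoded."""
    symbols, reproduced = [], []
    for i, v in enumerate(vectors):
        if previous is not None and squared_error(v, previous[i]) <= threshold:
            symbols.append("COPY")          # hypothetical "reuse" symbol
            reproduced.append(previous[i])
        else:
            path, codevector = adaptive_descend(v, root, threshold)
            symbols.append(path)
            reproduced.append(codevector)
    return symbols, reproduced

root = node(None,
            node([2, 2], node([0, 0]), node([4, 4])),
            node([10, 10], node([8, 8]), node([12, 12])))

symbols1, frame1 = encode_frame([[10, 11], [0, 1]], None, root, threshold=2)
symbols2, frame2 = encode_frame([[10, 10], [4, 5]], frame1, root, threshold=2)
print(symbols1)  # [[1], [0, 0]]
print(symbols2)  # ['COPY', [0, 1]]
```

In the first frame, one vector is adequately represented at the first level of the tree and receives a short address, while the other requires a second level. In the second frame, the first vector matches what the decoder already holds, so only a copy indication is produced for it.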