The background of the present invention is described herein in the context of pay television systems, such as cable television systems or direct broadcast satellite (DBS) systems, that distribute program material to subscribers, but the invention is by no means limited thereto except as expressly set forth in the accompanying claims.
In a typical cable television system, cable television operators receive much of their program material from remote earth station transmitters via a plurality of geosynchronous orbit satellites. The cable operator selects the program material to be made available to its subscribers by making arrangements with the satellite distributors of that program material. The cable operator receives the transmitted program material at its "cable head-end," where it then re-transmits the data to individual subscribers. Frequently, cable operators also provide their own local programming at the site of the head-end, and further include network broadcasts as well.
In a DBS system, individual subscribers are provided with their own satellite receiver. Each subscriber establishes a down-link with the broadcasting satellite directly. Thus, there is no need, as with cable systems, for re-transmission from a cable head-end.
Typically, in both types of systems (cable and DBS), the program material (both video and audio) is originally in analog form. Conventional transmission techniques place substantial limitations on the maximum number of viewer channels that can be transmitted over any given transponder on a satellite, since each channel requires a minimum bandwidth to avoid noticeable degradation and the total number of channels that can be transmitted over a given satellite transponder is limited by the bandwidth of the satellite transponders. In cable systems, the electrical properties of the coaxial cable and associated amplifiers limit its bandwidth and therefore place substantial limitations on the number of channels that can be delivered to cable television subscribers using conventional transmission techniques.
As a result of the desire to provide more program channels and/or HDTV (high definition television) to viewers over existing broadcast bandwidths, the industry (most notably, the cable television industry) has begun to investigate digital image transmission techniques. Although the desire is to minimize the transmission bandwidth of program material, thus allowing more channels to be transmitted over an existing broadcast bandwidth, digital image transmission further offers the advantage that digital data can be processed at both the transmission and reception ends to improve picture quality. Unfortunately, the process of converting the program material from analog form to digital form results in data expansion which increases the transmission bandwidth of the program material rather than decreasing it. Therefore, digital transmission alone does not solve the bandwidth problem, but instead makes it worse. However, through the application of digital data compression techniques, large bandwidth reductions can be achieved.
Data compression techniques minimize the quantity of data required to represent each image. Thus, more program material, or more channels, can be offered over an existing broadcast bandwidth. However, any data compression achieved is offset by the data expansion which occurs during the analog to digital conversion. Therefore, to be practical, the compression technique employed must achieve a compression ratio large enough to provide a net data compression. Digital data compression techniques, such as Huffman encoding and LZW (Lempel, Ziv and Welch) encoding, offer, at best, compression ratios of 2.5 to 1 and do not compensate sufficiently for the data expansion that occurs in converting data from analog to digital form.
In response to the need for large compression ratios, a number of so-called "lossy" compression techniques have been investigated for digital image compression. Unlike the Huffman and LZW encoding techniques, these "lossy" compression techniques do not provide exact reproduction of the data upon decompression. Thus, some degree of information is lost; hence the label "lossy." One such "lossy" compression technique is called DCT (discrete cosine transform) data compression. Another method, which, until recently, has been used principally for speech compression, is vector quantization. Vector quantization has shown promise in image compression applications by offering high image compression rates, while also achieving high fidelity image reproduction at the receiving end. It has been demonstrated, for example, that using vector quantization (hereinafter sometimes referred to as "VQ"), compression ratios as high as 25:1, and even as high as 50:1, can be realized without significant visually perceptible degradation in image reproduction.
Compression of video images by vector quantization initially involves dividing the pixels of each image frame into smaller blocks of pixels, or sub-images, and defining a "vector" from relevant data (such as intensity and/or color) reported by each pixel in the sub-image. The vector (sometimes called an "image vector") is really nothing more than a matrix of values (intensity and/or color) reported by each pixel in the sub-image. For example, a black and white image of a house might be defined by a 600×600 pixel image, and a 6×4 rectangular patch of pixels, representing, for example, a shadow, or part of a roof line against a light background, might form the sub-image from which the vector is constructed. The vector itself might be defined by a plurality of gray scale values representing the intensity reported by each pixel. While a black and white image serves as an example here, vectors might also be formed from red, green, or blue levels of a color image, or from the Y, I and Q components of a color image, or from transform coefficients of an image signal.
Numerous methods exist for manipulating the block, or sub-image, to form a vector. R. M. Gray, "Vector Quantization", IEEE ASSP Mag., pp. 4-29 (April, 1984), describes formation of vectors for monochrome images. E. B. Hilbert, "Cluster Compression Algorithm: A Joint Clustering/Data Compression Concept", Jet Propulsion Laboratory, Pasadena, Calif., Publ. 77-43, describes formation of vectors from the color components of pixels. A. Gersho and B. Ramamurthi, "Image Coding Using Vector Quantization", Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 428-431 (May, 1982), describes vector formation from the intensity values of spatially contiguous groups of pixels. All of the foregoing references are incorporated herein by reference.
By way of example, a television camera might generate an analog video signal in a raster scan format having 600 scan lines per frame. An analog to digital converter could then digitize the video signal at a sampling rate of 600 samples per scan line, each sample being a pixel. Digital signal processing equipment could then store the digital samples in a 600×600 pixel matrix. The 600×600 pixel matrix could then be organized into smaller blocks, for example 6×4 pixel blocks, and then each block could be converted into an image vector. Each of these image vectors would then be compressed as described below.
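The blocking step in this example can be sketched in a few lines of Python. This is only an illustration: the function name image_to_vectors, the row-by-row flattening order, the reading of a 6×4 block as six rows by four columns, and the synthetic test image are all assumptions, not details taken from the cited references.

```python
def image_to_vectors(image, block_rows=6, block_cols=4):
    """Split a 2-D list of pixel intensities into flattened block vectors."""
    height, width = len(image), len(image[0])
    if height % block_rows or width % block_cols:
        raise ValueError("image dimensions must be multiples of the block size")
    vectors = []
    # Walk the image block by block, left to right, top to bottom.
    for top in range(0, height, block_rows):
        for left in range(0, width, block_cols):
            # Flatten the block row by row into one image vector.
            vectors.append([image[r][c]
                            for r in range(top, top + block_rows)
                            for c in range(left, left + block_cols)])
    return vectors


# Synthetic 600x600 gray scale image (hypothetical data, for illustration only).
image = [[(r + c) % 256 for c in range(600)] for r in range(600)]
vectors = image_to_vectors(image)
```

Under these assumptions a 600×600 frame yields 100 × 150 = 15,000 image vectors of 24 values each.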
In an image vector quantizer, a vector quantization "codebook" is created from training data comprising a representative sample of images which the quantizer is likely to encounter during use. The codebook consists of a memory containing a set of stored "codevectors," each representative of commonly encountered image vectors. For example, one codevector might be a 6×4 pixel solid black patch. Another codevector might have all white pixels in the top three rows, and all black pixels in the bottom three rows. Yet another codevector might have a gradient made up of white pixels in the top row, black pixels in the bottom row, and four rows of pixels in between having shades of gray from light to dark. Typically, a codebook of representative codevectors is generated using an iterative clustering algorithm, such as described in S. P. Lloyd, "Least Squares Optimization in PCM", Bell Lab. Tech. Note, (1957) (also found in IEEE Trans. Inform. Theory, Vol. IT-28, pp. 129-137, March 1982); and J. T. Tou and R. C. Gonzalez, "Pattern Recognition Principles", pp. 94-109, Addison-Wesley, Reading, Mass. (1974). Both of these references are incorporated herein by reference. See also Y. Linde, A. Buzo and R. Gray, "An Algorithm For Vector Quantizer Design", IEEE Transactions on Communications, Vol. COM-28, No. 1 (January 1980), incorporated herein by reference.
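An iterative clustering procedure of the general kind referred to above can be sketched as follows. This is a minimal Lloyd-style iteration and not the exact procedure of any cited reference. The squared-error distortion measure, the initialization from the first training vectors, the fixed iteration count, and the names train_codebook and squared_error are assumptions made for illustration.

```python
def squared_error(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(training_vectors, codebook_size, iterations=20):
    # Assumption: start from the first `codebook_size` training vectors.
    codebook = [list(v) for v in training_vectors[:codebook_size]]
    for _ in range(iterations):
        # Step 1: assign every training vector to its nearest codevector.
        clusters = [[] for _ in codebook]
        for v in training_vectors:
            nearest = min(range(len(codebook)),
                          key=lambda i: squared_error(v, codebook[i]))
            clusters[nearest].append(v)
        # Step 2: move each codevector to the centroid of its cluster.
        for i, members in enumerate(clusters):
            if members:  # leave a codevector unchanged if its cluster is empty
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


# Tiny hypothetical training set of 2-element vectors.
training = [[0, 0], [10, 10], [0, 2], [10, 12]]
codebook = train_codebook(training, codebook_size=2)
```

In practice the training set would consist of image vectors drawn from representative images, and the iteration would normally stop when the average distortion no longer improves rather than after a fixed count.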
Each codevector in the codebook is assigned a unique identification code, sometimes called a label. In practice, the identification codes, or labels, are the memory addresses of the codevectors. For each input image vector, data compression is achieved by selecting the codevector in the codebook that most closely matches the input image vector, and then transmitting the codebook address of the selected codevector rather than the input image vector itself. Compression results because the addresses of the selected codevectors are generally much smaller than the image vectors.
By way of example, the codevector having the solid black patch described above might be assigned address #1. The codevector having the white pixels in the top half and black pixels in the bottom half might be assigned address #2, and so on for hundreds or thousands of codevectors. When quantizing a full image, a vector quantizer divides the full image frame into a series of image vectors. For each image vector, the vector quantizer identifies one closely matching codevector. The vector quantizer then generates a new signal made up of the series of labels, or memory addresses where the codevectors were found. For the example of a full image of a house, the vector quantizer would divide the full image into numerous image vectors. The quantizer might then replace image vectors from shadowed areas with address #1 (the solid black patch), and it might replace the roof line image vectors with address #2 (white in the top half and black in the bottom half). As mentioned above, compression results because, typically, the length of the labels or addresses is much smaller than the size of the codevectors stored in memory. Typically, the addresses are transmitted by any conventional technique so that the image can be reconstructed at the receiver.
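The size difference between an address and the vector it replaces can be made concrete with a short calculation. The figures used here (8 bits per pixel, a codebook of 1024 codevectors, fixed-length addresses) are assumptions chosen for illustration; only the 6×4 block size comes from the example above.

```python
def address_bits(codebook_size):
    """Bits needed for a fixed-length address into a codebook of this size."""
    return max(1, (codebook_size - 1).bit_length())


def compression_ratio(pixels_per_vector, bits_per_pixel, codebook_size):
    """Ratio of raw vector size to transmitted address size, in bits."""
    raw_bits = pixels_per_vector * bits_per_pixel
    return raw_bits / address_bits(codebook_size)


# 6x4 block = 24 pixels; 8 bits per pixel and 1024 codevectors are assumed.
ratio = compression_ratio(pixels_per_vector=24, bits_per_pixel=8,
                          codebook_size=1024)
print(f"{ratio:.1f}:1")  # prints 19.2:1
```

Under these assumed figures each 192-bit image vector is replaced by a 10-bit address.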
Reconstruction of the original full image at the receiver (or at least a very close approximation of the original image) may be accomplished by a device which has a codebook, identical to the codebook at the transmitter end, stored in a memory. The device that performs vector quantization and compression at the transmitter is called an encoder, and the device that performs decompression and image reproduction at the receiving end is called a decoder. The decoder reconstructs (at least an approximation of) the original image by retrieving from the codebook in the decoder the codevectors stored at each received address. Generally, the reconstructed image differs somewhat from the original image because codevectors do not usually precisely match the image vectors. The difference is called "distortion." Increasing the size of the codebook generally decreases the distortion.
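The decoder's table lookup and the resulting distortion can be sketched as follows. Mean squared error is used here as one common distortion measure; the text above does not prescribe a particular measure, and the function names and the small 4-element codebook are hypothetical.

```python
def decode(addresses, codebook):
    """Rebuild the sequence of vectors by looking up each received address."""
    return [codebook[address] for address in addresses]


def mean_squared_error(original_vectors, reconstructed_vectors):
    """Average squared difference per pixel over all vectors."""
    total = 0
    count = 0
    for original, rebuilt in zip(original_vectors, reconstructed_vectors):
        for x, y in zip(original, rebuilt):
            total += (x - y) ** 2
            count += 1
    return total / count


# Hypothetical codebook shared by encoder and decoder.
codebook = [[0, 0, 0, 0], [255, 255, 0, 0]]
received_addresses = [0, 1, 0]
originals = [[0, 0, 0, 0], [250, 255, 5, 0], [10, 0, 0, 0]]

reconstructed = decode(received_addresses, codebook)
distortion = mean_squared_error(originals, reconstructed)
```

The decoder performs no searching at all, only lookups, which is why it is considerably simpler than the encoder.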
Many different techniques for searching a codebook to find the codevector that best matches the image vector have been proposed, but generally the methods can be classified as either a full search technique, or a branching (or tree) search technique. In a full search technique, the vector quantizer sequentially compares an input image vector to each and every codevector in the codebook. The vector quantizer computes a measure of distortion for each codevector and selects the one having the smallest distortion. The full search technique ensures selection of the best match, but involves the maximum number of computational steps. Thus, while distortion can be minimized using a full search technique, it is computationally expensive. The aforementioned article by Y. Linde, A. Buzo and R. Gray entitled "An Algorithm For Vector Quantizer Design" describes the full search technique and the computational steps involved in such a search. The full search technique is sometimes called "full search vector quantization" or "full search VQ".
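A full search encoder can be sketched as below. The squared-error distortion measure, the function names, and the three-entry codebook are assumptions for illustration and are not taken from the Linde, Buzo and Gray article.

```python
def squared_error(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def full_search_encode(image_vectors, codebook):
    """Return, for each image vector, the address of its best-matching codevector."""
    addresses = []
    for vector in image_vectors:
        # Compare against every codevector and keep the smallest distortion.
        best = min(range(len(codebook)),
                   key=lambda i: squared_error(vector, codebook[i]))
        addresses.append(best)
    return addresses


# Hypothetical 4-element codevectors: dark, half bright, mid gray.
codebook = [[0, 0, 0, 0], [255, 255, 0, 0], [128, 128, 128, 128]]
image_vectors = [[5, 0, 3, 1], [250, 240, 10, 0], [120, 130, 125, 140]]
addresses = full_search_encode(image_vectors, codebook)
```

The cost is one distortion computation per codevector per image vector, so the work grows linearly with codebook size.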
The tree search technique reduces the number of codevectors that must be evaluated (and thus reduces search time), but does not necessarily identify the very best match. Consequently, to maintain a given level of distortion, the tree search technique requires a larger codebook than the full search technique. The tree search technique can be considered as one that searches a sequence of small codebooks, instead of one large codebook. The codebook structure can be depicted as a tree, and each search and decision corresponds to advancing along a branch to the next level or stage in the tree, starting from the root of the tree. Thus, only the codevectors along certain branches of the tree are searched, thereby reducing the search time. A detailed description of the tree search technique may be found in R. M. Gray and H. Abut, "Full Search and Tree Searched Vector Quantization of Speech Waveforms," Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 593-96 (May 1982), and R. M. Gray and Y. Linde, "Vector Quantization and Predictive Quantizers For Gauss Markov Sources", IEEE Trans. Comm., Vol. COM-30, pp. 381-389 (February 1982), both of which are incorporated herein by reference. The tree search technique is sometimes referred to as "tree-search vector quantization", "tree-search VQ" and "TSVQ." Notwithstanding the larger memory that is required to maintain a given level of distortion, this technique has found favor for compressing dynamic images, since it is computationally faster.
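A tree search over a small binary codebook can be sketched as follows. The Node class, the two-level tree, the squared-error measure, and the representation of an address as a list of branch indices are all assumptions made for illustration; the cited references should be consulted for the actual constructions.

```python
def squared_error(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class Node:
    """One codevector in a tree-structured codebook."""
    def __init__(self, codevector, children=()):
        self.codevector = codevector
        self.children = list(children)


def tree_search_encode(vector, root):
    """Descend from the root, picking the closest child at each level."""
    node = root
    path = []  # branch indices taken; together they form the address
    while node.children:
        index = min(range(len(node.children)),
                    key=lambda i: squared_error(vector,
                                                node.children[i].codevector))
        path.append(index)
        node = node.children[index]
    return path, node.codevector


# Hypothetical two-level binary tree of 2-element codevectors.
root = Node(None, [
    Node([32, 32], [Node([0, 0]), Node([64, 64])]),
    Node([200, 200], [Node([160, 160]), Node([240, 240])]),
])
path, codevector = tree_search_encode([70, 60], root)
```

For a balanced binary tree of depth d, this evaluates 2d codevectors, whereas a full search over the 2^d codevectors at the bottom level evaluates all of them.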
The process of vector quantizing data can be either "fixed rate" or "variable rate." Fixed rate VQ occurs when all of the transmitted address data has the same length; variable rate VQ occurs when the length of the transmitted address data varies. Generally speaking, variable rate VQ offers the advantage that the average rate at which VQ data is transmitted is less than the rate that would be experienced if fixed rate VQ were employed for the same image at the same distortion level. In the context of pay television systems, this advantage can be significant, since it can represent a much greater increase in the number of channels that can be carried over existing media (such as satellite and cable) than would be realized if fixed rate VQ were employed.
Several techniques are available for implementing variable rate VQ. In one technique, the quantity of compressed data generated by an image depends on the image content. For example, a variable rate VQ system might employ two different vector sizes. A large vector size might be used to describe simple parts of the image, and a small vector size might be used to describe complex parts of the image. The amount of compressed data generated depends on the complexity of the image. Sung Ho and A. Gersho, "Variable Rate Multi-Stage Vector Quantization For Image Coding", University of California, Santa Barbara (1988) (Available as IEEE Publ. No. CH 2561-9 88 00001156) teach one such technique. This reference is incorporated herein by reference. A disadvantage of this type of variable rate VQ is that the decoder is always more complex than a fixed rate decoder since the decoder requires a video buffer store to reconstruct the image, whereas a fixed rate decoder does not.
Another variable rate VQ scheme is described in E. A. Riskin, "Variable Rate Vector Quantization of Images", Ph. D. Dissertation--Stanford University, pp. 51 et seq. (May, 1990), incorporated herein by reference. Riskin employs an "unbalanced" tree structured codebook. An "unbalanced" tree structure is simply an incomplete tree; in other words, some branches of the tree may extend to further levels of the tree than other branches. As is common in tree search VQ, Riskin's codebook is searched by advancing from level to level along selected branches. Encoding will occur at different levels of the tree, thereby achieving variable rate transmission since the address length is a direct function of the level from which a codevector is selected. One disadvantage of this method is that it is not distortion adaptive and variable rate VQ occurs, in part, as a result of the unbalanced nature of the tree.
A better implementation of variable rate VQ is disclosed in co-pending U.S. application Ser. No. 794,516 entitled "Image Compression Method and Apparatus Employing Distortion Adaptive Tree Search Vector Quantization," which is incorporated herein by reference. According to this application, codevectors are selected from varying levels of a tree structured codebook according to an adjustable threshold. One advantage of this implementation is that the threshold can be adjusted to alter the level of the tree at which encoding occurs, which in turn alters the average length of the addresses. Thus, this implementation is distortion adaptive.
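One way such a threshold could operate is sketched below. The stopping rule shown (stop descending as soon as the distortion at the current level is at or below the threshold) is an assumption made for illustration and is not taken from the co-pending application; the Node class, the squared-error measure, and the example tree are likewise hypothetical.

```python
def squared_error(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class Node:
    """One codevector in a tree-structured codebook."""
    def __init__(self, codevector, children=()):
        self.codevector = codevector
        self.children = list(children)


def threshold_encode(vector, root, threshold):
    """Descend the tree, stopping once distortion is within the threshold."""
    node = root
    path = []
    while node.children:
        index = min(range(len(node.children)),
                    key=lambda i: squared_error(vector,
                                                node.children[i].codevector))
        node = node.children[index]
        path.append(index)
        # Assumed rule: a good enough match ends the search at this level.
        if squared_error(vector, node.codevector) <= threshold:
            break
    return path


# Hypothetical two-level binary tree of 2-element codevectors.
root = Node(None, [
    Node([32, 32], [Node([0, 0]), Node([64, 64])]),
    Node([200, 200], [Node([160, 160]), Node([240, 240])]),
])
short_address = threshold_encode([30, 34], root, threshold=50)
long_address = threshold_encode([70, 60], root, threshold=50)
```

In this sketch, raising the threshold shortens the average address at the cost of higher distortion, and lowering it does the reverse.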
The Riskin method permits transmission of increasingly accurate reproductions of input image data upon request by a recipient of the vector quantized image data. According to the Riskin method, each input vector of an image is first encoded at an initial level of the tree. The addresses of the codevectors selected for each input vector (from the initial level) are then transmitted. The encoder remembers the level at which each input vector was initially encoded. If the recipient requests greater accuracy, this is communicated to the encoder, and the encoder encodes each input vector at a next level of the tree, and the addresses of the codevectors selected from this level are transmitted. This process is repeated until either the recipient has received a satisfactory reproduction of the input image or until the encoder reaches the bottom level of the tree.
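The request-driven refinement described above can be modeled with a Python generator, where each call to next() stands for one request from the recipient. This is a loose sketch, not Riskin's method as published: the choice of the first level below the root as the initial level, the Node class, the squared-error measure, and the example tree are assumptions.

```python
def squared_error(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class Node:
    """One codevector in a tree-structured codebook."""
    def __init__(self, codevector, children=()):
        self.codevector = codevector
        self.children = list(children)


def progressive_refinements(vector, root):
    """Yield (level, branch index, codevector), one tree level per request."""
    node = root
    level = 0
    while node.children:
        index = min(range(len(node.children)),
                    key=lambda i: squared_error(vector,
                                                node.children[i].codevector))
        node = node.children[index]
        level += 1
        # The generator pauses here, remembering the level reached so far.
        yield level, index, node.codevector


# Hypothetical two-level binary tree of 2-element codevectors.
root = Node(None, [
    Node([32, 32], [Node([0, 0]), Node([64, 64])]),
    Node([200, 200], [Node([160, 160]), Node([240, 240])]),
])
stream = progressive_refinements([70, 60], root)
first = next(stream)   # initial transmission
second = next(stream)  # sent only after the recipient asks for more accuracy
```

The generator ends when the bottom level of the tree is reached, which corresponds to the second stopping condition described above.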
A disadvantage of Riskin's "progressive transmission" method is that the enhanced (i.e., more accurate) data is only transmitted when requested by the recipient. Moreover, Riskin's "progressive transmission" scheme is not adaptive in any sense of the word.
It is desirable to provide a progressive transmission method wherein image data recipients may choose the reproduction quality of transmitted image data without having to communicate with the encoder to select the quality level. Furthermore, it is desirable to provide such a progressive transmission technique that is adaptive and accounts for changes in image complexity so that, on average, the rate of transmission of VQ datawords is substantially constant. The present invention achieves these goals.