The background of the present invention is described herein in the context of pay television systems, such as cable television systems or direct broadcast satellite (DBS) systems, that distribute program material to subscribers, but the invention is by no means limited thereto except as expressly set forth in the accompanying claims.
In a typical cable television system, cable television operators receive much of their program material from remote earth station transmitters via a plurality of geosynchronous orbit satellites. The cable operator selects the program material to be made available to its subscribers by making arrangements with the satellite distributors of that program material. The cable operator receives the transmitted program material at its "cable head-end," from which the material is then re-transmitted to individual subscribers. Frequently, cable operators also provide their own local programming at the site of the head-end, and include network broadcasts as well.
In a DBS system, individual subscribers are provided with their own satellite receiver. Each subscriber establishes a down-link with the broadcasting satellite directly. Thus, there is no need, as with cable systems, for re-transmission from a cable head-end.
Typically, in both types of systems (cable and DBS), the program material (both video and audio) originates in analog form. Conventional transmission techniques place substantial limitations on the maximum number of viewer channels that can be carried by any given transponder on a satellite: each channel requires a minimum bandwidth to avoid noticeable degradation, so the total number of channels that can be transmitted over a given satellite transponder is limited by the bandwidth of each signal and of the transponder. Similarly, in cable systems, the electrical properties of the coaxial cable limit its bandwidth and therefore place substantial limitations on the number of channels that can be delivered to cable television subscribers using conventional transmission techniques.
As a result of the desire to provide more program channels and/or HDTV to subscribers over existing broadcast bandwidths, the pay television industry has begun to investigate digital image transmission techniques. Although the desire is to minimize the transmission bandwidth of program material, thus allowing more channels to be transmitted over an existing broadcast bandwidth, digital image transmission further offers the advantage that digital data can be processed at both the transmission and reception ends to improve picture quality. Unfortunately, the process of converting the program material from analog form to digital form results in data expansion which increases the transmission bandwidth of the program material rather than decreasing it. Therefore, digital transmission alone does not solve the bandwidth problem, but instead makes it worse. However, through the application of digital data compression techniques, large bandwidth reductions can be achieved.
Data compression techniques minimize the quantity of data required to represent each image. Thus, more program material, or more channels, can be offered over an existing broadcast bandwidth. However, any data compression achieved is offset by the data expansion which occurs during the analog to digital conversion. Therefore, to be practical, the compression technique employed must achieve a compression ratio high enough to provide a net data compression. Digital data compression techniques, such as Huffman encoding and LZW (Lempel, Ziv and Welch) encoding, offer, at best, compression ratios of 2.5 to 1 and do not compensate sufficiently for the data expansion that occurs in converting data from analog to digital form.
In response to the need for high compression ratios, a number of so-called "lossy" compression techniques have been investigated for digital image compression. Unlike the Huffman and LZW encoding techniques, these "lossy" compression techniques do not provide exact reproduction of the data upon decompression. Thus, some degree of information is lost; hence the label "lossy." One such "lossy" compression technique is called DCT (discrete cosine transform) data compression. Another method, which, until recently, has been used principally for speech compression, is vector quantization. Vector quantization has shown promise in image compression applications by offering high image compression ratios, while also achieving high fidelity image reproduction at the receiving end. It has been demonstrated, for example, that using vector quantization (hereinafter sometimes referred to as "VQ"), compression ratios as high as 25:1, and even as high as 50:1, can be realized without significant visually perceptible degradation in image reproduction.
Compression of video images by vector quantization initially involves dividing the pixels of each image frame into smaller blocks of pixels, or sub-images, and defining a "vector" from relevant data (such as intensity and/or color) reported by each pixel in the sub-image. The vector (sometimes called an "image vector" or "input image vector") is really nothing more than a matrix of values (intensity and/or color) reported by each pixel in the sub-image. For example, a black and white image of a house might be defined by a 600×600 pixel image, and a 6×4 rectangular patch of pixels, representing, for example, a shadow, or part of a roof line against a light background, might form the sub-image from which the vector is constructed. The vector itself might be defined by a plurality of gray scale values representing the intensity reported by each pixel. While a black and white image serves as an example here, vectors might also be formed from red, green, or blue levels of a color image, or from the Y, I and Q components of a color image, or from transform coefficients of an image signal.
Numerous methods exist for manipulating the block, or sub-image, to form a vector. R. M. Gray, "Vector Quantization", IEEE ASSP Mag., pp. 4-29 (April, 1984), describes formation of vectors for monochrome images. E. B. Hilbert, "Cluster Compression Algorithm: A Joint Clustering/Data Compression Concept", Jet Propulsion Laboratory, Pasadena, Calif., Publ. 77-43, describes formation of vectors from the color components of pixels. A. Gersho and B. Ramamurthi, "Image Coding Using Vector Quantization", Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 428-431 (May, 1982), describes vector formation from the intensity values of spatially contiguous groups of pixels. All of the foregoing references are incorporated herein by reference.
By way of example, a television camera might generate an analog video signal in a raster scan format having 600 scan lines per frame. An analog to digital converter could then digitize the video signal at a sampling rate of 600 samples per scan line, each sample being a pixel. Digital signal processing equipment could then store the digital samples in a 600×600 pixel matrix. The 600×600 pixel matrix could then be organized into smaller blocks, for example 6×4 pixel blocks, and then each block could be converted into an image vector. Each of these image vectors would then be compressed as described below.
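The block-to-vector step just described can be sketched in a few lines of code. The following is an illustrative sketch only, not the patent's apparatus: the function name `frame_to_vectors` is hypothetical, and a scaled-down 8×12 frame stands in for the 600×600 example so the block division is easy to follow.

```python
# Illustrative sketch (assumed helper, not from the patent): divide a
# digitized frame into fixed-size pixel blocks and flatten each block
# into one "image vector."

def frame_to_vectors(frame, block_h=4, block_w=6):
    """Split a 2-D list of pixel intensities into block_h x block_w
    sub-images and flatten each into a flat list (the image vector)."""
    rows, cols = len(frame), len(frame[0])
    vectors = []
    for r in range(0, rows, block_h):
        for c in range(0, cols, block_w):
            vec = [frame[r + i][c + j]
                   for i in range(block_h)
                   for j in range(block_w)]
            vectors.append(vec)
    return vectors

# A tiny 8x12 "frame" of gray-scale values stands in for the 600x600 example.
frame = [[(r * 12 + c) % 256 for c in range(12)] for r in range(8)]
vectors = frame_to_vectors(frame)
# (8/4) * (12/6) = 4 image vectors, each of length 6 * 4 = 24
```

Each of the resulting 24-element vectors would then be compressed as described below.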
In an image vector quantizer, a vector quantization "codebook" is created from training data comprising a representative sample of images which the quantizer is likely to encounter during use. The codebook consists of a memory containing a set of stored "codevectors," each representative of commonly encountered image vectors. For example, one codevector might be a 6×4 pixel solid black patch. Another codevector might have all white pixels in the top three rows, and all black pixels in the bottom three rows. Yet another codevector might have a gradient made up of white pixels in the top row, black pixels in the bottom row, and four rows of pixels in between having shades of gray from light to dark. Typically, a codebook of representative codevectors is generated using an iterative clustering algorithm, such as described in S. P. Lloyd, "Least Squares Quantization in PCM", Bell Lab. Tech. Note (1957) (also found in IEEE Trans. Inform. Theory, Vol. IT-28, pp. 129-137, March 1982); and J. T. Tou and R. C. Gonzalez, "Pattern Recognition Principles", pp. 94-109, Addison-Wesley, Reading, Mass. (1974). Both of these references are incorporated herein by reference.
Each codevector in the codebook is assigned a unique identification code, sometimes called a label. In practice, the identification codes, or labels, are the memory addresses of the codevectors. (In the appended claims, the term "ID code" is sometimes employed to refer to these labels or addresses). For each input image vector, data compression is achieved by selecting the codevector in the codebook that most closely matches the input image vector, and then transmitting the codebook address of the selected codevector rather than the input image vector. Compression results because generally, the addresses of the selected codevectors are much smaller than the image vectors.
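The claim that the addresses are much smaller than the image vectors can be made concrete with simple arithmetic. The figures below (8 bits per pixel, a 1024-entry codebook) are illustrative assumptions, not values taken from the patent.

```python
# Back-of-envelope arithmetic (assumed figures): transmitting a 6x4 block
# of 8-bit pixels directly versus transmitting a 10-bit address into a
# 1024-codevector codebook.
bits_per_vector = 6 * 4 * 8          # 192 bits to send the raw image vector
bits_per_address = 10                # 2**10 = 1024 codevectors -> 10-bit ID code
ratio = bits_per_vector / bits_per_address
print(ratio)                         # 19.2, i.e., a 19.2:1 reduction for this step
```

A larger codebook lengthens the addresses slightly (11 bits for 2048 entries) but, as noted below, generally decreases distortion.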
By way of example, the codevector having the solid black patch described above might be assigned address #1. The codevector having the white pixels in the top half and black pixels in the bottom half might be assigned address #2, and so on for hundreds or thousands of codevectors. When quantizing a full image, a vector quantizer divides the full image frame into a series of image vectors (i.e., from each of the blocks, or sub-images). For each image vector, the vector quantizer identifies one closely matching codevector. The vector quantizer then generates a new signal made up of the series of labels, or memory addresses where the codevectors were found. For the example of a full image of a house, the quantizer might replace image vectors from shadowed areas with address #1 (the solid black patch), and it might replace the roof line image vectors with address #2 (white in the top half and black in the bottom half). As mentioned above, compression results because, typically, the length of the labels or addresses is much smaller than the size of the codevectors stored in memory. Typically, the addresses are transmitted by any conventional technique so that the image can be reconstructed at the receiver.
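The encoding step above can be sketched as a minimal full-search encoder. This is an illustrative sketch under assumed conventions (0-based addresses, squared-error distortion, short 1-D "vectors"), not the patent's apparatus; the toy codebook echoes the solid-black and half-white/half-black patches of the house example.

```python
# Minimal full-search encoder sketch (illustrative): each image vector is
# replaced by the address of the codevector with the smallest
# squared-error distortion.

def encode(image_vectors, codebook):
    addresses = []
    for vec in image_vectors:
        best_addr, best_dist = 0, float("inf")
        for addr, code in enumerate(codebook):
            dist = sum((v - c) ** 2 for v, c in zip(vec, code))
            if dist < best_dist:
                best_addr, best_dist = addr, dist
        addresses.append(best_addr)
    return addresses

# Toy codebook of length-4 "vectors": address 0 is a solid black patch,
# address 1 is half white / half black.
codebook = [[0, 0, 0, 0], [255, 255, 0, 0]]
print(encode([[10, 5, 0, 0], [250, 240, 12, 8]], codebook))  # [0, 1]
```

The nearly-black input maps to the black patch's address and the half-bright input to the half-white patch's address; only the two small addresses need be transmitted.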
Reconstruction of the original full image at the receiver (or at least a very close approximation of the original image) may be accomplished by a device which has a codebook, identical to the codebook at the transmitter end, stored in a memory. The device that performs vector quantization and compression at the transmitter is called an encoder, and the device that performs decompression and image reproduction at the receiving end is called a decoder. The decoder reconstructs (at least an approximation of) the original image by retrieving from the codebook in the decoder the codevectors stored at each received address. Generally, the reconstructed image differs somewhat from the original image because codevectors do not usually precisely match the image vectors. The difference is called "distortion." Increasing the size of the codebook generally decreases the distortion.
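The decoder's side of this process is, as a sketch, nothing more than a table lookup into a codebook identical to the encoder's. The function below is illustrative only; no search or distortion computation is needed at the receiving end, which is one reason VQ decoders can be simple devices.

```python
# Decoder sketch (illustrative): reconstruction retrieves the codevector
# stored at each received address from a codebook identical to the
# encoder's. No searching is required.

def decode(addresses, codebook):
    return [codebook[addr] for addr in addresses]

codebook = [[0, 0, 0, 0], [255, 255, 0, 0]]
print(decode([0, 1], codebook))   # [[0, 0, 0, 0], [255, 255, 0, 0]]
```

The reconstructed blocks are the codevectors themselves, which is why they approximate, rather than exactly reproduce, the original image vectors.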
Many different techniques for searching a codebook to find the codevector that best matches the image vector have been proposed, but generally the methods can be classified as either a full search technique, or a branching (or tree) search technique. In a full search technique, the vector quantizer sequentially compares an input image vector to each and every codevector in the codebook. The vector quantizer computes a measure of distortion for each codevector and selects the one having the smallest distortion. The full search technique ensures selection of the best match, but involves the maximum number of computational steps. Thus, while distortion can be minimized using a full search technique, it is computationally expensive. Y. Linde, A. Buzo and R. Gray, "An Algorithm For Vector Quantizer Design", IEEE Transactions on Communications, Vol. COM-28, No. 1 (January 1980), incorporated herein by reference, describes the full search technique and the computational steps involved in such a search. The full search technique is sometimes called "full search vector quantization" or "full search VQ".
The tree search technique reduces the number of codevectors that must be evaluated (and thus reduces search time), but does not necessarily identify the very best match. Consequently, to maintain a given level of distortion, the tree search technique requires a larger codebook than the full search technique. The tree search technique can be considered as one that searches a sequence of small codebooks, instead of one large codebook. The codebook structure can be depicted as a tree, and each search and decision corresponds to advancing along a branch of the tree to the next level or stage of the tree, starting from the root of the tree. Thus, only the codevectors along certain branches of the tree are searched, thereby reducing the search time. A detailed description of the tree search technique may be found in R. M. Gray and H. Abut, "Full Search and Tree Searched Vector Quantization of Speech Waveforms," Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 593-96 (May 1982), and R. M. Gray and Y. Linde, "Vector Quantization and Predictive Quantizers For Gauss Markov Sources", IEEE Trans. Comm., Vol. COM-30, pp. 381-389 (February 1982), both of which are incorporated herein by reference. The tree search technique is sometimes referred to as "tree-search vector quantization", "tree-search VQ" and "TSVQ." Notwithstanding the larger memory that is required to maintain a given level of distortion, this technique has found favor for compressing dynamic images because it is computationally faster.
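The branch-following search described above can be sketched for a binary tree. This is an illustrative sketch under assumed conventions (a hypothetical nested-dictionary tree, squared-error distortion, scalar "vectors" for brevity), not the patent's memory layout: at each level the encoder compares the input only against the two child codevectors of the current node and follows the better branch, so the number of comparisons grows with the depth of the tree rather than with the total number of codevectors.

```python
# Tree-search VQ sketch (illustrative): descend a binary tree of
# codevectors, at each level following the child with smaller distortion.

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def tree_search(vec, node, path=""):
    """Descend to a leaf; return the bit-path taken (which serves as the
    codevector's address) and the leaf codevector."""
    if "children" not in node:
        return path, node["code"]
    left, right = node["children"]
    if dist(vec, left["code"]) <= dist(vec, right["code"]):
        return tree_search(vec, left, path + "0")
    return tree_search(vec, right, path + "1")

# Two-level toy tree over scalar "vectors."
tree = {"children": [
    {"code": [64],  "children": [{"code": [32]},  {"code": [96]}]},
    {"code": [192], "children": [{"code": [160]}, {"code": [224]}]}]}

print(tree_search([100], tree))   # ('01', [96])
```

Note that the greedy descent commits to the left branch at the first level and never examines the right subtree, which is why the result is fast but not guaranteed to be the globally best match.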
Transmission of the VQ data (i.e., the codebook labels or addresses) to a receiver for reconstruction of the image can be either "fixed rate" or "variable rate." Fixed rate transmission occurs when all of the transmitted address data has the same length, and a vector address is transmitted for every vector in the image. Fixed rate transmission usually results when full search VQ has been employed. Fixed rate transmission may also result when tree search VQ has been employed, if the vector quantizer always selects a best match codevector from the same level of the tree (addresses at a given level of the tree have a fixed length). Fixed rate transmission is typically preferred when the VQ data (i.e., the addresses) is to be transmitted over a noisy medium, because variable rate signals are more sensitive to transmission errors, as described hereinafter.
As mentioned, tree structured codebooks can provide fixed rate transmission (by always encoding from the same level of the tree); however, tree-structured codebooks are particularly suited for variable rate transmission. Variable rate transmission occurs when the encoder selects codevectors from different levels of a tree structured codebook; in other words, for a given set of input image vectors, the encoder does not always select codevectors from the same level of the tree. Since the addresses associated with codevectors in higher levels of the tree usually are not as long as addresses associated with codevectors in lower levels of the tree, varying the level from which the codevectors are selected also varies the address length, and thus the length of the transmitted data.
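The relationship between encoding level and data rate can be illustrated with simple arithmetic, assuming a binary tree so that a codevector selected at depth d carries a d-bit path address (the figures chosen are hypothetical).

```python
# Back-of-envelope illustration (assumed binary tree, so a depth-d
# codevector has a d-bit address): varying the level at which vectors are
# encoded varies the transmitted rate.
levels_chosen = [4, 4, 10, 6]        # depth at which each of 4 vectors encoded
avg_bits = sum(levels_chosen) / len(levels_chosen)
fixed_bits = 10                      # always encoding at the bottom level
print(avg_bits, fixed_bits)          # 6.0 10
```

In this toy case the variable rate stream averages 6 bits per vector against 10 bits per vector for fixed rate encoding at the bottom of the same tree.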
Several techniques are available for implementing variable rate VQ. In one technique, the quantity of compressed data generated by an image depends on the image content. For example, a variable rate VQ system might employ two different vector sizes. A large vector size might be used to describe simple parts of the image, and a small vector size might be used to describe complex parts of the image. The amount of compressed data generated depends on the complexity of the image. Sung Ho and A. Gersho, "Variable Rate Multi-Stage Vector Quantization For Image Coding", University of California, Santa Barbara (1988) (Available as IEEE Publ. No. CH 2561-9 88 0000-1156) teach one such technique. This reference is incorporated herein by reference. A disadvantage of this type of variable rate VQ is that the decoder is always more complex than a fixed rate decoder since the decoder requires a video buffer store to reconstruct the image, whereas a fixed rate decoder does not.
Another variable rate transmission scheme is described in E. A. Riskin, "Variable Rate Vector Quantization of Images", Ph.D. Dissertation, Stanford University, pp. 51 et seq. (May, 1990), incorporated herein by reference. Riskin employs an "unbalanced" tree structured codebook. An "unbalanced" tree structure is simply an incomplete tree; in other words, some branches of the tree may extend to further levels of the tree than other branches. As is common in tree search VQ, Riskin's codebook is searched by advancing from level to level along selected branches. However, since Riskin employs an unbalanced tree, some searches will terminate at higher levels than others because some branches do not extend as far down the tree as others. Encoding will occur at different levels of the tree, thereby achieving variable rate transmission since the address length is a direct function of the level from which a codevector is selected. One disadvantage of this method is that because the branch lengths are fixed, users cannot increase the accuracy by encoding further down the tree if desired; encoding must always stop at the end of a branch, whether it extends to only a first level of the tree or a bottom level.
A more preferable implementation of variable rate VQ is disclosed in commonly assigned, co-pending U.S. application Ser. No. 794,516, filed Nov. 19, 1991, and entitled "Image Compression Method and Apparatus Employing Distortion Adaptive Tree Search Vector Quantization," which is incorporated herein by reference. Rather than employing an "unbalanced" tree, this method employs a "balanced" or full tree-structured codebook (i.e., where each branch extends all the way to the bottom of the tree). Variable rate transmission occurs because for each input vector, the level of the tree at which a matching codevector is selected depends upon whether a measure of distortion between the input vector and the best match codevector at a given level satisfies an adjustable threshold. If the distortion measure is less than the threshold, the codevector at that particular level is selected. However, if the distortion measure is greater than the threshold, then the best match codevector at the next level of the tree is selected, and the process is repeated until the distortion measure is less than the threshold, or until the last level of the tree has been reached. Different input vectors will be encoded at different levels of the tree, and therefore, variable rate transmission is achieved.
Generally speaking, variable rate transmission offers the advantage that the average rate at which VQ data is transmitted is less than the rate that would be experienced if fixed rate transmission were employed for the same image at the same distortion level. In the context of pay television systems, this advantage can be significant, since it can represent a much greater increase in the number of channels that can be carried over existing media (such as satellite and cable) than would be realized if fixed rate transmission were employed. As mentioned above, however, variable rate transmission is more susceptible to transmission errors than fixed rate transmission. Errors in variable rate transmissions make it difficult to parse the variable length addresses at the receiving end, and a single uncorrected error can cause a relatively long sequence of data bits to be lost. By contrast, in fixed rate transmission, a single error typically affects only a single vector address.
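The parsing problem can be demonstrated with a toy variable-length code (the 1- and 2-bit labels below are hypothetical and bear no relation to the patent's signal format): flipping a single bit causes the decoder to lose its framing, corrupting a run of subsequent symbols rather than just one.

```python
# Illustration of error sensitivity in a variable-rate stream. With
# fixed-length addresses, a flipped bit stays inside one address; with
# variable-length addresses the decoder mis-frames following symbols.

prefix_code = {"0": "A", "10": "B", "11": "C"}   # hypothetical variable-length labels

def decode_variable(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in prefix_code:
            out.append(prefix_code[buf])
            buf = ""
    return out

clean   = "0100110"    # decodes as A B A C A
corrupt = "1100110"    # the same stream with its first bit flipped
print(decode_variable(clean))    # ['A', 'B', 'A', 'C', 'A']
print(decode_variable(corrupt))  # ['C', 'A', 'A', 'C', 'A'] - two symbols ruined
```

Had the labels all been a fixed 2 bits long, the flipped bit could have corrupted only the one address containing it.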
The error rate in a given transmission is highly dependent on the transmission medium. For example, satellite transmissions are more susceptible to error than are coaxial cable transmissions. This is particularly significant in the context of pay television systems where cable operators receive program material via satellite and retransmit the material to subscribers via coaxial cable. Because satellite transmissions are notoriously noisy, pay television systems would benefit from the use of fixed rate transmission over the satellite. Cable operators, however, would rather employ variable rate transmission because of the higher compression ratios that can be achieved. Furthermore, because magnetic storage media are also notoriously noisy, subscribers wishing to record program material on a VCR can achieve a higher fidelity recording with fixed rate data. Thus, subscribers will want to transform the variable rate data back to fixed rate.
One possible solution for cable operators is to receive the fixed rate transmissions at the cable head-end, decode the information to reproduce the compressed images, and then perform variable rate vector quantization on the reproduced images for transmission to subscribers. This solution is disadvantageous because vector quantization is performed twice (fixed rate at the source; variable rate at the cable head-end), and each time vector quantization is performed, more distortion is introduced. Furthermore, the cable operator must invest in a costly variable rate vector quantizer, and subscribers wishing to record program material on a VCR at a fixed rate would face the same re-quantization problem in reverse.
Therefore, there is a need for a method of transforming fixed rate vector quantized data to variable rate and back again without the need for re-quantizing the data. Such a method would be particularly advantageous in the context of pay television systems since cable operators could receive fixed rate data over the satellite, transform the data to variable rate, and transmit the variable rate data to subscribers. Subscribers could then easily transform the variable rate data to fixed rate for recording on a VCR. The method and apparatus of the present invention satisfies this need.