The background of the present invention is described herein in the context of pay television systems, such as cable television systems or direct broadcast satellite (DBS) systems, that distribute program material to subscribers, but the invention is by no means limited thereto except as expressly set forth in the accompanying claims.
In a typical cable television system, cable television operators receive much of their program material from remote earth station transmitters via a plurality of geosynchronous orbit satellites. The cable operator selects the program material to be made available to its subscribers by making arrangements with the satellite distributors of that program material. The cable operator receives the transmitted program material at its "cable head-end," from which it re-transmits the program material to individual subscribers. Frequently, cable operators also provide their own local programming at the site of the head-end, and include network broadcasts as well.
In a DBS system, individual subscribers are provided with their own satellite receiver. Each subscriber establishes a down-link with the broadcasting satellite directly. Thus, there is no need, as with cable systems, for re-transmission from a cable head-end.
Typically, in both types of systems (cable and DBS), the program material (both video and audio) originates in analog form. Conventional transmission techniques place substantial limitations on the maximum number of viewer channels that can be transmitted over any given transponder on a satellite: each channel requires a minimum bandwidth to avoid noticeable degradation, so the total number of channels that can be transmitted over a given satellite transponder is limited by the bandwidth of each signal and by the bandwidth of the transponder. Similarly, in cable systems, the electrical properties of the coaxial cable limit its bandwidth and therefore place substantial limitations on the number of channels that can be delivered to cable television subscribers using conventional transmission techniques.
As a result of the desire to provide more program channels and/or HDTV to subscribers over existing broadcast bandwidths, the pay television industry has begun to investigate digital image transmission techniques. Although the desire is to minimize the transmission bandwidth of program material, thus allowing more channels to be transmitted over an existing broadcast bandwidth, digital image transmission further offers the advantage that digital data can be processed at both the transmission and reception ends to improve picture quality. Unfortunately, the process of converting the program material from analog form to digital form results in data expansion which increases the transmission bandwidth of the program material rather than decreasing it. Therefore, digital transmission alone does not solve the bandwidth problem, but instead makes it worse. However, through the application of digital data compression techniques, large bandwidth reductions can be achieved.
Data compression techniques minimize the quantity of data required to represent each image. Thus, more program material, or more channels, can be offered over an existing broadcast bandwidth. However, any data compression achieved is offset by the data expansion which occurs during the analog to digital conversion. Therefore, to be practical, the compression technique employed must achieve a compression ratio high enough to provide a net data compression. Digital data compression techniques, such as Huffman encoding and LZW (Lempel, Ziv and Welch) encoding, offer, at best, compression ratios of 2.5 to 1 and do not compensate sufficiently for the data expansion that occurs in converting data from analog to digital form.
In response to the need for high compression ratios, a number of so-called "lossy" compression techniques have been investigated for digital image compression. Unlike the Huffman and LZW encoding techniques, these "lossy" compression techniques do not provide exact reproduction of the data upon decompression. Thus, some degree of information is lost; hence the label "lossy." One such "lossy" compression technique is called DCT (discrete cosine transform) data compression. Another method, which, until recently, has been used principally for speech compression, is vector quantization. Vector quantization has shown promise in image compression applications by offering high image compression ratios, while also achieving high fidelity image reproduction at the receiving end. It has been demonstrated, for example, that using vector quantization (hereinafter sometimes referred to as "VQ"), compression ratios as high as 25:1, and even as high as 50:1, can be realized without significant visually perceptible degradation in image reproduction.
Compression of video images by vector quantization initially involves dividing the pixels, or samples, of each image frame into smaller blocks of pixels, or sub-images, and defining a "vector" from relevant data (such as intensity and/or color) reported by each pixel in the sub-image. The vector (sometimes called an "image vector" or "input image vector" or "input vector") is really nothing more than a matrix of values (intensity and/or color) reported by each pixel in the sub-image. For example, a black and white image of a house might be defined by a 600×600 pixel image, and a 4×4 rectangular patch of pixels, representing, for example, a shadow, or part of a roof line against a light background, might form the sub-image from which the vector is constructed. The vector itself might be defined by a plurality of gray scale values representing the intensity reported by each pixel. While a black and white image serves as an example here, vectors might also be formed from red, green, or blue levels of a color image, or from the Y, I and Q components of a color image, or from transform coefficients of an image signal. The terms "pixel" and "sample" may be used interchangeably herein to refer to the individual values or elements of a vector.
Numerous methods exist for manipulating the block, or sub-image, to form a vector. R.M. Gray, "Vector Quantization", IEEE ASSP Mag., pp. 4-29 (April, 1984), describes formation of vectors for monochrome images. E.B. Hilbert, "Cluster Compression Algorithm: A Joint Clustering/Data Compression Concept", Jet Propulsion Laboratory, Pasadena, Calif., Publ. 77-43, describes formation of vectors from the color components of pixels. A. Gersho and B. Ramamurthi, "Image Coding Using Vector Quantization", Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 428-431 (May, 1982), describes vector formation from the intensity values of spatially contiguous groups of pixels. All of the foregoing references are incorporated herein by reference.
By way of example, a television camera might generate an analog video signal in a raster scan format having 600 scan lines per frame. An analog to digital converter could then digitize the video signal at a sampling rate of 600 samples per scan line, each sample being a pixel. Digital signal processing equipment could then store the digital samples in a 600×600 pixel matrix. The 600×600 pixel matrix could then be organized into smaller blocks, for example 4×4 pixel blocks, and then each block could be converted into an image vector. Each of these image vectors would then be compressed as described below.
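The block-to-vector step described above can be sketched in Python as follows. This is a minimal sketch only; the function name and the assumption of a square image whose side is an exact multiple of the block size are illustrative and not taken from any cited reference.

```python
def blocks_to_vectors(image, n):
    """Split a square pixel matrix into n-by-n blocks and flatten each
    block into an image vector (a flat list of sample values).

    Assumes the image is square with a side that is a multiple of n."""
    size = len(image)
    vectors = []
    for r in range(0, size, n):          # top-left row of each block
        for c in range(0, size, n):      # top-left column of each block
            vectors.append([image[r + i][c + j]
                            for i in range(n) for j in range(n)])
    return vectors
```

For a 600×600 frame divided into 4×4 blocks, this would yield 150×150 = 22,500 image vectors of 16 samples each.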
In an image vector quantizer, a vector quantization "codebook" is created from training data comprising a representative sample of images which the quantizer is likely to encounter during use. The codebook consists of a memory containing a set of stored "codevectors," each representative of commonly encountered image vectors. For example, one codevector might be a 4×4 pixel solid black patch. Another codevector might have all white pixels in the top two rows, and all black pixels in the bottom two rows. Yet another codevector might have a gradient made up of white pixels in the top row, black pixels in the bottom row, and two rows of pixels in between having shades of gray from light to dark. Typically, a codebook of representative codevectors is generated using an iterative clustering algorithm, such as described in S.P. Lloyd, "Least Squares Quantization in PCM", Bell Lab. Tech. Note (1957) (also found in IEEE Trans. Inform. Theory, Vol. IT-28, pp. 129-137, March 1982), and in J.T. Tou and R.C. Gonzalez, "Pattern Recognition Principles", pp. 94-109, Addison-Wesley, Reading, Mass. (1974). Both of these references are incorporated herein by reference.
Each codevector in the codebook is assigned a unique identification code, sometimes called a label. In practice, the identification codes, or labels, are the memory addresses of the codevectors. (In the appended claims, the term "ID code" is sometimes employed to refer to these labels or addresses). For each input image vector, data compression is achieved by selecting the codevector in the codebook that most closely matches the input image vector, and then transmitting the codebook address of the selected codevector rather than the input image vector. Compression results because, generally, the addresses of the selected codevectors are much smaller than the image vectors.
By way of example, the codevector having the solid black patch described above might be assigned address #1. The codevector having the white pixels in the top half and black pixels in the bottom half might be assigned address #2, and so on for hundreds or thousands of codevectors. When quantizing a full image, a vector quantizer divides the full image frame into a series of image vectors (i.e., from each of the blocks, or sub-images). For each image vector, the vector quantizer identifies one closely matching codevector. The vector quantizer then generates a new signal made up of the series of labels, or memory addresses where the codevectors were found. For the example of a full image of a house, the vector quantizer would divide the full image into numerous image vectors (from each of the blocks, or sub-images). The quantizer might then replace image vectors from shadowed areas with address #1 (the solid black patch), and it might replace the roof line image vectors with address #2 (white in the top half and black in the bottom half). As mentioned above, compression results because, typically, the length of the labels or addresses is much smaller than the size of the codevectors stored in memory. Typically, the addresses are transmitted by any conventional technique so that the image can be reconstructed at the receiver.
Reconstruction of the original full image at the receiver (or at least a very close approximation of the original image) may be accomplished by a device which has a codebook, identical to the codebook at the transmitter end, stored in a memory. The device that performs vector quantization and compression at the transmitter is called an encoder, and the device that performs decompression and image reproduction at the receiving end is called a decoder. The decoder reconstructs (at least an approximation of) the original image by retrieving from the codebook in the decoder the codevectors stored at each received address. Generally, the reconstructed image differs somewhat from the original image because codevectors do not usually precisely match the image vectors. The difference is called "distortion." Increasing the size of the codebook generally decreases the distortion.
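The decoder's lookup step can be sketched as follows; a minimal sketch in which the codebook is simply a list indexed by address (the function name and data layout are illustrative assumptions).

```python
def vq_decode(addresses, codebook):
    """Decoder sketch: for each received address, retrieve the
    codevector stored at that address in the decoder codebook
    (identical to the encoder codebook). The resulting sequence of
    codevectors approximates the original sequence of image vectors."""
    return [codebook[addr] for addr in addresses]
```

The reconstructed vectors would then be reassembled into their block positions in the frame; the residual mismatch between each codevector and the original image vector is the distortion discussed above.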
Many different techniques for searching a codebook to find the codevector that best matches the image vector have been proposed, but generally the methods can be classified as either a full search technique, or a branching (or tree) search technique. In a full search technique, the vector quantizer sequentially compares an input image vector to each and every codevector in the codebook. The vector quantizer computes a measure of distortion for each codevector and selects the one having the smallest distortion. The full search technique ensures selection of the best match, but involves the maximum number of computational steps. Thus, while distortion can be minimized using a full search technique, it is computationally expensive. Y. Linde, A. Buzo and R. Gray, "An Algorithm For Vector Quantizer Design", IEEE Transactions on Communications, Vol. COM-28, No. 1 (January 1980), incorporated herein by reference, describes the full search technique and the computational steps involved in such a search. The full search technique is sometimes called "full search vector quantization" or "full search VQ".
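A full search over the codebook can be sketched as follows. Squared error is used here as the distortion measure, which is one common choice; the function names are illustrative.

```python
def distortion(u, v):
    """Squared-error distortion between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def full_search_encode(image_vector, codebook):
    """Full-search VQ: compare the input vector against every
    codevector in the codebook and return the address (index) of the
    codevector with the smallest distortion."""
    return min(range(len(codebook)),
               key=lambda addr: distortion(image_vector, codebook[addr]))
```

The cost is one distortion computation per codevector, which is why the full search, while optimal, is computationally expensive for large codebooks.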
The tree search technique reduces the number of codevectors that must be evaluated (and thus reduces search time), but does not necessarily identify the very best match. Consequently, to maintain a given level of distortion, the tree search technique requires a larger codebook than the full search technique. The tree search technique can be considered as one that searches a sequence of small codebooks, instead of one large codebook. The codebook structure can be depicted as a tree, and each search and decision corresponds to advancing along a branch of the tree to the next level or stage of the tree, starting from the root of the tree. Thus, only the codevectors along certain branches of the tree are searched, thereby reducing the search time. A detailed description of the tree search technique may be found in R.M. Gray and H. Abut, "Full Search and Tree Searched Vector Quantization of Speech Waveforms," Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 593-96 (May 1982), and R.M. Gray and Y. Linde, "Vector Quantization and Predictive Quantizers For Gauss Markov Sources", IEEE Trans. Comm., Vol. COM-30, pp. 381-389 (February 1982), both of which are incorporated herein by reference. The tree search technique is sometimes referred to as "tree-search vector quantization", "tree-search VQ" and "TSVQ." Notwithstanding the larger memory that is required to maintain a given level of distortion, this technique has found favor for compressing dynamic images because it is computationally faster.
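A binary tree search can be sketched as follows: at each stage the input vector is compared against only the two candidate codevectors at the current node, and the search descends along the closer branch until it reaches a leaf holding a final codevector address. The node layout used here (dicts with "codevector", "children" and "address" keys) is a hypothetical structure for illustration, not a structure from the cited references.

```python
def distortion(u, v):
    """Squared-error distortion between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def tree_search_encode(image_vector, node):
    """Tree-searched VQ sketch: descend the tree, at each level keeping
    only the branch whose codevector is closer to the input vector.
    Each leaf carries the address of a final codevector."""
    while "address" not in node:
        left, right = node["children"]
        node = (left
                if distortion(image_vector, left["codevector"])
                <= distortion(image_vector, right["codevector"])
                else right)
    return node["address"]
```

For a codebook of N codevectors arranged as a balanced binary tree, this evaluates only about 2·log2(N) codevectors instead of N, at the cost of possibly missing the globally best match.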
The construction and use of tree structured codebooks to perform vector quantization is described in the aforementioned article by R.M. Gray entitled "Vector Quantization", and in R.L. Baker, "Vector Quantization of Digital Images", Ph.D. Dissertation, Stanford University, Department of Electrical Engineering (1984). See also E.A. Riskin, "Variable Rate Vector Quantization of Images", Ph.D. Dissertation, Stanford University, pp. 51 et seq. (May, 1990); U.S. Pat. No. 4,878,230 of Murakami et al. entitled "Amplitude Adaptive Vector Quantization System"; and the aforementioned article by Linde, Buzo and Gray entitled "An Algorithm for Vector Quantizer Design." All of these references are incorporated herein by reference.
Another well known technique that may be employed in VQ systems is referred to as mean-removed VQ (MRVQ). MRVQ is well known and has numerous advantages. See, for example, the aforementioned article by R.M. Gray entitled "Vector Quantization" and the aforementioned Ph.D. dissertation by R.L. Baker entitled "Vector Quantization of Digital Images". With MRVQ, prior to searching the codebook for a best match codevector, the encoder determines the scalar mean value of the input vector (i.e., the mean value of all pixels in the vector) and then subtracts the mean value from each pixel (sample) in the vector. The vector resulting from this subtraction, i.e., the input vector with its mean value removed, is referred to as a "residual" vector. It is the residual vector that is then compared to the codevectors in the codebook to find the best match. Thus, the encoder codebook contains representative "residual" codevectors. After selecting a best match "residual" codevector, the encoder transmits both the address of the residual codevector and the mean value of the original input vector to the decoder. At the decoder, the input vector is reconstructed by retrieving from the decoder codebook (which is identical to the encoder codebook) the "residual" codevector residing at the received address and adding the received mean value to the retrieved codevector.
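The MRVQ encode/decode round trip described above can be sketched as follows; a minimal sketch using squared-error full search on the residual codebook (function names and the (address, mean) return format are illustrative assumptions).

```python
def sq_err(u, v):
    """Squared-error distortion between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mrvq_encode(image_vector, residual_codebook):
    """Mean-removed VQ encoder sketch: remove the scalar mean of the
    input vector, full-search the residual codebook for the best match,
    and return the pair (address, mean) for transmission."""
    mean = sum(image_vector) / len(image_vector)
    residual = [s - mean for s in image_vector]
    addr = min(range(len(residual_codebook)),
               key=lambda a: sq_err(residual, residual_codebook[a]))
    return addr, mean

def mrvq_decode(addr, mean, residual_codebook):
    """Decoder sketch: retrieve the residual codevector at the received
    address and add the received mean back to each sample."""
    return [s + mean for s in residual_codebook[addr]]
```

When the residual codebook contains an exact match for the residual, the round trip reproduces the input vector exactly; in general the reconstruction differs by the residual quantization error.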
One advantage of MRVQ in particular is that for some input vectors, only the mean value need be transmitted to the decoder in order to substantially reconstruct the input vector at the decoder. For example, input vectors in uniform areas of an image frame, such as a "shaded" area or uniform background, will contain pixel values that are substantially equal. The mean value of a "shade" vector will not be substantially different from any single pixel value in the vector. The mean value can therefore be used to represent the entire input vector; that is, the input vector can be substantially reconstructed at the decoder by simply constructing a vector that has pixel (sample) values each equal to the mean value of the original "shade" input vector. Thus, at the encoder, only the mean value need be transmitted to the decoder for that "shade" input vector.
To determine whether an input vector can be approximated solely from its mean value, a measure of difference can be obtained between the input vector and its mean value, and the measure of difference can be compared to a threshold. If the measure of difference satisfies the threshold, then only the mean value is transmitted to the decoder for that input vector. If, however, the measure of difference exceeds the threshold, then the mean value is subtracted from the input vector, the residual vector is vector quantized, and both the mean value and the codevector address resulting from the VQ process are transmitted to the decoder. When the decoder detects that a mean only was transmitted for a given input vector, it reconstructs that input vector from the mean value only; that is, the decoder approximates the input vector with a vector having pixel (sample) values all equal to the mean value (thus the approximation is a uniform vector). Such a method is disclosed in commonly assigned, co-pending U.S. application Ser. No. 794,516, filed Nov. 19, 1991, and entitled "Image Compression Method and Apparatus Employing Distortion Adaptive Tree Search Vector Quantization."
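The threshold decision above can be sketched as follows. The squared-error difference measure, the threshold convention, and the tagged-tuple return format are assumptions for illustration; the cited application may use a different measure.

```python
def sq_err(u, v):
    """Squared-error distortion between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def encode_with_shade_test(image_vector, residual_codebook, threshold):
    """If the input vector differs from its mean by no more than the
    threshold (a "shade" vector), transmit the mean only; otherwise
    remove the mean, vector quantize the residual, and transmit both
    the mean and the best-match residual codevector address."""
    mean = sum(image_vector) / len(image_vector)
    difference = sq_err(image_vector, [mean] * len(image_vector))
    if difference <= threshold:
        return ("mean_only", mean)
    residual = [s - mean for s in image_vector]
    addr = min(range(len(residual_codebook)),
               key=lambda a: sq_err(residual, residual_codebook[a]))
    return ("mean_and_address", mean, addr)
```

The decoder would inspect the tag: for "mean_only" it emits a uniform vector of the mean value; otherwise it adds the mean to the retrieved residual codevector.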
Transmitting only the mean values of "shade" vectors has at least two advantages. First, it increases the compression ratio of the VQ system. Second, the VQ codebook can be trained to concentrate on the high frequency detail of the image frame. However, there are some disadvantages to approximating "shade" vectors with their mean values.
Approximating "shade" vectors with their mean values at the encoder is likely to lead to a visual effect known as "blocking" when the image frame is reconstructed at the decoder. Although shaded areas of the original image frame may appear fairly uniform to the human visual system, there may be slight variations over the area. Thus, the mean values of adjacent "shade" vectors may differ slightly. Approximating these "shade" vectors with uniform mean value vectors exaggerates the slight variations at the boundaries of the adjacent vectors. Thus, the human visual system perceives a blocking or checkerboard effect in the reconstructed image.
One method that has been suggested to reduce the blocking effect involves applying a bi-linear (two-dimensional) interpolator to sub-samples in an image frame to obtain an approximation to a low-pass version of the image frame. Such a method is disclosed in R.L. Baker and Hsiao-hui Shen, "A Finite State/Frame Difference Interpolative Vector Quantizer For Low Rate Image Sequence Coding," Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, vol. 2, pp. 1188-1191 (April 1988). According to this method, for each input vector in the original image frame, the interpolator determines a low-frequency vector from the sub-samples. The low-frequency vector is an approximation to a low-pass version of the input vector. The low-pass version of each input vector is then subtracted from the input vector leaving a "residual" vector which approximates a high-pass version of the input vector. These residual vectors are then vector quantized and the codevector addresses produced by the quantizer are transmitted to the decoder along with the sub-samples used by the interpolator to generate the low-frequency version of each vector. At the decoder, the "residual" codevectors residing at the received addresses are retrieved from an identical codebook, and the low-pass version of each input vector is reconstructed from the received sub-samples using an interpolator identical to the one at the encoder. The low-pass version of each input vector is then added to the corresponding residual codevector to obtain a representation of the original input vector.
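The bi-linear interpolation and low-frequency removal steps can be sketched as follows. For simplicity, the four corner samples of a block serve as the sub-samples here, which is a simplifying assumption; the cited scheme sub-samples the full frame. Square blocks are assumed, and all names are illustrative.

```python
def bilinear_lowpass(corners, n):
    """Bilinearly interpolate an n-by-n low-pass block from four corner
    sub-samples, given in the order (top-left, top-right, bottom-left,
    bottom-right). Requires n >= 2."""
    tl, tr, bl, br = corners
    block = []
    for i in range(n):
        fy = i / (n - 1)                 # vertical interpolation weight
        row = []
        for j in range(n):
            fx = j / (n - 1)             # horizontal interpolation weight
            top = tl + (tr - tl) * fx    # interpolate along the top edge
            bottom = bl + (br - bl) * fx # interpolate along the bottom edge
            row.append(top + (bottom - top) * fy)
        block.append(row)
    return block

def lowpass_residual(block, lowpass):
    """Residual (approximate high-pass) block: input minus its
    low-pass version, elementwise. Assumes square blocks."""
    n = len(block)
    return [[block[i][j] - lowpass[i][j] for j in range(n)]
            for i in range(n)]
```

Because the low-pass version tracks gradual variation across the block rather than collapsing it to a single scalar, subtracting it leaves a residual dominated by high-frequency detail, which is what the residual codebook is trained on.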
The main advantage of low-frequency removal over mean removal is that rather than approximating a "shade" input vector with the mean value of the vector, which results in substantial "blocking," the "shade" vector can be approximated by its low-pass version. As mentioned above, at the decoder, the low-pass version of the "shade" vector is reconstructed by applying the decoder interpolator to the received sub-samples. The reconstructed low-pass version is a better approximation to the "shade" vector than a mean value approximation because it is a vector valued prediction rather than a scalar one. Use of the low-pass version as an approximation to a "shade" vector, rather than a mean value approximation, reduces blocking.
However, although low-frequency removal/reconstruction reduces blocking, removal of the low-frequency component from each input vector introduces a d.c. offset in the residual vectors. As a result, the VQ process is more difficult because the vector quantizer must deal with these offsets.
Consequently there is a need for a method which allows for removal of the low-frequency component of each input vector as well as the d.c. offset of each residual vector, but which requires no additional information at the decoder to reconstruct the input vectors; that is, the input vectors can still be fully reconstructed using only the sub-samples and residual vector addresses received at the decoder. The present invention satisfies this need.