The background of the present invention is described herein in the context of pay television systems, such as cable television systems or direct broadcast satellite (DBS) systems, that distribute program material to subscribers, but the invention is by no means limited thereto except as expressly set forth in the accompanying claims.
In a typical cable television system, cable television operators receive much of their program material from remote earth station transmitters via a plurality of geosynchronous orbit satellites. The cable operator selects the program material to be made available to its subscribers by making arrangements with the satellite distributors of that program material. The cable operator receives the transmitted program material at its "cable head-end," where it then re-transmits the data to individual subscribers. Frequently, cable operators also provide their own local programming at the site of the head-end, and further include network broadcasts as well.
In a DBS system, individual subscribers are provided with their own satellite receiver. Each subscriber establishes a down-link with the broadcasting satellite directly. Thus, there is no need, as with cable systems, for re-transmission from a cable head-end.
Typically, in both types of systems (cable and DBS), the program material (both video and audio) is originally in analog form. Conventional transmission techniques place substantial limitations on the maximum number of viewer channels that can be transmitted over any given transponder on a satellite. Each channel requires a minimum bandwidth to avoid noticeable degradation, so the total number of channels that can be transmitted over a given satellite transponder is limited by the bandwidth of each signal and by the bandwidth of the transponder. Also, in cable systems, the electrical properties of the coaxial cable and associated amplifiers limit its bandwidth and therefore place substantial limitations on the number of channels that can be delivered to cable television subscribers using conventional transmission techniques.
As a result of the desire to provide more program channels to subscribers over existing broadcast bandwidths, the pay television industry has begun to investigate digital image transmission techniques. Although the desire is to minimize the transmission bandwidth of program material, thus allowing more channels to be transmitted over existing media, digital image transmission further offers the advantage that digital data can be processed at both the transmission and reception ends to improve picture quality. Unfortunately, the process of converting the program material from analog form to digital form results in data expansion which increases the transmission bandwidth of the program material rather than decreasing it. Therefore, digital transmission alone does not solve the bandwidth problem, but instead makes it worse. However, through the application of digital data compression techniques, large bandwidth reductions can be achieved.
Data compression techniques minimize the quantity of data required to represent each image. Thus, more program material, or more channels, can be offered over an existing channel. However, any data compression achieved is offset by the data expansion which occurs during the analog to digital conversion. Therefore, to be practical, the compression technique employed must achieve a compression ratio large enough to provide a net data compression. Digital data compression techniques, such as Huffman encoding and LZW (Lempel, Ziv and Welch) encoding, offer, at best, compression ratios of 2.5 to 1 and do not sufficiently compensate for the amount of data expansion that occurs in converting data from analog to digital form.
In response to the need for large compression ratios, a number of so-called "lossy" compression techniques have been investigated for digital image compression. Unlike the Huffman and LZW encoding techniques, these "lossy" compression techniques do not provide exact reproduction of the data upon decompression. Thus, some degree of information is lost; hence the label "lossy." One such "lossy" compression technique is called DCT (discrete cosine transform) data compression. Another method, which, until recently, has been used principally for speech compression, is vector quantization. Vector quantization has shown promise in image compression applications by offering high image compression rates, while also achieving high fidelity image reproduction at the receiving end. It has been demonstrated, for example, that using vector quantization (hereinafter sometimes referred to as "VQ"), compression rates as high as 25:1, and even as high as 50:1, can be realized without significant visually perceptible degradation in image reproduction.
Compression of video images by vector quantization involves dividing the pixels of each image frame into smaller blocks of pixels, or sub-images, and defining a "vector" from relevant data (such as intensity and/or color) reported by each pixel in the sub-image. The vector (sometimes called an "image vector") is really nothing more than a matrix of values (intensity and/or color) reported by each pixel in the sub-image. For example, a black and white image of a house might be defined by a 600×600 pixel image, and a 6×4 rectangular patch of pixels, representing, for example, a shadow, or part of a roof line against a light background, might form the sub-image from which the vector is constructed. The vector itself might be defined by a plurality of gray scale values representing the intensity reported by each pixel. While a black and white image serves as an example here, vectors might also be formed from red, green, or blue levels of a color image, or from the Y, I and Q components of a color image, or from transform coefficients of an image signal.
Numerous methods exist for manipulating the block, or sub-image, to form a vector. R. M. Gray, "Vector Quantization", IEEE ASSP Mag., pp. 4-29 (April, 1984), describes formation of vectors for monochrome images. E. B. Hilbert, "Cluster Compression Algorithm: A Joint Clustering/Data Compression Concept", Jet Propulsion Laboratory, Pasadena, Calif., Publ. 77-43, describes formation of vectors from the color components of pixels. A. Gersho and B. Ramamurthi, "Image Coding Using Vector Quantization", Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 428-431 (May, 1982), describes vector formation from the intensity values of spatially contiguous groups of pixels. All of the foregoing references are incorporated herein by reference.
By way of example, a television camera might generate an analog video signal in a raster scan format having 600 scan lines per frame. An analog to digital converter could then digitize the video signal at a sampling rate of 600 samples per scan line, each sample being a pixel. Digital signal processing equipment could then store the digital samples in a 600×600 pixel matrix. The 600×600 pixel matrix could then be organized into smaller blocks, for example 6×4 pixel blocks, and then each block could be converted to an input vector.
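By way of illustration only, the blocking step described above might be sketched as follows. The function name `image_to_vectors`, the row-by-row ordering of values within each vector, and the reading of "6×4" as six rows by four columns are assumptions made for this sketch and are not specified in the text. The synthetic frame contents are arbitrary.

```python
def image_to_vectors(image, block_rows=6, block_cols=4):
    """Split a 2-D pixel matrix into non-overlapping blocks and flatten
    each block, row by row, into one input vector.

    Assumption: a "6x4" block is taken to mean 6 rows by 4 columns.
    """
    height, width = len(image), len(image[0])
    if height % block_rows or width % block_cols:
        raise ValueError("image dimensions must be multiples of the block size")
    vectors = []
    for top in range(0, height, block_rows):
        for left in range(0, width, block_cols):
            # Collect the pixels of one block in raster order.
            vector = [image[r][c]
                      for r in range(top, top + block_rows)
                      for c in range(left, left + block_cols)]
            vectors.append(vector)
    return vectors


# A synthetic 600 x 600 frame of 8-bit gray levels (values are arbitrary).
frame = [[(r + c) % 256 for c in range(600)] for r in range(600)]
vectors = image_to_vectors(frame)
```

Under these assumptions a 600×600 frame yields 100 × 150 = 15,000 input vectors of 24 values each.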
In an image vector quantizer, a vector quantization "codebook" is created from training data comprising a representative sample of images which the quantizer is likely to encounter during use. The codebook consists of a memory containing a set of stored "codevectors," each representative of commonly encountered image vectors. For example, one codevector might be a 6×4 pixel solid black patch. Another codevector might have all white pixels in the top three rows, and all black pixels in the bottom three rows. Yet another codevector might have a gradient made up of white pixels in the top row, black pixels in the bottom row, and four rows of pixels in between having shades of gray from light to dark. Typically, a codebook of representative codevectors is generated using an iterative clustering algorithm, such as described in S. P. Lloyd, "Least Squares Optimization in PCM", Bell Lab. Tech. Note (1957) (also found in IEEE Trans. Inform. Theory, Vol. IT-28, pp. 129-137, March 1982); and J. T. Tou and R. C. Gonzalez, "Pattern Recognition Principles", pp. 94-109, Addison-Wesley, Reading, Mass. (1974). Both of these references are incorporated herein by reference.
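A minimal sketch of one such iterative clustering procedure is given below, in the general style of a Lloyd (k-means) iteration. It is not taken from the cited references. The function names, the squared-error distance measure, the evenly spaced initialization, and the fixed iteration count are all assumptions chosen for brevity.

```python
def squared_distance(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(training_vectors, codebook_size, iterations=20):
    """Alternate nearest-codevector assignment and centroid update."""
    if not 0 < codebook_size <= len(training_vectors):
        raise ValueError("codebook_size must be between 1 and the training set size")
    # Assumed initialization: evenly spaced training vectors.
    step = len(training_vectors) // codebook_size
    codebook = [list(training_vectors[i * step]) for i in range(codebook_size)]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        # Assignment step: each training vector joins its nearest codevector.
        for v in training_vectors:
            nearest = min(range(len(codebook)),
                          key=lambda i: squared_distance(v, codebook[i]))
            clusters[nearest].append(v)
        # Update step: each codevector becomes the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codevector
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


# Toy training set with two obvious clusters (values are arbitrary).
training = [[0, 0], [1, 1], [0, 1], [10, 10], [11, 11], [10, 11]]
codebook = train_codebook(training, codebook_size=2)
```

In this toy case the two codevectors settle at the centroids of the two clusters. A practical codebook would be trained on vectors drawn from many representative images.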
Each codevector in the codebook is assigned a unique identification code, sometimes called a label. In practice, the identification codes, or labels, are the memory addresses of the codevectors. For each input image vector, data compression is achieved by selecting the codevector in the codebook that most closely matches the input image vector, and then transmitting the codebook address of the selected codevector rather than the input image vector itself. Compression results because generally, the addresses of the selected codevectors are much smaller than the image vectors. At the receiving end, an identical codebook is provided. Data recovery is achieved by accessing the receiver codebook with the transmitted address to obtain the selected codevector. Because the selected codevector closely resembles the original input vector, the input vector is substantially reproduced at the receiver. The reproduced input vector can then be converted back to the block of pixels that it represents. Thus, in this manner, an entire image can be reconstructed at the receiver.
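The encoding and decoding steps just described might be sketched as follows. The names `nearest_index`, `encode` and `decode`, the squared-error matching criterion, the four-element vectors, the four-entry codebook, and the 8-bit pixel depth are assumptions made for this sketch only.

```python
def nearest_index(vector, codebook):
    """Return the address of the codevector closest to `vector`
    (squared-error distance is an assumed matching criterion)."""
    return min(range(len(codebook)),
               key=lambda i: sum((x - y) ** 2
                                 for x, y in zip(vector, codebook[i])))


def encode(vectors, codebook):
    """Transmitter side: replace each input vector by a codebook address."""
    return [nearest_index(v, codebook) for v in vectors]


def decode(addresses, codebook):
    """Receiver side: look up each address in an identical codebook."""
    return [list(codebook[a]) for a in addresses]


# Hypothetical 4-entry codebook of 4-pixel vectors (8-bit gray levels).
codebook = [[0, 0, 0, 0], [255, 255, 255, 255],
            [255, 255, 0, 0], [0, 0, 255, 255]]
inputs = [[10, 5, 0, 3], [250, 251, 4, 9], [240, 255, 255, 250]]

addresses = encode(inputs, codebook)
reconstructed = decode(addresses, codebook)

# Bits needed per vector before and after compression.
bits_per_vector = len(inputs[0]) * 8                    # raw 8-bit pixels
bits_per_address = (len(codebook) - 1).bit_length()     # address width
```

With these toy numbers each 32-bit input vector is replaced by a 2-bit address. The reconstructed vectors resemble the inputs but are not identical to them.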
Some distortion of the original image does result, however, due to inexact matches between the input vectors and the selected codevectors. Remember, the codevectors in the codebook are only a representative sample of possible input vectors, and therefore, exact matches rarely occur during actual use of the quantizer. Increasing the size of the codebook used for compression and decompression generally decreases the distortion. Unfortunately, increasing the size of the codebook is disadvantageous because the cost of memory can be prohibitive. Typical codebooks already contain a large number of representative codevectors and require a large amount of memory. Memory increases often are not affordable. Consequently, there is a need for a vector quantization method that decreases distortion without an increase in codebook memory.
One prior art method can be used to satisfy this need. The method, referred to as "reflected VQ", is disclosed in R. L. Baker, "Vector Quantization of Digital Images", Ph.D. Dissertation, Stanford University, Department of Electrical Engineering 153-62 (1984), which is incorporated herein by reference. With the "reflected VQ" method, a given level of distortion can be maintained with a smaller number of representative codevectors, and therefore, less codebook memory is required. The memory saved by employing the "reflected VQ" method can then be used to hold more representative codevectors. As mentioned above, a greater number of representative codevectors will reduce the overall distortion of reproduced images. Therefore, with "reflected VQ," a decrease in distortion can be achieved with no net increase in codebook memory.
Briefly, the "reflected VQ" method takes advantage of the symmetry commonly found in image data. For example, note that, in the case of entertainment television, the mirror image about the y-axis of a television picture is often (but not always) another valid image. Symmetry about the x-axis is also sometimes present, but somewhat less so. As a result of this symmetry, a typical two-dimensional image frame often will contain sub-images (i.e., smaller blocks of pixels) that are substantial mirror images of each other about either the x-axis, the y-axis or both. Consequently, a single vector can represent each of these sub-images by simply mirroring the vector accordingly. When the sub-images are rectangular, one vector can represent up to four symmetrical sub-images by mirroring the vector about the x-axis, or the y-axis or both. When the sub-images are square, one vector can represent up to eight symmetrical sub-images because in addition to mirroring about the x-axis and y-axis, the vector can be rotated 90 degrees.
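The counts of four and eight given above can be checked with a short sketch that enumerates the mirrored (and, for square blocks, rotated) versions of a block. The helper names and the toy pixel values are hypothetical.

```python
def flip_horizontal(block):
    """Mirror about the vertical (y) axis: reverse each row."""
    return [list(row[::-1]) for row in block]


def flip_vertical(block):
    """Mirror about the horizontal (x) axis: reverse the row order."""
    return [list(row) for row in block[::-1]]


def rotate_90(block):
    """Rotate the block 90 degrees clockwise."""
    return [list(row) for row in zip(*block[::-1])]


def orientations(block):
    """Return every mirrored version of `block`; for square blocks,
    also return the 90-degree rotation of each mirrored version."""
    mirrored = [
        [list(row) for row in block],
        flip_horizontal(block),
        flip_vertical(block),
        flip_vertical(flip_horizontal(block)),
    ]
    if len(block) == len(block[0]):  # rotation keeps the shape only if square
        mirrored += [rotate_90(b) for b in mirrored]
    return mirrored


rectangular = [[1, 2, 3], [4, 5, 6]]
square = [[1, 2], [3, 4]]
rect_versions = orientations(rectangular)
square_versions = orientations(square)
```

For blocks with no internal symmetry, as in these toy examples, the rectangular block has four distinct versions and the square block has eight.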
"Reflected VQ" takes advantage of the symmetry described above to achieve a reduction in codebook size. For example, consider a vector quantizer that organizes image frames into rectangular sub-images. Because of image frame symmetry, four visually different sub-images in effect may be the same sub-image simply mirrored about the x-axis, y-axis or both. Consequently, an input vector constructed from one of these sub-images can be used to represent each of the other sub-images by mirroring the vector accordingly. Different mirror images of a vector are referred to as "orientations" of the vector. Thus, an input vector constructed from a rectangular sub-image has four possible orientations. Rather than employing a codebook that contains codevectors for all orientations of an input vector, "reflected VQ" employs a codebook that contains codevectors for only one input vector orientation; in other words the codebook contains codevectors all having one general orientation. Then, prior to comparing an input vector to the codebook, the input vector is re-oriented, if necessary, such that its orientation matches the general orientation of the codevectors in the codebook. After selecting the codevector which most closely resembles the re-oriented input vector, an indication of the address of the selected codevector is transmitted along with additional information specifying the "reflections"/"mirroring" necessary to recover the original orientation of the input vector.
At the receiving end, the selected codevector is retrieved from the receiver codebook and re-oriented to match the original orientation of the input vector, thereby reproducing the input vector in its original orientation. Thus, in this manner, four visually different input vectors (representing mirror images of the same rectangular sub-image) can be reproduced at the receiving location using a single codevector. By storing codevectors in only a single orientation, approximately a 4-to-1 reduction in codebook size can be achieved assuming two-dimensional rectangular sub-images; when square sub-images are employed, approximately an 8-to-1 reduction in codebook size can be achieved. If desired, the memory saved can be used to hold more representative codevectors, resulting in an overall decrease in image distortion. Thus, with no net increase in memory, image distortion can be reduced.
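The "reflected VQ" round trip for rectangular sub-images might be sketched as follows. The text does not state how the general orientation of an input vector is determined, so the rule used here (flip so that the top half and the left half of the block are at least as bright as the bottom half and the right half) is purely an assumption for illustration, as are all function names, the squared-error criterion, and the toy codebook.

```python
def flip_h(block):
    """Mirror about the vertical (y) axis."""
    return [list(row[::-1]) for row in block]


def flip_v(block):
    """Mirror about the horizontal (x) axis."""
    return [list(row) for row in block[::-1]]


def canonicalize(block):
    """Re-orient `block` under an ASSUMED rule: brighter half on top and
    on the left. Returns the re-oriented block and the two flip flags."""
    h, w = len(block), len(block[0])
    top = sum(map(sum, block[:h // 2]))
    bottom = sum(map(sum, block[h - h // 2:]))
    flipped_v = bottom > top
    if flipped_v:
        block = flip_v(block)
    left = sum(sum(row[:w // 2]) for row in block)
    right = sum(sum(row[w - w // 2:]) for row in block)
    flipped_h = right > left
    if flipped_h:
        block = flip_h(block)
    return block, flipped_v, flipped_h


def block_distance(a, b):
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def encode(block, codebook):
    """Return the codebook address plus the flips needed to undo re-orientation."""
    canonical, flipped_v, flipped_h = canonicalize(block)
    address = min(range(len(codebook)),
                  key=lambda i: block_distance(canonical, codebook[i]))
    return address, flipped_v, flipped_h


def decode(address, flipped_v, flipped_h, codebook):
    """Fetch the codevector and re-apply the flips (each flip is its own inverse)."""
    block = [list(row) for row in codebook[address]]
    if flipped_h:
        block = flip_h(block)
    if flipped_v:
        block = flip_v(block)
    return block


# Hypothetical codebook whose entries all share the assumed orientation.
codebook = [[[200, 200, 200], [200, 200, 200]],
            [[255, 128, 0], [128, 64, 0]]]
base = codebook[1]
four_inputs = [base, flip_h(base), flip_v(base), flip_v(flip_h(base))]
results = [encode(b, codebook) for b in four_inputs]
outputs = [decode(*r, codebook) for r in results]
```

In this sketch all four inputs select the same codevector address, and the two flag bits sent alongside the address are sufficient for the receiver to restore each input's original orientation.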
Although "reflected VQ" does achieve a reduction in distortion without a net increase in codebook size, the computational complexity of VQ encoders employing this method increases. A VQ encoder employing the "reflected VQ" method must be capable of determining the general orientation of each input vector so that the encoder can re-orient the input vector, if necessary, to match the general orientation of the codevectors in the codebook. This additional complexity increases the cost of VQ encoders employing the "reflected VQ" method. Therefore, there is a need for a vector quantization method that achieves a reduction in overall distortion without an increase in codebook memory and without a substantial increase in encoder cost.
The method of the present invention satisfies this need.