This invention relates to a method for producing an embedded bit stream in a hierarchical table lookup vector quantizer for use in connection with image data compression wherein an image is encoded at a transmitter, transmitted, and/or selectively decoded at a receiver. An embedded bit stream is one in which any prefix of the bit stream is a valid bit stream of the image at a lower rate. The present invention provides transcoding to bit streams of arbitrary lower rate, simply by truncation. Thus, image data can be transmitted in an embedded bit stream and the receiver can use as much of the embedded bit stream as is necessary to reconstruct the image to a desired or allowable resolution. That is, the invention implements progressive decoding of bit streams to images with increasing resolution as bits arrive at the receiver.
While the invention is particularly directed to the art of image data compression, and will thus be described with specific reference thereto, it will be appreciated that the invention has applicability to other media, such as video and audio. Exemplary applications to such media include packet prioritization for bandwidth scalability in networks, information prioritization for unequal error protection for wireless communication, and arbitrarily fine bit rate control at the encoder.
By way of background, image and video coding standards such as JPEG, MPEG, and MPEG-II are based on transform coding. In transform coding, a block transform such as the discrete cosine transform (DCT) is typically applied to 8×8 blocks of an image, and each transform coefficient or frequency band is independently quantized. The quantization stepsizes typically vary from band to band, with larger stepsizes in the higher frequency bands reflecting the fact that quantization errors at higher spatial frequencies are visually less apparent.
JPEG has a progressive transmission mode, in which an image is stored or transmitted with the most significant bits first, then the next most significant bits, and so on. This is accomplished by beginning with large stepsizes for each band, and successively halving the stepsizes in selected bands in some sequence. As each stepsize is halved, additional bits are stored or transmitted. The resulting bit stream is said to be embedded in the sense that any prefix of the bit stream can be decompressed into an image whose quality is appropriate for the length of the prefix. That is, lower resolution encodings are embedded in higher resolution encodings. Such an embedded bit stream is useful, for example, in image retrieval or telebrowsing applications in which the user must scan through a large collection of images to find a desired image, and it is too expensive or time-consuming to reproduce all the images in full detail. In these applications, the decoder can progressively reconstruct images of improving quality as bits arrive.
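The stepsize-halving idea can be illustrated on a single coefficient. The following is a minimal Python sketch and not JPEG's actual procedure: it assumes an integer coefficient in a hypothetical range [0, 256), and the function names are invented for illustration. Each emitted bit halves the stepsize, and any prefix decodes to the midpoint of the interval identified so far.

```python
def encode_progressive(value, num_bits=8, full_range=256):
    """Emit bits most significant first; each bit halves the stepsize."""
    low, step = 0, full_range
    bits = []
    for _ in range(num_bits):
        step //= 2                      # halve the quantizer stepsize
        if value >= low + step:         # value lies in the upper half
            bits.append(1)
            low += step
        else:                           # value lies in the lower half
            bits.append(0)
    return bits


def decode_prefix(bits, full_range=256):
    """Reconstruct from any prefix as the midpoint of the remaining interval."""
    low, step = 0, full_range
    for bit in bits:
        step //= 2
        if bit:
            low += step
    return low + step / 2


bits = encode_progressive(200)
# One reconstruction per prefix length, from 0 bits up to all 8 bits.
reconstructions = [decode_prefix(bits[:k]) for k in range(len(bits) + 1)]
```

Under these assumptions, the reconstruction error for a prefix of k bits is bounded by half the current stepsize, 256 / 2^(k+1), so the bound halves with every additional bit even though the actual error need not shrink at every step.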
Most research on image and video coding algorithms still focuses on variants of transform coding. In particular, the transform is often based on wavelet analysis, subband filtering, or lapped orthogonal transforms, for which the transform blocks overlap, thereby reducing objectionable blocking artifacts.
J. M. Shapiro, “Embedded Image Coding Using Zerotrees of Wavelet Coefficients,” IEEE Trans. on Signal Processing, Vol. 41, No. 17, pp. 3445-3463, December 1993, and D. Taubman and A. Zakhor, “Multi-Rate 3-D Subband Coding of Video,” IEEE Trans. Image Proc., Vol. 3, No. 5, pp. 572-588, September 1994, employ wavelet transform coding in a progressive mode. Taubman uses the resulting embedded bit stream for simple bit rate control in a video encoder. Bit rate control is essential when coupling a variable-rate data compressor to a fixed-rate channel. If the channel can transmit exactly (or a maximum of) B bits per second, then the bit rate of the compressor must be decreased or increased when the buffer between the compressor and channel begins to overflow or underflow, respectively. With an embedded bit stream, bit rate control becomes as simple as taking a prefix of an appropriate length from the encoding for each frame (or other unit) of the video data.
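Rate control by prefix truncation can be sketched in a few lines of Python. This is an illustration under stated assumptions only: each frame's encoding is taken to be embedded, the budget is a fixed whole number of bytes per frame, and the function name and sample data are hypothetical. A practical controller would adapt the budget to the buffer state.

```python
def truncate_to_budget(embedded_frames, bits_per_frame):
    """Keep only the prefix of each embedded frame that fits the channel budget."""
    budget_bytes = bits_per_frame // 8
    return [frame[:budget_bytes] for frame in embedded_frames]


# Hypothetical embedded encodings of three frames, of varying length.
frames = [bytes(range(40)), bytes(range(25)), bytes(range(10))]
sent = truncate_to_budget(frames, bits_per_frame=160)   # 20 bytes per frame
```

Frames already shorter than the budget pass through unchanged; longer frames lose only their least significant trailing bits.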
Vector quantization is a generalization of simple (scalar) quantization, in which each vector of coefficients is simultaneously quantized using a small number of bits, e.g., 8, to represent the vector as the index of one of, for example, 256 possible reproduction vectors (called codewords) in a collection of reproduction vectors (called the codebook). A special case of vector quantization is scalar quantization applied independently to each component of a vector. The codewords in this case are constrained to lie on a rectangular lattice. In vector quantization, this constraint is removed. Hence vector quantization is superior to scalar quantization in that it can arrange the codewords in the vector space for maximum coding efficiency. For example, the codewords can be arranged to populate only the probable regions of the space. This can be accomplished using a training set of typical data and a clustering algorithm. See A. Gersho and R. M. Gray, “Vector Quantization and Signal Compression,” Kluwer Academic Publishers, 1992.
Not only do vector quantizers have superior rate-distortion performance; they also permit decoding by simple table lookup. When the decoder receives an 8-bit code, for example, it reproduces the original vector by using the code as the index to a table, and reading the reproduction vector out of the table. If all the coefficients of a block transform are blocked into a single vector, the decoder can decode the whole block with a simple table lookup. No inverse transform is necessary in this case, as it would be if the components of the transform were scalar quantized.
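The asymmetry between encoding and decoding can be seen in a minimal Python sketch. It assumes a squared-error distortion measure and a tiny hand-made codebook of four 2-dimensional codewords; the names and values are hypothetical and chosen only for illustration.

```python
def squared_error(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_encode(vector, codebook):
    """Full search: return the index of the lowest-distortion codeword."""
    return min(range(len(codebook)),
               key=lambda i: squared_error(vector, codebook[i]))


def vq_decode(index, codebook):
    """Decoding is a single table lookup."""
    return codebook[index]


# Hypothetical 2-bit codebook (4 codewords) for 2-dimensional vectors.
codebook = [(0, 0), (10, 10), (200, 200), (50, 220)]
index = vq_encode((190, 210), codebook)
reproduction = vq_decode(index, codebook)
```

The encoder visits every codeword, whereas the decoder performs one indexed read; this is the cost imbalance that structured quantizers try to remove.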
Unfortunately, there are some drawbacks to vector quantization. The first is encoder complexity. Since the codebook is unstructured, for each input vector, the encoder must perform a full search through the codebook looking for the codeword that would result in the lowest distortion. This is often excessively time-consuming for practical algorithms. A second related problem is that for a given bit rate (in bits per component), the number of bits per vector grows linearly with vector dimension, and hence the number of codewords grows exponentially with vector dimension (and hence so does encoder complexity, and encoder and decoder memory requirements). As a practical consequence, the vector dimension must match the bit rate. At low bit rates, large vector dimensions are feasible. At higher bit rates, only small dimensions may be feasible. At the highest bit rates, only scalar quantization may be feasible. At any given bit rate, the best rate-distortion performance will be attained with the largest feasible vector dimensions.
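The growth described above can be made concrete with a short calculation. The sketch below assumes a fixed-rate quantizer with an integer number of bits per component; the function name is hypothetical.

```python
def codebook_size(bits_per_component, dimension):
    """Number of codewords for a fixed-rate VQ at the given rate and dimension."""
    bits_per_vector = bits_per_component * dimension   # grows linearly
    return 2 ** bits_per_vector                        # grows exponentially


# Codebook sizes at 1 bit per component for increasing vector dimension.
sizes = {dim: codebook_size(1, dim) for dim in (1, 2, 4, 8, 16)}
```

At 1 bit per component, dimension 8 needs 256 codewords and dimension 16 needs 65,536; at 2 bits per component, dimension 8 already needs 65,536.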
To reduce the computational complexity of searching the codebook of a vector quantizer, a number of alternatives have been developed. One of these is tree-structured vector quantization. In tree-structured vector quantization, the codewords are arranged in a tree structure (typically a binary tree), with one codeword at each node. Instead of searching the codebook exhaustively, it can be searched in time logarithmic in the codebook size by beginning at the root node, successively comparing the input vector to the codewords at each branch node, descending to the branch with the lower distortion, and repeating the process until reaching a leaf node. The codeword at the leaf is the desired reproduction vector and is represented by a binary code for the leaf, 8 bits to specify one of 256 leaves, for example. If the tree is complete, that is, if all leaves are at the same depth (e.g., 8), then the path map to the leaf may be used as its index.
The path map has the embedded code property: any prefix represents a reproduction vector in a coarser codebook, i.e., at a higher level in the tree. Trees whose leaves are at varying depths result in a variable-rate code, in which the binary strings used to represent the reproduction vector vary in length depending on the input vector. Since fixed-rate codes are a special case of variable-rate codes, variable-rate codes can always do at least as well as fixed-rate codes (typically 20-30% better), at the cost of extra delay due to buffering, and reduced tolerance to bit errors.
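The tree descent and the embedded property of the path map can be sketched as follows. This is a minimal Python illustration assuming a complete binary tree, squared-error distortion, and a hand-made depth-2 tree stored as a dictionary keyed by path map; all names and codeword values are hypothetical.

```python
def squared_error(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tsvq_encode(vector, tree, depth):
    """Descend a complete binary tree, recording the path map as a bit string."""
    path = ""
    for _ in range(depth):
        left, right = path + "0", path + "1"
        # Two comparisons per level instead of a search over all leaves.
        if squared_error(vector, tree[left]) <= squared_error(vector, tree[right]):
            path = left
        else:
            path = right
    return path


def tsvq_decode(path_prefix, tree):
    """Any prefix of the path map names a node, hence a coarser codeword."""
    return tree[path_prefix]


# Hypothetical depth-2 tree: keys are path maps, values are codewords.
tree = {
    "": (128, 128),
    "0": (64, 64), "1": (192, 192),
    "00": (32, 32), "01": (96, 96), "10": (160, 160), "11": (224, 224),
}
path = tsvq_encode((150, 170), tree, depth=2)
coarse = tsvq_decode(path[:1], tree)   # decoded from the first bit only
fine = tsvq_decode(path, tree)         # decoded from the full path map
```

Truncating the path map to its first bit still yields a valid, though coarser, reproduction vector, which is the embedded code property in miniature.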
The embedded coding property applies to variable-rate tree-structured vector quantizers as well as fixed-rate quantizers. E. A. Riskin, T. Lookabaugh, P. A. Chou, and R. M. Gray, “Variable-Rate Vector Quantization for Medical Image Compression,” IEEE Trans. Medical Imaging, Vol. 9, No. 3, pp. 290-298, September 1990; W.-J. Hwang and H. Derin, “Multi-Resolution Multi-Rate Progressive Image Transmission,” Proc. 27th Asilomar Conf. on Signals, Systems, and Computers, Nov. 1-3, 1993, Pacific Grove, Calif., pp. 65-69; and M. Effros, P. A. Chou, E. A. Riskin, and R. M. Gray, “A Progressive Universal Noiseless Coder,” IEEE Trans. Information Theory, Vol. 40, No. 1, pp. 108-117, January 1994, explore variable-rate tree-structured vector quantization for progressive image transmission. In the last of these works, the tree-structured vector quantizer is structured so that the codewords lie on a rectangular lattice. As a result, the complete tree represents the input vector losslessly. Thus the progressive transmission sequence results in a lossless coding.
Another method developed for reducing the full search encoding complexity of unstructured vector quantization is hierarchical table-lookup vector quantization, developed by P.-C. Chang, J. May, and R. M. Gray, “Hierarchical Vector Quantization with Table-Lookup Encoders,” Proc. Int'l. Conf. on Communications, Chicago, Ill., June 1985, pp. 1452-55. In hierarchical table-lookup vector quantization, a hierarchy of moderately sized tables (e.g., 64 Kbytes) is used to perform the encoding. (Decoding is performed as usual in vector quantization, by a single table lookup.) For example, with reference to FIG. 1a, to encode an 8-dimensional input vector into 8 bits, each component is first finely quantized to 8 bits, or one byte (this is usually already the case in image coding). Then each of the 4 adjacent pairs of bytes is used as a 16-bit address into a table (Table 1), to produce one byte for each pair. This process is repeated in two additional levels of the hierarchy, resulting in a single byte for the 8-dimensional input vector.
Each of Tables 1, 2, and 3 is 64 Kbytes. The number of tables is the base-2 logarithm of the vector dimension. The computational complexity of the scheme is at most one table lookup per input symbol, since the complexity of the first level is ½ table lookup per input symbol, the complexity of the second level is ¼ table lookup per input symbol, and so on. Each table lookup implements, approximately, the equivalent of the encoder of a full search unstructured vector quantizer. Table 1 encodes each possible 2-dimensional vector (or rather, each of 64K possible pairs of bytes) to the index of the best or lowest distortion codeword in a 2-dimensional codebook of size 256. Table 2 encodes each possible 4-dimensional vector (or rather, each of the 64K possible 4-dimensional vectors reproducible after the first level of quantization) to the index of the best or lowest distortion codeword in a 4-dimensional codebook of size 256, and so on for each level.
A signal flow diagram for such an HVQ encoder is shown in FIG. 1b. In the HVQ of FIG. 1b, the tables T at each stage of the encoder along with the delays Z are illustrated. Each level in the hierarchy doubles the vector dimension of the quantizer, and therefore reduces the bit rate by a factor of 2. By similar reasoning, the ith level in the hierarchy performs one lookup per 2^i samples, and therefore the total number of lookups per sample is at most ½ + ¼ + ⅛ + . . . = 1, regardless of the number of levels. Of course, it is possible to vary these calculations by adjusting the dimensions of the various tables.
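The lookup flow of the hierarchy, and the lookup count, can be sketched in Python. To keep the example small, it assumes 4-bit symbols (16 values) rather than bytes, and it uses a placeholder table that simply averages its two inputs; that placeholder is purely hypothetical and is not a designed quantizer table. Only the structure of the encoder is being illustrated.

```python
def hvq_encode(samples, tables):
    """Encode 2**len(tables) samples to one index using only table lookups.

    Returns the final index and the number of lookups performed.
    """
    level = list(samples)
    lookups = 0
    for table in tables:
        next_level = []
        for k in range(0, len(level), 2):
            # Each adjacent pair of symbols addresses the table for this level.
            next_level.append(table[level[k]][level[k + 1]])
            lookups += 1
        level = next_level          # vector dimension doubles, rate halves
    return level[0], lookups


# Placeholder 16x16 table (hypothetical): NOT a designed VQ table.
toy_table = [[(a + b) // 2 for b in range(16)] for a in range(16)]
tables = [toy_table, toy_table, toy_table]     # three levels for 8 samples

samples = [2, 4, 6, 8, 10, 12, 14, 0]
index, lookups = hvq_encode(samples, tables)
lookups_per_sample = lookups / len(samples)
```

With three levels the encoder performs 4 + 2 + 1 = 7 lookups for 8 samples, i.e., 1 - 1/2^3 = 0.875 lookups per sample, which stays below 1 for any number of levels.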
The contents of the HVQ tables can be determined in a variety of ways. A straightforward way is the following. With reference to FIG. 1a, Table 1 is simply a table-lookup version of an optimal 2-dimensional VQ. That is, an optimal 2-dimensional full search vector quantizer with M=256 codewords is designed by standard means (e.g., the generalized Lloyd algorithm discussed by A. Gersho and R. M. Gray, “Vector Quantization and Signal Compression,” Kluwer Academic Publishers, 1992), and Table 1 is filled so that it assigns to each of its 2^16 possible 2-dimensional input vectors the 8-bit index of the nearest codeword.
Table 2 is just slightly more complicated. First, an optimal 4-dimensional full search VQ with M=256 codewords is designed by standard means. Then Table 2 is filled so that it assigns to each of its 2^16 possible 4-dimensional input vectors (i.e., the cross product of all possible 2-dimensional output vectors from the first stage) the 8-bit index of its nearest codeword. The tables for stages 3 and up are designed similarly. Note that the distortion measure is completely arbitrary.
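The table-filling procedure can be sketched in Python at a reduced scale. The sketch assumes 4-bit input components and M=16 codewords per stage (so each table has 256 entries rather than 64K), squared-error distortion, and stand-in codebooks written by hand; in practice the codebooks would come from a design procedure such as the generalized Lloyd algorithm. All names are hypothetical.

```python
def nearest(vector, codebook):
    """Index of the codeword with the lowest squared error to the vector."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(vector, codebook[i])))


def build_table1(codebook1, levels=16):
    """Table 1: every pair of input symbols -> index of nearest 2-D codeword."""
    return [[nearest((a, b), codebook1) for b in range(levels)]
            for a in range(levels)]


def build_table2(codebook1, codebook2):
    """Table 2: every pair of stage-1 indices -> index of nearest 4-D codeword.

    The 4-D input is the concatenation of the two stage-1 reproduction vectors.
    """
    m = len(codebook1)
    return [[nearest(codebook1[i] + codebook1[j], codebook2) for j in range(m)]
            for i in range(m)]


# Stand-in codebooks (hypothetical, not optimized), 16 codewords each.
codebook1 = [(x, y) for x in (2, 6, 10, 14) for y in (2, 6, 10, 14)]
codebook2 = [(v, v, v, v) for v in range(16)]

table1 = build_table1(codebook1)
table2 = build_table2(codebook1, codebook2)
```

The full search happens once, offline, when the tables are built; at encoding time only the lookups remain. Swapping the squared error in `nearest` for another measure changes the tables but not the encoder.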
This same structure may be used for vector quantization of the coefficients in a block transform (such as a discrete cosine transform, DCT). In this case, the transform computation may be embodied in the table lookups, as studied by N. Chaddha, M. Vishwanath, and P. A. Chou, “Hierarchical Vector Quantization of Perceptually Weighted Block Transforms,” Proc. Data Compression Conference, Snowbird, Utah, April 1995. A sliding window version of the structure may be used for vector quantization of the coefficients of wavelet, subband, or lapped transforms, as studied by M. Vishwanath and P. A. Chou, “An Efficient Algorithm for Hierarchical Compression of Video,” Proc. Int'l. Conf. on Image Processing, Austin, Tex., November 1994, Vol. 3, pp. 275-279. In this case as well, the transform computation is embodied in the tables. It is also a simple matter to embody arbitrary perceptual distortion measures in the tables.
Thus, hierarchical vector quantization offers both extremely low computational complexity and good rate-distortion performance. It also offers a simple means of transcoding (reformatting) the bit stream at rate R bits per second to another bit stream at rate R/2 (or R/4, R/8, etc.) bits per second by further table lookups to map each pair of bytes into a single byte. In M. Vishwanath and P. A. Chou, “An Efficient Algorithm for Hierarchical Compression of Video,” Proc. Int'l. Conf. on Image Processing, Austin, Tex., November 1994, Vol. 3, pp. 275-279, it was argued that such simple transcoding would be useful for reducing the bit rate of the compressed signal at gateways between high and low capacity networks.
Unfortunately, transcoding by table lookup is not simple enough for certain applications. It is undesirable to place application-dependent algorithms (such as table look-ups for video coding) at network gateways or switches. Embedded coding is a simpler way, not dependent on the application, for the gateway to transcode from high rate to low rate streams. Equivalently, receivers on the low rate network can subscribe to only the high priority streams. This is called bandwidth scalability. If an embedded bit stream is packaged into prioritized bit streams, then the gateway need only pass on to the low rate network the high priority streams. No application-specific processing is done at the gateway. A gateway node need only be able to threshold packet numbers, which is a capability that will most likely be present in future network protocols.
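The gateway behavior described above amounts to a comparison on a packet number. The Python sketch below assumes a hypothetical packet representation, a (priority, payload) pair in which a lower number means a higher priority; neither the representation nor the function name comes from any particular network protocol.

```python
def gateway_filter(packets, max_priority):
    """Forward only packets whose priority number is at or below the threshold."""
    return [packet for packet in packets if packet[0] <= max_priority]


# Hypothetical prioritized packets carved from one embedded bit stream.
packets = [(0, b"coarse"), (1, b"refine-1"), (2, b"refine-2"), (3, b"refine-3")]
low_rate_stream = gateway_filter(packets, max_priority=1)
```

The gateway never inspects the payload, so the same filter serves image, video, or audio streams alike.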
Another use of embedded coding, or prioritized bit streams, in a packet network is congestion control. As packet buffers overflow, low priority packets can be dropped, and the signal reconstructed from the remaining packets will be gracefully degraded. A similar idea applies to wireless communications. If packets of information are prioritized, then it is possible to use unequal error protection on the different packets. That is, the highest priority packets will be channel coded for maximum error protection, and the lowest priority packets may not have any error correction applied, before modulation and transmission. The encoder need not have any particular knowledge of packet loss or channel characteristics. It need only rank bits in terms of priority.
A final use of embedded coding, particularly for fixed-rate networks, is for rate control, as mentioned above. Many of these uses of embedded coding are simultaneously applicable.
The present invention contemplates a new and improved image data transmission method which resolves the shortcomings of prior schemes.
A method for encoding and decoding an image is provided. The method, which is implemented in a complementary apparatus, comprises receiving an image and performing an encoding process involving hierarchical table lookup vector quantization on blocks of the image and embedding resultant data to obtain an embedded bit stream of data representing the image. Any selected prefix of the bit stream represents a valid bit stream of the image at a lower rate than the entire bit stream.
In accordance with another aspect of the present invention, the embedded bit stream is transmitted to a receiver.
In accordance with another aspect of the present invention, the transmitted bit stream may be truncated during transmission.
In accordance with another aspect of the invention, a decoding operation is performed on the truncated embedded bit stream to obtain a reconstructed image.
In accordance with another aspect of the invention, decoding operations are successively performed on the embedded bit stream to obtain progressively improved reconstructed images as the bits arrive at the decoder.
One advantage of the present invention is that an improved method of transcoding from high rate to low rate bit streams is provided.
Another advantage of the present invention is that an entire vector representing an image can be losslessly encoded such that every prefix of the string is a valid code for the entire vector so that a receiver can use as much of the embedded bit stream as is necessary to reconstruct the image to a desired or allowable resolution.
Another advantage of the present invention is that this lossless encoding/decoding is accomplished using only table lookups.
Further scope of the applicability of the present invention will become apparent from the detailed description provided below. It should be understood, however, that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art.