The present invention relates generally to compression and decompression of data. More specifically, the present invention relates to a motion wavelet transform zero tree codec for compression of images and video.
A number of important applications in image processing require a very low cost, fast and good quality video codec (coder/decoder) implementation that achieves a good compression ratio. In particular, a low cost and fast implementation is desirable for low bit rate video applications such as video cassette recorders (VCRs), cable television, cameras, set-top boxes and other consumer devices. It is also often desirable for such a codec to be implemented on a low-cost, relatively small, single integrated circuit.
Recent notable advances in the field of compression have been made by a variety of institutions. For example, RICOH Corporation has recently introduced the continuous-tone still image compression technology called “Compression With Reversible Embedded Wavelets” (CREW). CREW is a unified lossless and lossy continuous-tone still image compression system based on a lossless (reversible) wavelet transform and has embedded quantization. The CREW technology uses TS-transform filters to perform the wavelet transform. These TS-transform filters are based upon the well known 2-6 Biorthogonal filters. For encoding after the transform, the CREW technology uses one of three possible adaptive binary entropy coders: the Finite State Machine (FSM) Coder, the QM-Coder, and the high-speed parallel coder.
The use of reversible wavelet transforms, an embedded code stream and a high-speed, high-compression binary entropy coder combine to make the CREW technology ideal for a number of high end image compression applications. These applications include medical imagery, pre-press images, continuous tone facsimile documents, image archival, world wide web images and satellite imagery. Many of these applications have not used compression in the past, either because the quality could not be assured, the compression rate was not high enough, or the data rate was not controllable. Thus, the CREW technology provides very high quality, high compression rate data compression for these high end image compression applications. However, it is not clear that the CREW technology is especially suited for extremely low cost, good quality, real-time video codec implementations that only need a relatively good compression ratio. In particular, it is not clear that the CREW technology is especially suitable for low cost and fast implementations for low bit-rate applications in consumer devices.
There are hardware devices implemented on a single chip that do provide real time compression and decompression of video images. For example, the ADV601 device available from Analog Devices, Inc., of Norwood, Mass., is a low cost, single chip CMOS VLSI device for real time compression and decompression of interlaced digital video. The ADV601 performs its wavelet transform on incoming data using the 7-9 Biorthogonal wavelet transform (Daubechies wavelets). For encoding, the ADV601 uses well-known Huffman coding.
Unfortunately, the ADV601 can be slow to process data and can be expensive to implement in both hardware and software due to the multiplications required. For example, the Daubechies wavelets require, on average, six multiplications per wavelet coefficient for both encoding and decoding. For a typical image of size 640×480 pixels, there are 307,200 wavelet coefficients, i.e., about 1.84 million multiplications per image. For video this number is higher still. For example, at 30 frames per second, about 55 million multiplications per second must be performed for both encoding and decoding. The sheer number of these multiplications means the processing is slower, especially for real time video images, and means that a hardware implementation is much more expensive.
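By way of illustration only, the arithmetic behind the figures above can be checked with a few lines of Python. The function name and constant below are hypothetical and chosen for this sketch; the cost of six multiplications per coefficient is the average figure quoted in the text.

```python
# Illustrative arithmetic only; six multiplications per coefficient is the
# average figure quoted in the text for the Daubechies wavelets.
MULTS_PER_COEFF = 6


def multiplications_per_frame(width, height, mults_per_coeff=MULTS_PER_COEFF):
    """One wavelet coefficient per pixel, each costing mults_per_coeff multiplications."""
    return width * height * mults_per_coeff


coefficients = 640 * 480                          # wavelet coefficients per image
per_frame = multiplications_per_frame(640, 480)   # multiplications per image
per_second = per_frame * 30                       # multiplications at 30 frames per second

print(coefficients, per_frame, per_second)
# prints: 307200 1843200 55296000
```

The exact counts of 1,843,200 multiplications per image and 55,296,000 per second are consistent with the rounded figures quoted above.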
Continuing with a discussion of compression in general, it is noted that transform-based compression of data typically involves the steps of transformation, quantization and encoding. Bit encoding may be performed using a wide variety of techniques.
A bit encoding technique known as the zerotree algorithm has been used in the past to encode classical wavelets. The zerotree algorithm for encoding wavelet coefficients was first introduced by J. M. Shapiro in “Embedded Image Coding Using Zerotrees of Wavelet Coefficients,” IEEE Trans. Signal Process., 41(12):3445-3462, 1993. The technique has been extended recently by A. Said and W. A. Pearlman in the manuscript “A New Fast and Efficient Image Codec Based on Set Partitioning in Hierarchical Trees”, submitted to the IEEE Transactions on Circuits and Systems for Video Technology. A further explanation of this technique can also be found in “Image Compression Using the Spatial-Orientation Tree”, A. Said and W. Pearlman, IEEE Report 0-7803-1254-6/93, 1993. Each of these publications is incorporated herein by reference. The encoding technique of Said-Pearlman relies on the fact that in smooth regions of images the coefficients decay exponentially. This implies that if a certain coefficient is below a threshold then its children are very likely to be below the threshold as well. Thus, a whole subtree of small coefficients below the sub-threshold coefficient may be discarded.
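The subtree test described above may be sketched as follows. This is a minimal illustration and not the encoder of any cited publication. It assumes a one-dimensional dyadic layout, adopted here purely for simplicity, in which index 0 holds the coarsest average, index 1 holds the coarsest detail coefficient, and the children of node i (for i of at least 1) are nodes 2i and 2i+1. The names `child_indices` and `is_zerotree` are hypothetical.

```python
def child_indices(i, n):
    """Children of detail node i (i >= 1) in a dyadic array of length n."""
    return [c for c in (2 * i, 2 * i + 1) if c < n]


def is_zerotree(coeffs, i, threshold):
    """True if node i and every descendant of node i are below the threshold in magnitude."""
    if abs(coeffs[i]) >= threshold:
        return False
    return all(is_zerotree(coeffs, c, threshold)
               for c in child_indices(i, len(coeffs)))


# Assumed example data: index 0 is the average, indices 1..7 are details.
coeffs = [50, -20, 3, 9, 1, 0, -2, 4]

print(is_zerotree(coeffs, 2, 8))  # node 2 and its children 4, 5 are all small
print(is_zerotree(coeffs, 3, 8))  # node 3 itself has magnitude 9, above the threshold
```

When a node passes this test, an encoder can signal the whole subtree with a single symbol instead of coding each small coefficient separately.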
The Said-Pearlman zerotree algorithm uses lists and sets to facilitate its implementation. There is a list of insignificant sets (LIS), a list of insignificant nodes (LIN), and a list of significant nodes (LSN). A set O represents all children of a node, a set D represents all descendants of a node, and a set L represents all grandchildren of a node and below. Initially, the lists LIS and LIN are initialized to contain top level nodes. The list LSN is initialized to the empty list. The Said-Pearlman zerotree algorithm relies heavily upon list processing and the shifting of nodes back and forth between lists and sets. Although this technique of list processing and shuffling can be implemented for fast execution in software, it is not a desirable solution for hardware implementation. First, such an implementation uses a great deal of memory and hardware. Second, the complex accessing pattern relied upon by this list processing not only requires more memory and associated hardware, but also makes the algorithm slower when implemented in hardware. Thus, use of the Said-Pearlman zerotree algorithm is not particularly desirable for a hardware implementation of a compression device.
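The three sets defined above can be illustrated with a short sketch. It is not an implementation of the Said-Pearlman algorithm; it only enumerates the sets O, D and L for one node, assuming for simplicity a one-dimensional dyadic tree in which the children of node i (for i of at least 1) are nodes 2i and 2i+1. All function names are hypothetical.

```python
def offspring(i, n):
    """Set O: the direct children of node i in a dyadic tree with n nodes."""
    if i < 1:
        raise ValueError("node index must be at least 1 in this layout")
    return [c for c in (2 * i, 2 * i + 1) if c < n]


def descendants(i, n):
    """Set D: all descendants of node i, gathered level by level."""
    found = []
    frontier = offspring(i, n)
    while frontier:
        found.extend(frontier)
        frontier = [g for c in frontier for g in offspring(c, n)]
    return found


def grandchildren_and_below(i, n):
    """Set L: the descendants of node i excluding its direct children (D minus O)."""
    direct = set(offspring(i, n))
    return [d for d in descendants(i, n) if d not in direct]


print(offspring(1, 16))                # [2, 3]
print(descendants(1, 16))              # nodes 2 through 15
print(grandchildren_and_below(1, 16))  # nodes 4 through 15
```

Even in this small example the set D for a top level node spans nearly the whole tree, which gives some sense of the bookkeeping involved when nodes are moved between such sets and the lists LIS, LIN and LSN.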
Moreover, it is not apparent that prior art techniques have made use of a zerotree in a video codec implemented in hardware. For example, neither the CREW technology nor the ADV601 device makes use of zero trees. Use of a modified zerotree in combination with second generation wavelets is discussed in the commonly assigned, pending U.S. patent application Ser. No. 08/607,388 filed Feb. 27, 1996, by inventors Kolarov et al., entitled “Wavelet-Based Data Compression”, which provides an efficient technique for compression and decompression of functions defined upon three-dimensional surfaces.
Therefore, a compression technique for video and image compression is desirable which may be implemented in hardware of modest size and very low cost. It would be further desirable for such a compression technique to take advantage of the benefits provided by zerotree encoding.
To achieve the foregoing, and in accordance with the purpose of the present invention, a motion wavelet transform zero tree codec is disclosed that achieves high compression ratios and may be implemented in hardware of modest size and at very low cost. In particular, embodiments of the present invention are well-suited for fast, real-time video compression.
One aspect of the present invention combines a wavelet transform with a novel tree walk technique for encoding the resulting wavelet coefficients, thus providing a very low cost, fast and good quality video codec implementation. In one particular embodiment, the present invention uses the 2-6 wavelet transform to provide a cheaper implementation in hardware. In a further embodiment, wavelet coefficients from the transform are represented in an array of zero trees which are traversed to produce an output of encoded bits.
The present invention outperforms the ADV601 video encoder/decoder device. Performance is improved by using shorter and less computationally intensive filters than those used in the ADV601. For example, while the ADV601 uses the 7-9 wavelet transform which requires multiplications, an embodiment of the present invention using 2-6 wavelet transforms needs no multiplications in the transform implementation. Such an implementation means fewer additions, and thus the overall method is significantly cheaper for software and in particular for a hardware implementation. Furthermore, an embodiment of the present invention traverses a zero tree in a novel fashion, allowing encoded bits to be output directly during the tree walk and avoiding complex and time consuming list processing and shuffling. Unlike the traditional zero tree algorithm which uses shuffling of nodes and coefficients between sets and lists, the present invention performs a direct tree walk of the zero trees produced which means a more efficient and cheaper hardware implementation. The present invention is especially suited for implementation upon a single integrated circuit.
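To illustrate how a 2-6 wavelet transform can be computed without multiplications, the following sketch applies one level of a reversible integer transform to a one-dimensional signal using only additions, subtractions and bit shifts. The exact filter, rounding and boundary treatment of the invention are not given in this section, so the sketch makes several assumptions: it uses the integer lifting form commonly associated with the reversible 2-6 (TS) transform, it applies no correction to the first and last differences, and the function names are hypothetical.

```python
def forward_2_6(x):
    """One level of an integer 2-6 transform using only adds, subtracts and shifts.

    Assumes len(x) is even. Returns (s, d): averages and lifted differences.
    """
    m = len(x) // 2
    d0 = [x[2 * k] - x[2 * k + 1] for k in range(m)]       # pairwise differences
    s = [x[2 * k + 1] + (d0[k] >> 1) for k in range(m)]    # floor of pairwise mean
    d = list(d0)
    for k in range(1, m - 1):                              # interior samples only
        d[k] = d0[k] + ((s[k + 1] - s[k - 1] + 2) >> 2)    # shift replaces divide by 4
    return s, d


def inverse_2_6(s, d):
    """Exact inverse of forward_2_6 under the same assumptions."""
    m = len(s)
    d0 = list(d)
    for k in range(1, m - 1):
        d0[k] = d[k] - ((s[k + 1] - s[k - 1] + 2) >> 2)
    x = [0] * (2 * m)
    for k in range(m):
        x[2 * k + 1] = s[k] - (d0[k] >> 1)
        x[2 * k] = x[2 * k + 1] + d0[k]
    return x


signal = [10, 12, 15, 13, 9, 7, 8, 20]
s, d = forward_2_6(signal)
print(s, d)                          # [11, 14, 8, 14] [-2, 1, 2, -12]
print(inverse_2_6(s, d) == signal)   # True
```

For a linear ramp such as 0, 1, 2, ..., 7 the interior differences in this sketch come out as zero, which is the behavior that makes smooth regions cheap to encode. The uncorrected first and last differences are a simplification made for this sketch.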
In addition, the present invention produces better peak signal to noise ratios (PSNR) for a variety of different images and video. Experimentation with images from the test suite for the ADV601, as well as with the video sequences used for evaluation of the MPEG4 proposals, reveals that the present invention outperforms the ADV601 significantly both in PSNR and perceptually.
An aspect of the present invention is able to transform fields of pixels independently, which greatly reduces the complexity of the compression and reduces the amount of RAM needed. In a specific implementation of this embodiment, a two-degree quadratic approximation is drawn through edge points on a field and is assumed to continue across field boundaries. An improved 2-6 Biorthogonal filter is used to filter information in successive passes by providing specific numerical values for the initial and final lifted differences (w0 and wn−1) rather than simply assigning zero values for their coefficients as is done in the prior art. Assigning specific numerical values for the lifted difference values at the field boundaries allows each field to be treated independently yet still reduces blocking artifacts that would normally occur when an image is decompressed.
The present invention is useful with a variety of types of images, such as those intended for computer monitors, televisions, cameras, hand-held devices etc., and is applicable to a wide variety of standards such as NTSC video, PAL and SECAM television etc.