The present invention relates in general to data compression techniques. More specifically, the present invention relates to a compressed data stream generated in accordance with a data compression technique using hierarchical subband decomposition of a data set and set partitioning of data points within the hierarchical subband decomposition using hierarchical trees. Moreover, the present invention relates to a data structure facilitating decoding and encoding of a subband decomposition of data points and compressed data containing that data structure. In particular, the present invention relates to N-dimensional data compression and recovery using set partitioning in hierarchical trees.
As the amount of information processed electronically increases, the requirement for information storage and transmission increases as well. Certain categories of digitally processed information involve large amounts of data, which translates into large memory requirements for storage and large bandwidth requirements for transmission. Accordingly, such storage and/or transmission can become expensive in terms of system resource utilization, which directly translates into economic expense. It will be appreciated that the digitally processed information can be one dimensional (1-D) information, e.g., audio data, two dimensional (2-D) information, e.g., image data, or three dimensional (3-D) information, e.g., video data. These examples are illustrative, rather than limiting.
With respect to 2-D data, many data compression techniques have been employed to decrease the amount of data required to represent certain digitized information. For example, compression techniques have been applied to the data associated with a bit-mapped image. One prior data compression technique devoted to image data is the ISO/JPEG (International Standards Organization/Joint Photographic Experts Group) data compression standard. Although the ISO/JPEG technique has been adopted as an industry standard, its performance is not optimal.
Recently, techniques using hierarchical subband decomposition, also known as wavelet transforms, have emerged. These techniques achieve a hierarchical multi-scale representation of a source image. For example, subband decomposition of video signals, i.e., 3-D information, is disclosed in U.S. Pat. No. 5,223,926 to Stone et al. and U.S. Pat. No. 5,231,487 to Hurley et al., each of which is incorporated herein by reference in its entirety. However, once subband decomposition of a source image has been performed, the succeeding techniques of coding the resultant data for transmission and/or storage have yet to be fully optimized. Specifically, for example, both the computational efficiency and coding efficiency of the prior techniques may be further improved. One prior technique has been disclosed by A. Said and W. Pearlman in "Image Compression Using the Spatial-Orientation Tree," IEEE Int. Symp. on Circuits and Systems, Vol. 1, pp. 279-282, May 1993, which is also incorporated herein by reference in its entirety.
With respect to 3-D data, the demand for video for transmission and delivery across both high and low bandwidth channels has accelerated. The high bandwidth applications include digital video by satellite (DVS) and high-definition television (HDTV), both based on MPEG-2 compression technology. The low bandwidth applications are dominated by transmission over the Internet, where most modems transmit at speeds below 64 kilobits per second (Kbps). Under these stringent conditions, delivering compressed video at an acceptable quality level becomes a challenging task, since the required compression ratios are quite high. Nonetheless, the current test model standards of H.263 and H.263+ do a creditable job in providing video of acceptable quality for certain applications at high bit rates sought by ISO's MPEG-4, which also seeks low bit rates, and ITU's H.26L standards groups, but better schemes with increased functionality are actively being sought by the MPEG-4 and MPEG-7 standards committees.
The current and developing standards of MPEG-2, H.263, H.263+, MPEG-4, and H.26L are all based on block DCT coding of displaced frame differences, where displacements or motion vectors are determined through block-matching estimation methods. Although reasonably effective, these standards lack the inherent functionality now regarded as essential for emerging multimedia applications. In particular, resolution and fidelity (rate) scalability, the capability of progressive transmission by increasing resolution and increasing fidelity, is considered essential for emerging video applications to multimedia. Moreover, if a system is truly progressive by rate or fidelity, then it can presumably handle both the high-rate and low-rate regimes of digital satellite and Internet video, respectively. The current and emerging standards use a hybrid motion-compensated differential discrete cosine transform (DCT) coding loop, which must use a base layer of reasonable fidelity and add layers of increasing fidelity upon it to achieve progressive fidelity. By its very nature, this kind of scheme allows no scalability or progressivity of the base layer and must suffer in accuracy compared to single layer coding at the same bit rate.
Subband coding has been shown to be a very effective coding technique. It can be extended naturally to video sequences due to its simplicity and non-recursive structure that limits error propagation within a certain group of frames (GOF). Three-dimensional (3-D) subband coding schemes have been designed and applied for mainly high or medium bit-rate video coding. Karlsson and Vetterli, in their article entitled Three Dimensional Subband Coding of Video (Proc. ICASSP, pages 1100-1103, April 1988), took the first step toward 3-D subband coding using a simple 2-tap Haar filter for temporal filtering. Podilchuk, Jayant, and Farvardin, in the article Three-Dimensional Subband Coding of Video (IEEE Transactions on Image Processing, 4(2):125-139, February 1995), described the use of the same 3-D subband coding (SBC) framework without motion compensation. It employed adaptive differential pulse code modulation (DPCM) and vector quantization to overcome the lack of motion compensation.
Furthermore, Kronander, in his article entitled New Results on 3-Dimensional Motion Compensated Subband Coding (Proc. PCS-90, March 1990), presented motion compensated temporal filtering within the 3-D SBC framework. However, due to the existence of pixels not encountered by the motion trajectory, he needed to encode a residual signal. Based on the previous work, motion compensated 3-D SBC with lattice vector quantization was introduced by Ohm in his article entitled Advanced Packet Video Coding Based on Layered VQ and SBC Techniques (IEEE Transactions on Circuit and System for Video Technology, 3(3):208-221, June 1993). Ohm introduced the idea for a perfect reconstruction filter with the block-matching algorithm, where 16 frames in one GOF are recursively decomposed with 2-tap filters along the motion trajectory. He then refined the idea to better treat the connected/unconnected pixels with an arbitrary motion vector field for a perfect reconstruction filter, and extended it to arbitrary symmetric (linear phase) QMFs. See Three-Dimensional Subband Coding with Motion Compensation (IEEE Transactions on Image Processing, 3(5):559-571, September 1994). Similar work by Choi and Woods, described in their article Motion-Compensated 3-D Subband Coding of Video (Submitted to IEEE Transactions on Image Processing, 1997), employed a different way of treating the connected/unconnected pixels; this sophisticated hierarchical variable size block matching algorithm has shown better performance than MPEG-2.
Due to the multiresolutional nature of SBC schemes, several scalable 3-D SBC schemes have appeared. Bove and Lippman, in their article entitled Scalable Open-Architecture Television (SMPTE J., pages 2-5, January 1992), proposed multiresolutional video coding with a 3-D subband structure. Taubman and Zakhor introduced a multi-rate video coding system using global motion compensation for camera panning, in which the video sequence was pre-distorted by translating consecutive frames before temporal filtering with 2-tap Haar filters. See D. Taubman, Directionality and Scalability in Image and Video Compression (PhD thesis, University of California, Berkeley, 1994) and D. Taubman et al., Multirate 3-D Subband Coding of Video (IEEE Transactions on Image Processing, 3(5):572-588, September 1994). This approach can be considered as a simplified version of Ohm's technique in that it treats connected/unconnected pixels in a similar way for temporal filtering. However, the algorithm generates a scalable bit-stream in terms of bit-rate, spatial resolution, and frame rate.
Meanwhile, there have been several research activities on embedded video coding systems based on significance tree quantization, which was introduced by Shapiro for still image coding as the embedded zerotree wavelet (EZW) coder in the paper entitled An Embedded Wavelet Hierarchical Image Coder (Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), San Francisco, pages IV 657-660, March 1992). It was later improved through a more efficient state description in the paper by A. Said et al. entitled Image Compression Using the Spatial-Orientation Tree (Proc. IEEE Intl. Symp. Circuits and Systems, pages 279-282, May 1993) and called improved EZW or IEZW. This two-dimensional (2-D) embedded zero-tree (IEZW) method has been extended to 3-D IEZW for video coding by Chen and Pearlman, as described in the paper entitled Three-Dimensional Subband Coding of Video Using the Zero-Tree Method (Visual Communications and Image Processing '96, Proc. SPIE 2727, pages 1302-1309, March 1996). That work showed promise of an effective and computationally simple video coding system without motion compensation, and obtained excellent numerical and visual results. A 3-D zero-tree coding through modified EZW has also been used with good results in compression of volumetric images, as reported in the paper by J. Luo et al. entitled Volumetric Medical Image Compression with Three-Dimensional Wavelet Transform and Octave Zerotree Coding (Visual Communications and Image Processing '96, Proc. SPIE 2727, pages 579-590, March 1996). Recently, a highly scalable embedded 3-D SBC system with tri-zerotrees for very low bit-rate environments was reported with coding results visually comparable, but numerically slightly inferior, to H.263. See J. Y. Tham et al., Highly Scalable Wavelet-Based Video Codec for Very Low Bit-rate Environment (IEEE Journal on Selected Area in Communications, Vol. 16, pp. 4-27 (January 1998)).
The present invention is directed toward optimizing the coding of a subband decomposition of N-dimensional data for transmission and/or storage. What is needed is an N-dimensional subband coder and corresponding decoder that is both fast and efficient. Moreover, what is needed is a three-dimensional (3-D) subband-based image sequence coder that is fast and efficient. It would be highly desirable to have a 3-D subband-based image sequence coder that possesses the multimedia functionality of resolution and rate scalability.
Based on the above and foregoing, it can be appreciated that there presently exists a need in the art for coders and corresponding decoders that overcome the above-described deficiencies. The present invention was motivated by a desire to overcome the drawbacks and shortcomings of the presently available technology, and thereby fulfill this need in the art.
One object of the present invention is to provide a more efficient 3-D subband embedded coding system capable of coding image sequences, including video and volume imagery.
Another object of the present invention is to provide a computationally simple 3-D subband embedded image sequence coding system. According to one aspect of the invention, the 3-D subband embedded image sequence coding system has many desirable attributes including:
a. complete embeddedness for progressive fidelity transmission;
b. precise rate control for constant bit-rate (CBR) traffic;
c. low complexity for possible software-only real-time implementation and applications; and
d. multiresolution scalability.
Another object according to the present invention is to produce a 3-D subband coding system that is compact. Advantageously, the 3-D subband coding system, in an exemplary case, is so compact that it consists of only two parts: a 3-D spatio-temporal decomposition device; and a 3-D SPIHT coding device. According to one aspect of the present invention, an input image sequence, e.g., video, is first 3-D wavelet transformed with (or without) motion compensation (MC), and then encoded into an embedded bit-stream by the 3-D SPIHT kernel.
Briefly summarized, in a first aspect, the present invention includes a method for use in encoding and decoding a subband decomposition of an N-dimensional data set, where N is a positive integer. The method comprises creating a list of insignificant sets of points (referred to herein as the list of insignificant sets, or "LIS"), wherein each set of the LIS is designated by a root node within the subband decomposition and has a corresponding tree structure of points within the subband decomposition. The tree structure is organized as points comprising descendants and offspring of the root node, wherein a first generation of the descendants comprises the offspring.
The method further includes evaluating the descendants of the root node of each set of the LIS for significance, wherein a significant descendant of the descendants of the root node has a subband coefficient at least equal to a predetermined threshold. For each root node of the LIS having at least one significant descendant, descendants of the offspring of the root node are evaluated for significance, wherein a significant descendant of the offspring of the root node has a coefficient at least equal to the predetermined threshold. If the root node has at least one significant descendant of offspring, then each offspring of the root node is added to the LIS as a root node thereof.
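The two significance tests described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: it uses a hypothetical 1-D dyadic tree in which node i (for i >= 1) has offspring 2i and 2i+1, and it assumes that "at least equal to a predetermined threshold" is tested on the coefficient magnitude. The function names are illustrative only.

```python
from typing import List, Sequence


def offspring(i: int, n: int) -> List[int]:
    """Assumed 1-D parent-child rule: node i >= 1 has offspring 2i and 2i+1."""
    if i < 1:
        return []
    return [k for k in (2 * i, 2 * i + 1) if k < n]


def descendants(i: int, n: int) -> List[int]:
    """All generations below node i; the first generation is the offspring."""
    out: List[int] = []
    frontier = offspring(i, n)
    while frontier:
        out.extend(frontier)
        frontier = [k for j in frontier for k in offspring(j, n)]
    return out


def descendants_of_offspring(i: int, n: int) -> List[int]:
    """Descendants of node i with the offspring themselves excluded."""
    kids = set(offspring(i, n))
    return [d for d in descendants(i, n) if d not in kids]


def is_significant(coeffs: Sequence[float], points: Sequence[int],
                   threshold: float) -> bool:
    """True if any listed point has a magnitude at or above the threshold
    (comparing magnitudes is an assumption of this sketch)."""
    return any(abs(coeffs[p]) >= threshold for p in points)


# Toy coefficients; index 0 is unused in this assumed layout.
coeffs = [0, 3, 1, 2, 0, -9, 1, 0]
n = len(coeffs)
root_has_significant_descendant = is_significant(coeffs, descendants(1, n), 8)
root_has_significant_grandchild = is_significant(
    coeffs, descendants_of_offspring(1, n), 8)
```

In this toy data the only large coefficient sits two generations below the root, so both tests succeed for root 1, which is the case in which each offspring of the root would be added to the LIS as a new root.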
In an exemplary embodiment, the method includes creating a list of significant pixels ("LSP"), the LSP initially comprising an empty set, and creating a list of insignificant pixels ("LIP"), the LIP comprising points from within a highest designated subband, i.e., lowest frequency subband, of the subband decomposition. Furthermore, for each root node of the LIS having at least one significant descendant, the offspring of the root node may be evaluated for significance, wherein a significant offspring has a coefficient at least equal to the predetermined threshold. A significance value is input or output for each offspring of the root node, wherein the significance value indicates whether the offspring is significant.
Moreover, the method may include, for each significant offspring of the root node, adding the significant offspring to the LSP and outputting or inputting a sign of the coefficient of the significant offspring. For each insignificant offspring (an insignificant offspring of the root node has a coefficient less than the predetermined threshold), the method may include adding the insignificant offspring to the LIP. When all offspring are insignificant but the root node has at least one significant descendant, a single zero significance value can be output with the root node on the LIS, designating an entry of a different type.
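The handling of the offspring of one root node can be sketched from the encoder side as follows. This is a hedged illustration only: the function name `code_offspring`, the list-of-bits output, the magnitude comparison, and the sign convention (0 for non-negative, 1 for negative) are all assumptions of the sketch and are not taken from the text above. A decoder would read the same values instead of writing them.

```python
from typing import List, Sequence


def code_offspring(coeffs: Sequence[float], kids: Sequence[int],
                   threshold: float, lsp: List[int], lip: List[int],
                   bits: List[int]) -> None:
    """Emit one significance value per offspring and sort it into LSP or LIP."""
    for k in kids:
        if abs(coeffs[k]) >= threshold:
            bits.append(1)                           # offspring is significant
            bits.append(0 if coeffs[k] >= 0 else 1)  # assumed sign convention
            lsp.append(k)
        else:
            bits.append(0)                           # offspring is insignificant
            lip.append(k)


# Toy example: the root's offspring are points 2 and 3.
coeffs = [0, 3, -9, 2]
lsp: List[int] = []
lip: List[int] = []
bits: List[int] = []
code_offspring(coeffs, [2, 3], 8, lsp, lip, bits)
```

Here point 2 (coefficient -9) is significant at threshold 8, so it contributes a significance value and a sign value and moves to the LSP, while point 3 contributes a single zero and moves to the LIP.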
In another aspect, the present invention includes a data structure in a computer memory for use in encoding and decoding a subband decomposition of data points. The data structure comprises a list of insignificant sets of points ("LIS"), a list of significant points ("LSP") and a list of insignificant points ("LIP").
As an enhancement, for each set of the LIS, the data structure may include a root node and a set type identifier. The set type identifier defines generations of descendants associated with the root node within the set of the LIS, wherein a first generation of descendants comprises offspring of the root node. Moreover, the set type identifier may comprise one of a first type identifier and a second type identifier. A first type identifier designates that the set comprises all of the descendants of the root node. A second type identifier designates that the set comprises the descendants of the root node excluding the offspring of the root node.
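One possible in-memory layout for the three lists and the set type identifier is sketched below. The class names, enum member names, and the `set_members` helper are hypothetical and chosen only for illustration; points are represented as N-dimensional coordinate tuples, and the parent-child rule is passed in as a function because it depends on the particular subband decomposition.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Tuple

Point = Tuple[int, ...]  # N-dimensional coordinate of a data point


class SetType(Enum):
    ALL_DESCENDANTS = 1              # first type: every descendant of the root
    DESCENDANTS_MINUS_OFFSPRING = 2  # second type: descendants except offspring


@dataclass(frozen=True)
class LisEntry:
    root: Point
    set_type: SetType


@dataclass
class CoderState:
    lis: List[LisEntry] = field(default_factory=list)
    lsp: List[Point] = field(default_factory=list)
    lip: List[Point] = field(default_factory=list)


def set_members(entry: LisEntry,
                offspring: Callable[[Point], List[Point]]) -> List[Point]:
    """Enumerate the points an LIS entry stands for, generation by generation."""
    kids = offspring(entry.root)
    members: List[Point] = []
    frontier = list(kids)
    while frontier:
        members.extend(frontier)
        frontier = [g for p in frontier for g in offspring(p)]
    if entry.set_type is SetType.DESCENDANTS_MINUS_OFFSPRING:
        members = members[len(kids):]  # the first generation comes first
    return members


def toy_offspring(p: Point) -> List[Point]:
    """Assumed 1-D rule used only for this example."""
    i = p[0]
    return [(2 * i,), (2 * i + 1,)] if 1 <= i < 4 else []


state = CoderState(lis=[LisEntry((1,), SetType.ALL_DESCENDANTS)])
all_points = set_members(state.lis[0], toy_offspring)
deeper_points = set_members(
    LisEntry((1,), SetType.DESCENDANTS_MINUS_OFFSPRING), toy_offspring)
```

Storing only the root and a type identifier keeps each LIS entry small, since the set membership can be regenerated from the tree structure whenever it is needed.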
Yet another aspect of the present invention includes a computer program product comprising a computer useable medium having computer readable program code means therein for use in encoding and decoding a subband decomposition of a data set. Computer readable program code means are employed for causing the computer to effect the techniques disclosed herein.
To summarize, the present invention has many advantages and features associated with it. The coding scheme of the present invention used to process a subband decomposition of a data set provides a high level of compression while maintaining a high computational efficiency. The transmitted code (i.e., compressed data set) is completely embedded so that a single file for, e.g., an image at a given code rate, can be truncated at various points and decoded to give a series of reconstructed images at lower rates. Processing may even be run to completion resulting in a near lossless (limited by the wavelet filters) compression. Furthermore, the encoder and decoder use symmetrical techniques such that computational complexity is equivalent during both encoding and decoding. Thus, the techniques of the present invention advance the state of subband decomposition data compression techniques. The coding results are either comparable to, or surpass, previous results obtained through much more sophisticated and computationally complex methods.