1. ISO/IEC 11172-2, Coding of moving pictures and associated audio for digital storage media at up to 1.5 Mbps. Technical Report, MPEG (Moving Picture Experts Group), International Organization for Standardization, 1993.
2. Didier Le Gall, "MPEG: a video compression standard for multimedia applications," Communications of the ACM, Vol. 34, No. 4, pp. 47-63, April 1991.
3. ISO/IEC 13818-2, Generic coding of moving pictures and associated audio information. Technical Report, MPEG (Moving Picture Experts Group), International Organization for Standardization, 1994.
4. ISO/IEC JTC1/SC29/WG11 N2725, Overview of the MPEG-4 standard, 1999.
5. I. Daubechies and W. Sweldens, "Factoring wavelet transforms into lifting steps," J. Fourier Anal. Appl., Vol. 4, No. 3, pp. 247-269, 1998.
6. M. Tsai, J. Villasenor, and F. Chen, "Stack-run image coding," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 6, No. 5, pp. 519-521, 1996.
The present invention relates to the field of video compression. More particularly, the present invention relates to a method for video compression using a dynamic 3D wavelet transform.
Video is a data-rich medium that produces large files in a computer system. Storing video data requires a large amount of memory, and transmitting it over the Internet requires a very wide bandwidth. Video compression reduces the amount of data needed to represent a video so that it can be enjoyed even with limited memory and a narrow bandwidth. One measure of the performance of a compression scheme is the "compression ratio," the ratio of the size of the original video to the size of the compressed file.
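As a worked example of this definition, assume (hypothetically) uncompressed CIF video of 352 x 288 pixels at 24 bits per pixel and 30 frames per second, compressed to 1.5 Mbit/s, the upper end of the MPEG-1 rate range mentioned below. The frame size, bit depth, and frame rate are illustrative choices, not values from the text:

$$
\text{CR} = \frac{S_{\text{original}}}{S_{\text{compressed}}}
= \frac{352 \times 288 \times 24 \times 30\ \text{bit/s}}{1.5 \times 10^{6}\ \text{bit/s}}
= \frac{72{,}990{,}720}{1{,}500{,}000} \approx 48.7
$$

That is, a compression ratio of roughly 49:1 under these assumptions.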
Many video compression approaches are based on the discrete cosine transform (DCT), which has a number of distinctive features. The most important is its block-based implementation, which divides the image into blocks of 8 by 8 pixels and performs the DCT on each block individually. Blocks also support motion compensation, which identifies the motion of blocks between two frames and further increases the compression ratio. The DCT-based approach has produced a number of industrial standards, including MPEG-1, MPEG-2, and MPEG-4. MPEG-1 ([1]) is a derivative of the H.261 specification and can compress a video signal to a rate of 1-1.5 Mbps with reasonably good quality ([2]). MPEG-2 ([3]) retains the coding techniques of MPEG-1 and improves on it considerably by supporting larger frames. The newly announced MPEG-4 further improves on MPEG-2 by introducing content-based compression ([4]).
The block-based DCT approach has a major disadvantage: it generates block artifacts, especially at high compression ratios. These artifacts significantly reduce the quality of the video and are unpleasant to the eye. The wavelet transform is a newer approach to image and video compression that has emerged in recent years. It has been shown to be superior to the DCT approach for the following reasons:
a. The wavelet transform is applied to the entire image; it thus avoids the block artifacts.
b. The wavelet transform localizes signal characteristics in both the spatial and frequency domains and can therefore exploit spatial redundancy very efficiently to achieve a high compression ratio.
c. The wavelet transform decomposes an image into a low-resolution version of the image along with a series of enhancements that add fine details. Thus, the wavelet transform can support continuous rate scalability.
A number of compression schemes based on the wavelet transform have been developed, including U.S. Pat. Nos. 5,315,670, 5,321,776, and 5,412,741 for a so-called zero-tree structure, as well as U.S. Pat. Nos. 5,757,974, 6,031,937, 6,091,777, and 6,101,284. All of these compression schemes ignore an important aspect of compression: they do not fully exploit the redundancy in the temporal domain. In these approaches, either no wavelet transform is applied in the temporal domain, or an arbitrarily chosen number of frames is included in the temporal domain (the third dimension) for the wavelet transform.
In the MPEG approach, redundancy in the temporal domain is reduced by examining the similarity among eight consecutive frames (the so-called I, B, and P frames). The compression ratio is limited to 8:1 even if all eight frames are identical. A video sequence, however, may contain hundreds of consecutive similar frames. If these frames are included in the same group for the 3D wavelet transform, the compression ratio can be increased significantly. Conversely, if frames are grouped arbitrarily, the 3D wavelet transform does not produce the highest achievable compression ratio.
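The effect of group length can be illustrated with a full multi-level Haar transform along t applied to a single pixel position. This is a minimal sketch under stated assumptions: it uses the simple averaging Haar filter (the text does not name a filter), requires a group length that is a power of two, and the helper names `full_haar_t` and `nonzero_count` are hypothetical. It counts nonzero coefficients only and does not model quantization or entropy coding.

```python
def full_haar_t(series):
    """Multi-level averaging Haar along t; len(series) must be a power of two.

    Returns [approximation] followed by detail coefficients, coarsest first.
    """
    low, details = list(series), []
    while len(low) > 1:
        pairs = range(0, len(low), 2)
        high = [(low[i] - low[i + 1]) / 2 for i in pairs]  # uses the old low
        low = [(low[i] + low[i + 1]) / 2 for i in pairs]
        details = high + details  # coarser levels go in front
    return low + details


def nonzero_count(series):
    """Number of nonzero temporal coefficients for one pixel position."""
    return sum(1 for c in full_haar_t(series) if c != 0)


# The same pixel value repeated across a group of identical frames.
for n in (8, 256):
    print(n, nonzero_count([7] * n))  # prints "8 1", then "256 1"
```

In both cases a single nonzero coefficient remains, so the share of nonzero temporal coefficients drops from 1/8 to 1/256 as the group grows, which is the effect the argument for long groups of similar frames relies on.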
The solution to the problems described above is the dynamic 3D wavelet transform scheme of the present invention, which places only consecutive frames that are similar into a single group. Because the number of frames may differ from group to group, the scheme is called the dynamic 3D wavelet transform.
The present invention provides a method that takes full advantage of the wavelet transform in the temporal domain. The method first applies the wavelet transform in the spatial domain (the x and y directions); this is called the 2D wavelet transform. The 2D wavelet transform generates four frequency bands: the low-low (LL), low-high (LH), high-low (HL), and high-high (HH) bands. The 2D wavelet transform can be applied to the LL band again to produce four additional bands within the LL band. This process can continue until the resulting LL band is sufficiently small.
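The spatial decomposition can be sketched in Python as follows. This is a minimal illustration, not the method's actual implementation: it assumes the simple averaging Haar filter (the text does not name a wavelet filter), frames stored as lists of rows with even dimensions, and the hypothetical names `haar_1d`, `haar_2d`, and `decompose`. Band naming conventions vary; here the first letter refers to the horizontal pass and the second to the vertical pass.

```python
def haar_1d(values):
    """One level of an averaging Haar transform on an even-length sequence."""
    half = len(values) // 2
    low = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    high = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    return low, high


def _columns(block):
    """Transform every column of a block; return (low rows, high rows)."""
    results = [haar_1d(list(col)) for col in zip(*block)]
    low_cols = [lo for lo, _ in results]
    high_cols = [hi for _, hi in results]
    # Transpose the column lists back into row lists.
    return [list(r) for r in zip(*low_cols)], [list(r) for r in zip(*high_cols)]


def haar_2d(frame):
    """Split a frame into LL, LH, HL, HH bands (first letter: horizontal pass)."""
    row_low, row_high = [], []
    for row in frame:  # horizontal pass over every row
        lo, hi = haar_1d(row)
        row_low.append(lo)
        row_high.append(hi)
    ll, lh = _columns(row_low)   # vertical pass on the horizontally low half
    hl, hh = _columns(row_high)  # vertical pass on the horizontally high half
    return ll, lh, hl, hh


def decompose(frame, min_size=4):
    """Re-apply haar_2d to the LL band while the next LL band stays >= min_size."""
    ll, details = frame, []
    while all(n % 2 == 0 and n // 2 >= min_size for n in (len(ll), len(ll[0]))):
        ll, lh, hl, hh = haar_2d(ll)
        details.append((lh, hl, hh))  # finest level first
    return ll, details
```

For the 2 x 2 frame `[[1, 2], [3, 4]]`, `haar_2d` returns `[[2.5]]` as the LL band and `[[-1.0]]`, `[[-0.5]]`, `[[0.0]]` as the LH, HL, and HH bands. The `min_size` stopping rule is one hypothetical reading of "sufficiently small".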
The LL band produced by the first-level transform of the current frame is compared with the corresponding LL band of the next frame to determine the difference between them. If the difference is less than a threshold, the next frame is considered similar to the current frame. This process continues until a dissimilar frame is found. All similar frames are kept in their original order and form a frame group. In the temporal domain, the wavelet transform is applied to the pixels of the frames that have identical x and y positions. Because the frames are similar, the intensity of the pixels at the same position changes little or not at all, and only a small number of pixels change. As a result, the wavelet transform in the temporal domain achieves a very high compression ratio.
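The grouping rule and the temporal transform can be sketched as follows. The text does not define the difference measure, the threshold value, or whether each frame is compared with its immediate predecessor or with the first frame of the group; this sketch assumes the mean absolute difference between the first-level LL bands of consecutive frames and a one-level averaging Haar transform along t. `mean_abs_diff`, `group_frames`, and `temporal_haar` are hypothetical names.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized 2D bands (assumed metric)."""
    total = sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))


def group_frames(ll_bands, threshold):
    """Split frame indices into runs of consecutive similar frames.

    ll_bands: first-level LL band of every frame, in display order.
    A frame joins the current group if its LL band differs from the
    previous frame's LL band by less than `threshold`.
    """
    groups = []
    for i, band in enumerate(ll_bands):
        if groups and mean_abs_diff(ll_bands[i - 1], band) < threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def temporal_haar(frames):
    """One level of the averaging Haar transform along t, position by position.

    frames: equally sized 2D bands of one group, in their original order.
    Returns (low, high) lists of bands; an odd trailing frame is passed
    through to the low list unchanged (an assumption, not from the text).
    """
    low, high = [], []
    for k in range(0, len(frames) - 1, 2):
        a, b = frames[k], frames[k + 1]
        low.append([[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)])
        high.append([[(x - y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)])
    if len(frames) % 2:
        low.append([row[:] for row in frames[-1]])
    return low, high


f0 = [[10, 10], [10, 10]]
f1 = [[11, 10], [10, 10]]
f2 = [[50, 50], [50, 50]]
print(group_frames([f0, f1, f2], threshold=1.0))  # [[0, 1], [2]]
```

Applying `temporal_haar` again to the returned low bands gives further temporal levels. Comparing each frame with the first frame of its group, rather than with the previous frame, would stop a slow drift from chaining many gradually changing frames into one group; which variant suits a given video is a design choice.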
The number of similar frames may be very large, and the memory may not be large enough to hold all of them. This problem is solved by a 2D-3D hybrid approach: only the LL band generated by the first-level wavelet transform takes part in the 3D wavelet transform. The coefficients of the LH, HL, and HH bands are quantized and encoded, and the encoded coefficients are either transmitted to a receiver or stored on disk without any further wavelet transform. This approach reduces the memory space and computation time required for the 3D wavelet transform. Because the coefficients in the LH, HL, and HH bands are smaller than those in the LL band, and many of them are eliminated by quantization, a high compression ratio can be achieved in those three bands even without the 3D wavelet transform. If the original frames are small, the 3D wavelet transform can still be applied to all four bands.
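The memory saving of the 2D-3D hybrid approach comes from keeping only the LL bands of a group for the temporal transform; after one level of the 2D transform, the LL band holds a quarter of a frame's coefficients. The sketch below assumes a uniform scalar quantizer with rounding and represents each frame's bands as a dict keyed by band name. `quantize`, `hybrid_split`, `zero_fraction`, and `detail_step` are hypothetical names, and the encoding step itself is not modeled.

```python
def quantize(band, step):
    """Uniform scalar quantization: map each coefficient c to round(c / step)."""
    return [[round(c / step) for c in row] for row in band]


def hybrid_split(frame_bands, detail_step):
    """Separate one frame group for the 2D-3D hybrid scheme.

    frame_bands: one dict per frame with keys "LL", "LH", "HL", "HH".
    Returns (ll_stack, coded_details): the LL bands, passed on unchanged to
    the temporal transform, and the detail bands quantized right away.
    """
    ll_stack = [bands["LL"] for bands in frame_bands]
    coded_details = [
        {name: quantize(bands[name], detail_step) for name in ("LH", "HL", "HH")}
        for bands in frame_bands
    ]
    return ll_stack, coded_details


def zero_fraction(coded_details):
    """Share of quantized detail coefficients that became zero."""
    values = [c for d in coded_details for band in d.values() for row in band for c in row]
    return values.count(0) / len(values)


frame = {"LL": [[100.0]], "LH": [[3.0]], "HL": [[-12.0]], "HH": [[0.4]]}
ll_stack, coded = hybrid_split([frame], detail_step=4)
print(coded)  # [{'LH': [[1]], 'HL': [[-3]], 'HH': [[0]]}]
```

When frames are small enough to keep a whole group in memory, all four bands could instead be stacked for the temporal transform, as the text allows.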
To control the compression ratio in the spatial and temporal domains separately, the wavelet coefficients generated in the spatial domain are quantized first, before the wavelet transform in the t (time) direction is applied. Quantization is then applied to the wavelet coefficients generated by the third and final wavelet transform. In this way, quantization is performed twice, once in the spatial domain and once in the temporal domain. Alternatively, the two quantization steps need not be separated; a single quantization step can instead be applied to the final result of the 3D wavelet transform.
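Because the temporal transform at a fixed (x, y) position is a one-dimensional transform along t, the two quantization options can be compared on the sequence of values that one spatial coefficient takes across a group. This sketch assumes uniform rounding quantizers, a one-level averaging Haar transform along t, and that the temporal transform operates on dequantized spatial values rather than on quantization indices (the text does not say which). All function names, step sizes, and sample values are hypothetical.

```python
def quantize(values, step):
    """Uniform scalar quantizer: coefficient -> integer index."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Map quantization indices back to coefficient values."""
    return [i * step for i in indices]


def haar_t(series):
    """One-level averaging Haar transform along t (even-length series)."""
    pairs = range(0, len(series), 2)
    low = [(series[i] + series[i + 1]) / 2 for i in pairs]
    high = [(series[i] - series[i + 1]) / 2 for i in pairs]
    return low, high


def two_stage(series, spatial_step, temporal_step):
    """Quantize in the spatial domain, transform along t, then quantize again."""
    spatial = dequantize(quantize(series, spatial_step), spatial_step)
    low, high = haar_t(spatial)
    return quantize(low, temporal_step), quantize(high, temporal_step)


def one_stage(series, step):
    """Transform along t first and quantize only the final 3D coefficients."""
    low, high = haar_t(series)
    return quantize(low, step), quantize(high, step)


# One spatial coefficient observed in four consecutive frames of a group.
series = [100.0, 101.2, 99.6, 100.0]
print(two_stage(series, spatial_step=4, temporal_step=1))  # ([100, 100], [0, 0])
print(one_stage(series, step=1))                           # ([101, 100], [-1, 0])
```

With these illustrative step sizes, the coarse spatial step removes the small frame-to-frame fluctuation before the temporal transform, so every temporal detail coefficient becomes zero, while the single-stage path keeps a nonzero detail. Choosing `spatial_step` and `temporal_step` independently is what allows the spatial and temporal compression ratios to be tuned separately.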