Efficient and reliable delivery of video data is becoming increasingly important as the Internet continues to grow in popularity. Video is very appealing because it offers a much richer user experience than static images and text. It is more interesting, for example, to watch a video clip of a winning touchdown or a Presidential speech than it is to read about the event in stark print.
With the explosive growth of the Internet and rapid advances in hardware and software, many new multimedia applications are emerging. Although the storage capacity of digital devices and the bandwidth of networks are increasing rapidly, video compression still plays an essential role in these applications because of the exponential growth of multimedia content, both for leisure and at work. Compressing video data prior to delivery reduces the amount of data actually transferred over the network. Image quality is lost as a result of the compression, but such loss is generally tolerated as necessary to achieve acceptable transfer speeds. In some cases, the loss of quality may not even be detectable to the viewer.
Many emerging applications require not only high compression efficiency from the various coding techniques, but also greater functionality and flexibility. For example, to facilitate content-based media processing, retrieval, and indexing, and to support user interaction, object-based video coding is desired. To enable video delivery over heterogeneous networks (e.g., the Internet) and wireless channels, error resilience and bit-rate scalability are required. To produce a coded video bitstream that can be used by all types of digital devices, regardless of their computational, display, and memory capabilities, both resolution scalability and temporal scalability are needed.
One common type of video compression is the motion-compensation-based video coding scheme, which is employed in essentially all compression standards such as MPEG-1, MPEG-2, MPEG-4, H.261, and H.263. Such video compression schemes use predictive approaches that encode information to enable motion prediction from one video frame to the next.
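By way of illustration only, the motion prediction underlying such schemes can be sketched as a full-search block match between two frames. The sketch below is a simplified assumption and is not taken from any of the named standards: the function names `sad` and `best_motion_vector`, the sum-of-absolute-differences cost, and the exhaustive search window are all hypothetical choices made for clarity.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` whose
    top-left corner is (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, n, search):
    """Exhaustively test displacements within +/- `search` pixels and return
    the (dx, dy) with the lowest SAD, together with that cost."""
    height, width = len(ref), len(ref[0])
    # Start from the zero displacement so that ties favor "no motion".
    best, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + n <= height
                      and 0 <= bx + dx and bx + dx + n <= width)
            if not inside:
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost
```

In a predictive coder of this general kind, the encoder would typically transmit the chosen displacement together with the residual between the current block and the displaced reference block, rather than the block itself.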
Unfortunately, these conventional motion-compensation-based coding systems, which are primarily targeted at high compression, fail to provide new functionalities such as scalability and error robustness. The recent MPEG-4 standard adopts an object-based video coding scheme to enable user interaction and content manipulation, but the scalability of MPEG-4 is very limited. Previously reported experiments with MPEG-2, MPEG-4, and H.263 indicate that coding efficiency generally drops by 0.5-1.5 dB with each additional layer, compared with a monolithic (non-layered) coding scheme. See, for example, B. G. Haskell, A. Puri and A. N. Netravali, Digital Video: An Introduction to MPEG-2, Chapman & Hall, New York, 1997; and L. Yang, F. C. M. Martins, and T. R. Gardos, “Improving H.263+ Scalability Performance for Very Low Bit Rate Applications,” in Proc. Visual Communications and Image Processing, San Jose, Calif., January 1999, SPIE.
Since these standard coders are all based on a predictive structure, it is difficult for the coding schemes to achieve efficient scalability due to the drift problem associated with predictive coding. Currently, there are proposals for an MPEG-4 streaming video profile on fine granularity scalable video coding. However, these proposals are limited to providing flexible rate scalability, and their coding efficiency is still much lower than that of non-layered coding schemes.
An alternative to predictive-based video coding schemes is three-dimensional (3-D) wavelet video coding. One advantage of 3-D wavelet coding over predictive video coding schemes is its scalability (including rate, PSNR, spatial, and temporal scalability), which facilitates video delivery over heterogeneous networks (e.g., the Internet) and future wireless video services. However, conventional 3-D wavelet coding does not use motion information, which has proven very effective in predictive coders at removing temporal redundancy. Although computationally intensive motion estimation is avoided, the performance of 3-D wavelet video coding remains very sensitive to motion. Without motion information, motion blur occurs due to the temporal averaging of several frames. In addition, most 3-D wavelet video coders do not support object-based functionality, which is needed in next-generation multimedia applications.
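The temporal averaging effect can be illustrated with a minimal sketch of one level of a temporal wavelet decomposition applied to two frames without any motion alignment. The sketch assumes an unnormalized Haar filter pair (average and half-difference) purely for simplicity; the function names `temporal_haar` and `inverse_temporal_haar` are hypothetical, and practical coders may use other filters and scaling.

```python
def temporal_haar(frame_a, frame_b):
    """One level of an unnormalized temporal Haar transform on two frames.
    Each pixel is paired with the pixel at the SAME position in the other
    frame, i.e. no motion information is used."""
    low = [[(a + b) / 2 for a, b in zip(row_a, row_b)]
           for row_a, row_b in zip(frame_a, frame_b)]
    high = [[(a - b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]
    return low, high


def inverse_temporal_haar(low, high):
    """Recover the two original frames from the low and high temporal bands."""
    frame_a = [[l + h for l, h in zip(row_l, row_h)]
               for row_l, row_h in zip(low, high)]
    frame_b = [[l - h for l, h in zip(row_l, row_h)]
               for row_l, row_h in zip(low, high)]
    return frame_a, frame_b


# A one-row "video" in which a bright pixel moves one position to the right.
frame_a = [[0, 100, 0, 0]]
frame_b = [[0, 0, 100, 0]]
low, high = temporal_haar(frame_a, frame_b)
# low  is [[0.0, 50.0, 50.0, 0.0]]:  the object is smeared over both positions.
# high is [[0.0, 50.0, -50.0, 0.0]]: the motion leaves large high-band values.
```

In this toy case the low-pass frame shows the moving pixel at half intensity in two places, which is the blur described above when only the low temporal band is decoded, and the high-pass frame is far from zero, so little temporal redundancy has been removed.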
Accordingly, there is a need for an efficient 3-D wavelet transform for video coding that employs motion information to reduce the sensitivity to motion and remove the motion blur in the resulting video playback. Additionally, an improved 3-D wavelet transform should support object-based functionality.