This invention relates to systems and methods for video coding. More particularly, this invention relates to systems and methods that employ wavelet transforms for video coding.
Efficient and reliable delivery of video data is becoming increasingly important as the Internet continues to grow in popularity. Video is very appealing because it offers a much richer user experience than static images and text. It is more interesting, for example, to watch a video clip of a winning touchdown or a Presidential speech than it is to read about the event in stark print.
Unfortunately, video data is significantly larger than other data types commonly delivered over the Internet. As an example, one second of uncompressed video may occupy one or more megabytes. Delivering such large amounts of data over error-prone networks, such as the Internet and wireless networks, presents difficult challenges in terms of both efficiency and reliability.
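The size estimate above can be checked with a short calculation. The following is a minimal sketch; the frame dimensions, frame rate, and 12 bits per pixel (4:2:0 chroma subsampling) are illustrative assumptions, not values taken from this description.

```python
def raw_video_bytes_per_second(width, height, fps, bits_per_pixel):
    """Bytes needed for one second of uncompressed video."""
    return width * height * bits_per_pixel * fps // 8


# Assumed example formats: QCIF (176x144) and CIF (352x288),
# both at 30 frames per second with 12 bits per pixel.
qcif = raw_video_bytes_per_second(176, 144, 30, 12)
cif = raw_video_bytes_per_second(352, 288, 30, 12)
print(qcif, cif)
```

Under these assumptions even the small QCIF format exceeds one megabyte per second, and CIF exceeds four.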
To promote efficient delivery, video data is typically encoded prior to delivery to reduce the amount of data actually being transferred over the network. Image quality is lost as a result of the compression, but such loss is generally tolerated as necessary to achieve acceptable transfer speeds. In some cases, the loss of quality may not even be detectable to the viewer.
Video compression is well known. One common type of video compression is a motion-compensation-based video coding scheme, which is used in such coding standards as MPEG-1, MPEG-2, MPEG-4, H.261, and H.263. Such video compression schemes use predictive approaches that encode information to enable motion prediction from one video frame to the next.
An alternative to predictive-based video coding schemes is three dimensional (3-D) wavelet video coding. One advantage of 3-D wavelet coding over predictive video coding schemes is scalability (including rate, PSNR, spatial, and temporal), which facilitates video delivery over heterogeneous networks (e.g., the Internet) and future wireless video services. Existing encoders may use 3-D wavelet coding to seamlessly adapt to different channel conditions, such as bandwidth fluctuation and packet errors/losses, while existing decoders can adapt to different computational resources.
In a typical 3-D wavelet video coder, a two-dimensional (2-D) spatial transform and a one-dimensional (1-D) temporal transform are performed separately. Usually, spatial decomposition is applied after temporal decomposition.
FIG. 1 illustrates a 3-D wavelet coding process on a video sequence 100 consisting of multiple 2-D matrices or frames of pixel data 102. The coding process typically segments the sequence into multiple groups of pictures (GOP), as represented by four-frame GOP 104. A first level temporal decomposition is applied to each GOP in the video sequence to produce sequence 110. In this example, a 2:1 compression ratio is used, as indicated by shading every other frame. Subsequently, a second level temporal decomposition is applied to each GOP in the video sequence to produce sequence 120. In this example, a 4:1 compression ratio is used in the second level temporal decomposition, as indicated by every fourth frame being shaded.
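The two-level temporal decomposition of a four-frame GOP can be sketched as follows. This is only an illustration: it assumes a Haar filter (average and half-difference of frame pairs), which the description does not specify, and the function names `temporal_haar` and `decompose_gop` are hypothetical. Each frame is represented as a flat list of pixel values.

```python
def temporal_haar(frames):
    """One temporal level: pair consecutive frames into low and high bands."""
    low, high = [], []
    for a, b in zip(frames[0::2], frames[1::2]):
        low.append([(p + q) / 2 for p, q in zip(a, b)])   # temporal average
        high.append([(q - p) / 2 for p, q in zip(a, b)])  # temporal detail
    return low, high


def decompose_gop(gop, levels=2):
    """Apply temporal_haar repeatedly to the low band of one GOP."""
    highs = []
    low = gop
    for _ in range(levels):
        low, high = temporal_haar(low)
        highs.append(high)
    return low, highs


# Four one-pixel frames whose brightness rises steadily over time.
gop = [[0.0], [2.0], [4.0], [6.0]]
low, highs = decompose_gop(gop)
print(low)    # one low-pass frame remains per four-frame GOP
print(highs)  # two level-1 detail frames, then one level-2 detail frame
```

After the first level half of the frames are low-pass frames, and after the second level one in four is, which corresponds to the shading pattern described for sequences 110 and 120.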
A spatial decomposition is then performed on the sequence 100 to produce sequence 130. Spatial decomposition is applied to each frame independently. Here, every fourth frame is spatially decomposed.
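A single level of spatial decomposition on one frame might look like the sketch below. It assumes a separable Haar filter applied first along rows and then along columns; the description does not name the spatial filter, and `haar_1d` and `haar_2d` are hypothetical helper names. Frame dimensions are assumed to be even.

```python
def haar_1d(values):
    """One Haar level on a sequence: low-pass half followed by high-pass half."""
    half = len(values) // 2
    low = [(values[2 * i] + values[2 * i + 1]) / 2 for i in range(half)]
    high = [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(half)]
    return low + high


def haar_2d(frame):
    """Separable 2-D Haar level: transform every row, then every column."""
    rows = [haar_1d(row) for row in frame]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row order


# A flat (constant) frame puts all of its energy in the low-low quadrant.
flat = [[7, 7, 7, 7] for _ in range(4)]
for row in haar_2d(flat):
    print(row)
```

Because each frame is transformed on its own, this step needs no information from neighboring frames.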
One drawback of current 3-D wavelet coders is that frame quality or PSNR drops severely at the boundaries between each group of pictures (GOP), sometimes up to several decibels. This results in jittering artifacts in video playback, which can be very annoying to a viewer.
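For reference, PSNR for a frame is conventionally computed from the mean squared error between the original and decoded pixels. The sketch below assumes 8-bit samples (peak value 255) and frames given as flat lists of pixel values; the function name `psnr` is illustrative.

```python
import math


def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in decibels between two equal-size frames."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


# Every pixel off by one gray level gives a mean squared error of 1.
print(round(psnr([10, 20], [11, 21]), 2))
```

Because the scale is logarithmic, a drop of several decibels at a GOP boundary corresponds to a multiplicative increase in mean squared error, which is why it is visible in playback.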
FIG. 2 illustrates the boundary effect in which the resulting image quality fluctuates at boundaries between consecutive GOPs. The lower graph 200 shows consecutive GOPs 1, 2, 3, 4, etc. Each GOP contains five frames. The upper graph 202 shows the visual quality fluctuation within one GOP, such as GOP 2. Notice that the quality is significantly worse at the first and last frame of each GOP, causing the jittering artifacts in video playback.
One explanation for this boundary disorder is that conventional wavelet coding schemes improve as the number of frames in each GOP increases. Many schemes assume an infinitely long GOP containing a sequence of infinitely many frames. Unfortunately, GOP length is limited in practice due to delay or memory constraints. Coders and decoders, for example, commonly employ small-size buffers that hold only a few frames at a time. Thus, conventional coding schemes exhibit the boundary effect consistent with the GOP length. If memory were infinitely large, a coder could potentially buffer the whole video sequence and process it as a whole in the 3-D wavelet transform and bit-plane coding.
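One way to see how a finite GOP length disturbs the transform is to compare high-pass coefficients computed over a whole sequence with those computed GOP by GOP. The sketch below is an illustration under assumptions not stated in this description: it uses the predict step of a 5/3 lifting filter, applies symmetric extension wherever a neighbor is missing, and treats each sample as one pixel position followed through time. The name `predict_53` is hypothetical.

```python
def predict_53(x):
    """High-pass coefficients of the 5/3 predict step, symmetric extension."""
    n = len(x)
    d = []
    for i in range(n // 2):
        # If the right neighbor lies past the end, mirror the left neighbor.
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]
        d.append(x[2 * i + 1] - 0.5 * (x[2 * i] + right))
    return d


# A steady brightness ramp over eight frames.
ramp = [float(t) for t in range(8)]
whole = predict_53(ramp)                                # one long sequence
per_gop = predict_53(ramp[:4]) + predict_53(ramp[4:])   # two 4-frame GOPs
print(whole)
print(per_gop)
```

With the whole sequence available, the ramp is predicted exactly everywhere except at the very end. Splitting it into four-frame GOPs introduces an extra nonzero coefficient at the internal GOP boundary, where the mirrored neighbor stands in for a frame that actually exists in the next GOP.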
Accordingly, there is a need for a memory efficient 3-D wavelet transform for video coding that reduces or effectively eliminates the boundary effect.
A video coding system and method utilizes a 3-D wavelet transform that is memory efficient and reduces the boundary effect. The wavelet transform employs a lifting scheme to decompose video frames into wavelet coefficients. The system buffers partially-processed wavelet coefficients at intermediate lifting steps for the last part of one GOP until intermediate coefficients from the beginning of the next GOP are available.
The wavelet transform scheme does not physically break the video sequence into GOPs, but processes the sequence without intermission. As a result, the system simulates an infinite wavelet transformation across GOP boundaries, as if the system were employing infinite memory. The boundary effect is therefore significantly reduced or essentially eliminated. Moreover, the buffering is very small and the scheme can be used to implement other decomposition structures.
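The buffering idea can be sketched as a streaming lifting transform that keeps only a few partially processed values between calls. This is a minimal illustration, not the disclosed implementation: it assumes a 5/3 lifting filter with symmetric extension at the true ends of the sequence, treats each pushed value as one pixel position in one frame, and uses the hypothetical class name `StreamingLifting53`.

```python
class StreamingLifting53:
    """Temporal 5/3 lifting fed one sample at a time (illustrative sketch)."""

    def __init__(self):
        self.count = 0      # samples received so far
        self.even = None    # buffered even-indexed sample
        self.odd = None     # buffered odd-indexed sample
        self.prev_d = None  # buffered high-pass value from the previous pair

    def _emit(self, right):
        d = self.odd - 0.5 * (self.even + right)          # predict step
        left = self.prev_d if self.prev_d is not None else d
        s = self.even + 0.25 * (left + d)                 # update step
        self.prev_d = d
        return s, d

    def push(self, sample):
        """Accept one sample; return any (low, high) pairs now complete."""
        out = []
        if self.count % 2 == 1:
            self.odd = sample
        else:
            if self.count > 0:
                out.append(self._emit(sample))
            self.even = sample
        self.count += 1
        return out

    def flush(self):
        """Finish the sequence by mirroring the last even sample."""
        if self.count == 0:
            return []
        if self.count % 2 == 1:
            raise ValueError("sketch assumes an even number of samples")
        out = [self._emit(self.even)]
        self.__init__()  # reset state for a new sequence
        return out


coder = StreamingLifting53()
pairs = []
for value in [float(t) for t in range(8)]:
    pairs.extend(coder.push(value))  # no GOP boundary is ever introduced
pairs.extend(coder.flush())
print(pairs)
```

The state held between calls is three values per pixel position regardless of sequence length, and the coefficients are the same as those a whole-sequence transform with the same filter would produce.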
A decoding system that employs an inverse 3-D wavelet transform is also disclosed. The wavelet transform scheme provides superb video playback quality with little or no boundary effects.