1. Field of the Invention
The present invention generally relates to methods and apparatuses for signal interpolation and extrapolation. More specifically, the present invention relates to temporal filtering for generating improved side information for video coding systems that rely upon Wyner-Ziv principles.
2. Description of the Related Art
Extrapolation and interpolation of a visual signal, such as image, video, and graphics, have been widely used in various contexts, including, but not limited to: video-coding, transcoding, error concealment, pre-processing, and interactive rendering.
For instance, techniques for extrapolating and interpolating in video-coding applications have been described by Aaron et al., in Toward Practical Wyner-Ziv Coding of Video, PROC. IEEE INT. CONF. ON IMAGE PROCESSING, pp. 869-872, Barcelona, Spain, Sept. (2003), Puri et al., in PRISM: A New Robust Video Coding Architecture based on Distributed Compression Principles, ALLERTON CONFERENCE ON COMMUNICATION, CONTROL AND COMPUTING, (2002), and Yaman et al., in A Low-Complexity Video Encoder with Decoder Motion Estimation, Proc. ICASSP, Montreal, Canada, (2004).
Techniques for extrapolating and interpolating in transcoding applications have been described by U.S. Pat. No. 6,058,143 issued on May 2, 2000 to Golin for “Motion Vector Extrapolation for Transcoding Video Sequences.”
Further, techniques for extrapolating and interpolating in error concealment for video decoding or post-processing applications have been described by Peng et al., in Block-Based Temporal Error Concealment for Video Packet Using Motion Vector Extrapolation, International Conf on Communications, Circuits, Systems and West Sino Expo, pp. 10-14, Jun. 29-Jul. 1, (2002) and by U.S. Pat. No. 6,285,715 issued on Sep. 4, 2001, to Ozcelik for “Methods and Apparatus for Error Concealment While Decoding a Coded Video Bit Stream.”
Conventional visual signal extrapolation and interpolation methods used in video coding, trans-coding, error concealment, video decoding, and post-processing applications are based on motion information and are, therefore, referred to as “motion-based” extrapolation and interpolation methods, respectively.
Conventional non-motion-based extrapolation/interpolation methods are used in other applications, including a model-based view extrapolation method for virtual reality rendering, a feature extrapolation method for pre-compression, and a video fading scene prediction method. For example, a model-based view extrapolation method is described by U.S. Pat. No. 6,375,567 issued on Apr. 23, 2002 to Acres for “Model-Based View Extrapolation for Interactive Virtual Reality Systems.” A feature extrapolation method is described by U.S. Pat. No. 5,949,919 issued on Sep. 7, 1999 to Chen for “Precompression Extrapolation Method.” Likewise, a video fading scene prediction method is described by Koto et al., in Adaptive Bi-Predictive Video Coding Temporal Extrapolation, ICIP (2003).
One example of a motion-based extrapolation/interpolation method is the side information generation process used in a Wyner-Ziv video coding technique. A typical Wyner-Ziv video coding system includes a video encoder and a video decoder. The video encoder is a low-complexity and, therefore, low-power-consumption encoder. The computationally heavy signal processing tasks, such as motion estimation, are performed by the decoder.
To achieve high coding efficiency, the Wyner-Ziv decoder exploits the statistical correlation between the source and side information, which is only available at the decoder, in decoding the received signals to reconstruct the video. The source is the video signal (e.g., a picture) to be encoded at the encoder and transmitted to the decoder for decoding, and the side information can be viewed as a prediction or essentially an estimate of the decoded picture.
The performance of a Wyner-Ziv video coding system depends heavily on the fidelity and reliability of the side information. The closer the side information is to the source, the better the performance of the system. Therefore, the method and apparatus used by the decoder to generate the side information play a crucial role in a Wyner-Ziv video coding system.
Typically, the decoder first performs motion estimation on previously reconstructed pictures to generate a set of motion vectors and then uses such motion vectors to generate an estimate of the picture currently being decoded by extrapolation or interpolation. This estimate is used as the side information by the decoder for decoding and reconstructing the current picture.
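The motion estimation step described above can be illustrated with a minimal sketch. It assumes full-search block matching with a sum-of-absolute-differences (SAD) cost, which is only one common way to obtain motion vectors and is not specified by the text; the function names, the block size `bs`, and the search range `search` are hypothetical choices made for illustration.

```python
def sad(ref, cur, rx, ry, cx, cy, bs):
    """Sum of absolute differences between a bs x bs block of `ref`
    at (rx, ry) and a bs x bs block of `cur` at (cx, cy)."""
    return sum(
        abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
        for j in range(bs)
        for i in range(bs)
    )


def estimate_motion(cur, ref, bs=2, search=2):
    """For every block of `cur`, return the displacement (dx, dy) into
    `ref` that minimises the SAD cost (full search, assumed method)."""
    h, w = len(cur), len(cur[0])
    vectors = {}
    for by in range(0, h - bs + 1, bs):
        for bx in range(0, w - bs + 1, bs):
            best = (0, 0)
            best_cost = sad(ref, cur, bx, by, bx, by, bs)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    rx, ry = bx + dx, by + dy
                    # Skip candidates that fall outside the reference picture.
                    if 0 <= rx <= w - bs and 0 <= ry <= h - bs:
                        cost = sad(ref, cur, rx, ry, bx, by, bs)
                        if cost < best_cost:
                            best_cost, best = cost, (dx, dy)
            vectors[(bx, by)] = best
    return vectors
```

Calling `estimate_motion(picture_n_minus_1, picture_n_minus_2)` yields, for each block of Picture N−1, a vector pointing back to its best match in Picture N−2.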
FIG. 1 is a diagram illustrating a conventional motion-based temporal extrapolation process 100. Specifically, in order to extrapolate a Picture N 106, motion estimation is first performed on at least two previously reconstructed pictures, namely, Pictures N−2 102 and N−1 104, to generate a motion vector 108 for each pixel or block of pixels 110 in Picture N−1 104, which is indicative of the motion of the pixel or the block of pixels between Picture N−1 104 and Picture N−2 102 (i.e., a “reverse” motion). This is done for all pixels or all blocks of pixels to provide a set of motion vectors.
Then, the set of motion vectors is manipulated according to a predetermined function that is based upon an underlying motion model or assumption. For example, if a constant linear displacement motion model is used for the predetermined function, then the motion vectors are reversed, and the pixel or the block of pixels associated with the motion vectors is extrapolated (i.e., mapped) from its location in Picture N−1 104 to a location defined by the reversed motion vectors in an estimate of the extrapolated Picture N 106.
Note that the motion vector 108 may also be constructed for each pixel or a block of pixels in Picture N−2 102 to indicate the motion between Picture N−2 102 and Picture N−1 104. In such a case, the motion vector 108 should then be shifted, and the pixel or the block of pixels associated with the motion vector should be extrapolated or mapped from its location in Picture N−1 104 to a location defined by the shifted motion vector in an estimate of the extrapolated Picture N 106.
The motion-based temporal extrapolation process as described above, therefore, extrapolates the current Picture N 106, after all the pixels or the blocks of pixels 110 in Picture N−1 104 (or Picture N−2 102) are mapped.
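The mapping step of this extrapolation process can be sketched as follows, assuming the constant linear displacement model and block-based "reverse" vectors (from Picture N−1 back to Picture N−2). The function name, the dictionary layout of `vectors`, and the choice to leave unmapped pixels as `None` are illustrative assumptions; a practical system would also need rules for filling holes and resolving overlapping blocks, which are not shown here.

```python
def extrapolate(prev, vectors, bs=2):
    """Build an estimate of Picture N from Picture N-1 (`prev`).

    `vectors` maps the top-left corner (bx, by) of a block in `prev` to its
    reverse motion vector (dx, dy), i.e. the displacement from N-1 to N-2.
    Under constant linear displacement the block is assumed to keep moving,
    so the vector is reversed to find its location in Picture N.
    """
    h, w = len(prev), len(prev[0])
    # Unmapped pixels stay None (assumption: hole filling is out of scope).
    est = [[None] * w for _ in range(h)]
    for (bx, by), (dx, dy) in vectors.items():
        tx, ty = bx - dx, by - dy  # location defined by the reversed vector
        if 0 <= tx <= w - bs and 0 <= ty <= h - bs:
            for j in range(bs):
                for i in range(bs):
                    # Later blocks overwrite earlier ones where they overlap.
                    est[ty + j][tx + i] = prev[by + j][bx + i]
    return est
```

For a block at (2, 0) in Picture N−1 whose reverse vector is (−2, 0), the block is placed at (4, 0) in the estimate of Picture N.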
FIG. 2 illustrates a conventional motion-based temporal interpolation process 200. Motion estimation is first performed on at least two previously reconstructed pictures, namely, Pictures N−1 202 and N+1 206, to obtain a motion vector 208 for each pixel or a block of pixels 210 in Picture N−1 202, which is indicative of the motion of the pixel or the block of pixels 210 from Picture N−1 202 to Picture N+1 206.
Then, the motion vector 208 is scaled down (e.g., by a factor of 2) based on an underlying assumption of a constant linear displacement motion model, and the pixels or the blocks of pixels 210 associated with the motion vectors 208 are interpolated from their locations in Picture N−1 202 and/or N+1 206 to a location defined by the scaled motion vector in an estimate of the current Picture N 204.
Note that the motion vector 208 can also be constructed for each pixel or a block of pixels 212 in Picture N+1 206 to indicate the motion between Picture N+1 206 and Picture N−1 202 to provide a set of motion vectors. In such a case, the set of motion vectors should also be scaled down (e.g., by a factor of 2), and the pixels or the blocks of pixels associated with the set of motion vectors should be interpolated from their locations in Picture N−1 202 and/or Picture N+1 206 to a location defined by the scaled set of motion vectors in an estimate of the current Picture N 204.
The motion-based temporal interpolation process as described above interpolates the current Picture N 204, after all the pixels or the blocks of pixels in Picture N+1 206 (or Picture N−1 202) are mapped.
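The interpolation process can be sketched in the same way. The example below assumes block-based vectors from Picture N−1 to Picture N+1, scaling by a factor of 2, and simple averaging of the two matched blocks; the averaging, the truncation of odd vector components toward zero, and all names are assumptions made for illustration rather than details given in the text.

```python
def interpolate(prev, nxt, vectors, bs=2):
    """Build an estimate of Picture N from Pictures N-1 (`prev`) and N+1 (`nxt`).

    `vectors` maps the top-left corner (bx, by) of a block in `prev` to its
    motion vector (dx, dy) toward `nxt`. The vector is scaled down by 2 to
    locate the block in Picture N.
    """
    h, w = len(prev), len(prev[0])
    # Unmapped pixels stay None (assumption: hole filling is out of scope).
    est = [[None] * w for _ in range(h)]
    for (bx, by), (dx, dy) in vectors.items():
        # Halve the vector; odd components are truncated toward zero (assumed).
        tx, ty = bx + int(dx / 2), by + int(dy / 2)
        ex, ey = bx + dx, by + dy  # matched block location in Picture N+1
        inside_mid = 0 <= tx <= w - bs and 0 <= ty <= h - bs
        inside_end = 0 <= ex <= w - bs and 0 <= ey <= h - bs
        if not (inside_mid and inside_end):
            continue
        for j in range(bs):
            for i in range(bs):
                # Average the two references (one possible "and/or" choice).
                est[ty + j][tx + i] = (
                    prev[by + j][bx + i] + nxt[ey + j][ex + i]
                ) / 2
    return est
```

With a block at (0, 0) in Picture N−1 and a vector of (4, 0), the block lands at (2, 0) in the estimate of Picture N, and each pixel is the mean of its two reference values.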
FIG. 3 is a flowchart that describes the operation of a conventional motion-based extrapolation and interpolation system 300. Specifically, the system 300 includes a motion estimation unit 302 and a linear extrapolation/interpolation unit 304. The motion estimation unit 302 receives picture signals from previously reconstructed pictures and generates a set of motion vectors. For example, referring to FIG. 2, the motion estimation unit 302 receives reference pictures N−1 202 and N+1 206 and determines a motion vector 208 between block 210 and corresponding block 212.
Then, the linear extrapolation/interpolation unit 304 receives the motion vectors and the reference pictures to generate an estimate of the picture in accordance with an underlying motion model. For example, referring to FIG. 2, the linear extrapolation/interpolation unit 304 receives the reference pictures N−1 202 and N+1 206 and the motion vector 208 from the motion estimation unit 302 and generates the interpolated picture N 204.
The conventional extrapolation and interpolation methods and systems have several serious drawbacks. The conventional methods and systems rely upon an assumption that the pixel values do not change. However, this assumption is often invalid because the pixel values may change due to changes in lighting conditions, contrast, fading, and the like.
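A small numerical illustration of this drawback follows, under hypothetical values: a static scene that fades to 4/5 of its brightness between two pictures. Even with perfectly estimated (zero) motion, copying pixel values from the reference leaves a non-zero prediction error. The pixel values, the fade factor, and the function name are all assumptions chosen for the example.

```python
def mean_abs_error(pred, actual):
    """Mean absolute difference between two equally sized pictures."""
    total = sum(
        abs(p - a)
        for pred_row, actual_row in zip(pred, actual)
        for p, a in zip(pred_row, actual_row)
    )
    return total / (len(pred) * len(pred[0]))


# Hypothetical reference picture: static content, no motion at all.
reference = [[100, 100, 50, 50]]

# Hypothetical fade: every pixel drops to 4/5 of its previous value.
actual = [[v * 4 // 5 for v in row] for row in reference]

# A motion-only model copies pixel values unchanged from the reference.
prediction = [row[:] for row in reference]

error = mean_abs_error(prediction, actual)
```

Here `error` is greater than zero although the motion model is exact, because the model has no term that accounts for the change in pixel values.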
Indeed, regardless of how accurate the underlying model of these conventional methods and systems is, there is almost always some noise in the video signal, which means that the prediction error is usually not zero.
Further, these conventional systems and methods have only limited capability to correct and/or reduce the errors caused by a reference frame with low fidelity.
Therefore, it is desirable to provide a system and method for visual signal extrapolation and interpolation that does not have the drawbacks of the conventional motion-based extrapolation and interpolation methods.