Extrapolation and interpolation of visual signals, such as images, video, and graphics, have been widely used in various contexts, including, but not limited to: video coding, transcoding, error concealment, pre-processing, and interactive rendering.
For instance, techniques for extrapolating and interpolating in video-coding applications have been described by Aaron et al., Toward Practical Wyner-Ziv Coding of Video, PROC. IEEE INT. CONF ON IMAGE PROCESSING, pp. 869-872, Barcelona, Spain, Sept. (2003), Puri et al., PRISM: A New Robust Video Coding Architecture based on Distributed Compression Principles, ALLERTON CONFERENCE ON COMMUNICATION, CONTROL AND COMPUTING, (2002), and Yaman et al., A Low-Complexity Video Encoder with Decoder Motion Estimation, Proc. ICASSP, Montreal, Canada, (2004). Techniques for extrapolating and interpolating in transcoding applications have been described by U.S. Pat. No. 6,058,143 issued on May 2, 2000 to Golin for “Motion Vector Extrapolation for Transcoding Video Sequences.” Further, techniques for extrapolating and interpolating in error concealment for video decoding or post-processing applications have been described by Peng et al., Block-Based Temporal Error Concealment for Video Packet Using Motion Vector Extrapolation, International Conf on Communications, Circuits, Systems and West Sino Expo, pp. 10-14, Jun. 29-Jul. 1, 2002 and by U.S. Pat. No. 6,285,715 issued on Sep. 4, 2001 to Ozcelik for “Methods and Apparatus for Error Concealment While Decoding a Coded Video Bit Stream.” The visual signal extrapolation and interpolation methods used in video coding, transcoding, error concealment, video decoding, and post-processing applications are typically based on motion information and are therefore referred to as motion-based extrapolation and interpolation methods, respectively.
Non-motion-based extrapolation/interpolation methods, which are typically used in other applications, include the model-based view extrapolation method used for virtual reality rendering, the feature extrapolation method used for pre-compression, and the video fading scene prediction method. For example, the model-based view extrapolation method is described by U.S. Pat. No. 6,375,567 issued on Apr. 23, 2002 to Acres for “Model-Based View Extrapolation for Interactive Virtual Reality Systems.” The feature extrapolation method is described by U.S. Pat. No. 5,949,919 issued on Sep. 7, 1999 to Chen for “Precompression Extrapolation Method.” The video fading scene prediction is described by Koto et al., Adaptive Bi-Predictive Video Coding Temporal Extrapolation, ICIP (2003).
One example of the motion-based extrapolation/interpolation methods is the Wyner-Ziv video coding technique. A typical Wyner-Ziv video coding system includes a video encoder and a video decoder. The video encoder is a low-complexity, low-power encoder, so the computation-heavy signal processing tasks, such as motion estimation, are carried out by the decoder instead. To achieve high efficiency, the Wyner-Ziv decoder needs to exploit the correlation between the source information and the side information, the latter being known only to the decoder, in order to decode the received video signals and reconstruct the video. The source information is the video signal (e.g., a picture) to be encoded at the encoder and transmitted to the decoder for decoding, and the side information is essentially an estimate of the picture to be decoded. Since the performance of the Wyner-Ziv system depends heavily on the reliability of the side information, the mechanism used by the decoder for generating the side information plays a crucial role in the Wyner-Ziv video coding system. Typically, the decoder first performs motion estimation on previously reconstructed pictures to generate a set of motion vectors and then uses those motion vectors to generate an estimate of the picture currently being decoded by extrapolation or interpolation. This estimate is used as the side information by the decoder for decoding and reconstructing the current picture.
FIG. 1 is a diagram illustrating a motion-based temporal extrapolation process well known in the art. Specifically, in order to extrapolate a current Picture N, motion estimation is first performed on at least two previously reconstructed pictures, namely, Pictures N−2 and N−1, to generate a set of motion vectors for each pixel or block of pixels in Picture N−1, which are indicative of the motion of the pixel or the block of pixels between Picture N−1 and Picture N−2 (i.e., a “reverse” motion). Then, the motion vectors are manipulated according to a predetermined function that is based on an underlying motion model or assumption. For example, if a constant linear displacement motion model is assumed, the motion vectors are reversed, and the pixel or the block of pixels associated with the motion vectors is extrapolated (i.e., mapped) from its location in Picture N−1 to a location defined by the reversed motion vectors in an estimate of the current Picture N, as shown in FIG. 1. Note that the motion vectors can also be constructed for each pixel or block of pixels in Picture N−2 to indicate the motion between Picture N−2 and Picture N−1. In such a case, the motion vectors should then be shifted, and the pixel or the block of pixels associated with the motion vectors should be extrapolated or mapped from its location in Picture N−1 to a location defined by the shifted motion vectors in an estimate of the current Picture N. The motion-based temporal extrapolation process described above therefore creates an estimate of the current Picture N after all the pixels or the blocks of pixels in Picture N−1 (or Picture N−2) are mapped.
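As an illustration only, the reverse-and-map step under the constant linear displacement assumption might be sketched as follows. The function name `extrapolate`, the dictionary layout of the motion field, and the use of `None` for unmapped positions are assumptions made for this sketch and are not taken from the cited references.

```python
def extrapolate(prev_picture, reverse_mv):
    """Estimate Picture N from Picture N-1 (hypothetical helper).

    prev_picture: Picture N-1 as a list of rows of pixel values.
    reverse_mv:   {(row, col): (d_row, d_col)}, the displacement of each
                  pixel of Picture N-1 toward its match in Picture N-2
                  (the "reverse" motion). Missing entries mean no motion.
    """
    height, width = len(prev_picture), len(prev_picture[0])
    # None marks positions that receive no mapping.
    estimate = [[None] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            d_row, d_col = reverse_mv.get((r, c), (0, 0))
            # Constant linear displacement: reverse the vector to predict
            # where the pixel lands in Picture N.
            t_row, t_col = r - d_row, c - d_col
            if 0 <= t_row < height and 0 <= t_col < width:
                # A later mapping silently overwrites an earlier one.
                estimate[t_row][t_col] = prev_picture[r][c]
    return estimate


# A one-row picture whose content shifted one column to the right between
# Picture N-2 and Picture N-1, so every reverse vector points one column left.
picture_n_minus_1 = [[1, 2, 3, 4]]
motion = {(0, c): (0, -1) for c in range(4)}
estimate_n = extrapolate(picture_n_minus_1, motion)
```

In this toy case the content moves one further column to the right, the last pixel leaves the picture, and the first position of the estimate receives no mapping.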
FIG. 2 further illustrates a well-known motion-based temporal interpolation process. Motion estimation is first performed on at least two previously reconstructed pictures, namely, Pictures N−1 and N+1, to obtain a set of motion vectors for each pixel or block of pixels in Picture N−1, which are indicative of the motion of the pixel or the block of pixels from Picture N−1 to Picture N+1. Then, the motion vectors are scaled down (e.g., by a factor of 2) based on an underlying assumption of a constant linear displacement motion model, and the pixels or the blocks of pixels associated with the motion vectors are interpolated from their locations in Picture N−1 and/or N+1 to a location defined by the scaled motion vectors in an estimate of the current Picture N, as shown in FIG. 2. Note that the motion vectors can also be constructed for each pixel or block of pixels in Picture N+1 to indicate the motion between Picture N+1 and Picture N−1. In such a case, the motion vectors should also be scaled down (e.g., by a factor of 2), and the pixels or the blocks of pixels associated with the motion vectors should be interpolated from their locations in Picture N−1 and/or Picture N+1 to a location defined by the scaled motion vectors in an estimate of the current Picture N. The motion-based temporal interpolation process described above also creates an estimate of the current Picture N after all the pixels or the blocks of pixels in Picture N+1 (or Picture N−1) are mapped.
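A minimal sketch of the halve-and-map step is given below, assuming per-pixel vectors anchored in Picture N−1 and integer pixel positions. The function name `interpolate`, the averaging of the two matched pixel values, and the integer halving of the vectors are illustrative assumptions; the cited references may differ on each of these points.

```python
def interpolate(prev_picture, next_picture, forward_mv):
    """Estimate Picture N from Pictures N-1 and N+1 (hypothetical helper).

    forward_mv: {(row, col): (d_row, d_col)}, the displacement of each pixel
                of Picture N-1 to its match in Picture N+1.
    """
    height, width = len(prev_picture), len(prev_picture[0])
    # None marks positions that receive no mapping.
    estimate = [[None] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            d_row, d_col = forward_mv.get((r, c), (0, 0))
            end_row, end_col = r + d_row, c + d_col
            if not (0 <= end_row < height and 0 <= end_col < width):
                continue  # the match falls outside Picture N+1
            # Constant linear displacement: Picture N lies halfway, so the
            # vector is scaled down by 2 (integer halving for simplicity).
            mid_row, mid_col = r + d_row // 2, c + d_col // 2
            # Assumed blend: average the two matched pixel values.
            estimate[mid_row][mid_col] = (
                prev_picture[r][c] + next_picture[end_row][end_col]
            ) / 2
    return estimate


# Content moves two columns to the right between Picture N-1 and Picture N+1.
picture_n_minus_1 = [[10, 20, 30, 40, 0, 0]]
picture_n_plus_1 = [[0, 0, 10, 20, 30, 40]]
motion = {(0, c): (0, 2) for c in range(6)}
estimate_n = interpolate(picture_n_minus_1, picture_n_plus_1, motion)
```

Here the estimate places the content one column to the right of its position in Picture N−1, and the two border positions are left unmapped.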
FIG. 3 is a flowchart that describes the processing steps used for achieving the well-known motion-based extrapolation and interpolation. Specifically, motion estimation is first performed on picture signals obtained from previously reconstructed pictures to generate a set of motion vectors. The motion vectors are then manipulated, according to an underlying motion model or assumption, to generate an estimate of the picture to be decoded by either extrapolation or interpolation, depending on the temporal relationship between the picture to be decoded and the previously reconstructed pictures.
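The final decision in this flow, whether to extrapolate or interpolate, can be illustrated with a small sketch. The function name `estimation_mode` and the use of integer picture indices as time stamps are assumptions made for illustration.

```python
def estimation_mode(target_index, reference_indices):
    """Pick the estimation mode from the temporal relationship (hypothetical).

    target_index:      position in display order of the picture to decode.
    reference_indices: positions of the previously reconstructed pictures.
    """
    # Interpolate when the target lies strictly between two references;
    # otherwise all references lie on one side and we must extrapolate.
    if min(reference_indices) < target_index < max(reference_indices):
        return "interpolation"
    return "extrapolation"


# With N = 5: references N-2 and N-1 call for extrapolation (FIG. 1),
# while references N-1 and N+1 call for interpolation (FIG. 2).
mode_fig1 = estimation_mode(5, [3, 4])
mode_fig2 = estimation_mode(5, [4, 6])
```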
The above-described conventional motion-based extrapolation and interpolation methods have several serious drawbacks, including:
1. The underlying assumption that the objects follow a constant motion model (usually a constant linear displacement model) from picture to picture often does not hold true for real visual signals; and
2. The extrapolation or interpolation may not result in a one-to-one mapping between the previously reconstructed picture(s) and the estimate picture. Some pixel positions in the extrapolated or interpolated picture (i.e., the estimate) may not get any mapping from the previously reconstructed picture(s), i.e., leaving empty holes, while other pixel positions in the extrapolated or interpolated picture may have multiple mappings from the previously reconstructed picture(s), i.e., leaving superimposed spots.
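The second drawback can be made concrete by counting how many source pixels land on each position of the estimate. The sketch below is illustrative only: the function name `mapping_coverage` and the convention that `displacement` already holds the reversed or scaled vectors applied to each source pixel are assumptions.

```python
def mapping_coverage(height, width, displacement):
    """Find empty holes and superimposed spots in a mapping (hypothetical).

    displacement: {(row, col): (d_row, d_col)}, the final displacement applied
                  to each source pixel. Missing entries mean no motion.
    Returns (holes, overlaps) as lists of (row, col) positions.
    """
    hits = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            d_row, d_col = displacement.get((r, c), (0, 0))
            t_row, t_col = r + d_row, c + d_col
            if 0 <= t_row < height and 0 <= t_col < width:
                hits[t_row][t_col] += 1
    positions = [(r, c) for r in range(height) for c in range(width)]
    holes = [p for p in positions if hits[p[0]][p[1]] == 0]
    overlaps = [p for p in positions if hits[p[0]][p[1]] > 1]
    return holes, overlaps


# One pixel moves right onto a static neighbour: the position it vacates
# becomes a hole and the position it lands on is mapped twice.
holes, overlaps = mapping_coverage(1, 4, {(0, 1): (0, 1)})
```

In this one-row example, position (0, 1) is left as a hole and position (0, 2) receives two mappings.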
It is therefore desirable to provide an improved system and method for visual signal extrapolation and interpolation, without the drawbacks of the conventional motion-based extrapolation and interpolation methods.