For an overall understanding of general techniques involved in encoding and decoding video image information, reference should be made to the "MPEG-4 Video Verification Model Version 5.0", prepared for the International Organization for Standardization by the Ad hoc group on MPEG-4 video VM editing, paper Number MPEG 96/N1469, November 1996, the contents of which are herein incorporated by reference.
This invention relates to encoding and decoding of complex video image information including motion components which may be encountered, for example, in multimedia applications, such as video-conferencing, video-phone, and video games. In order to be able to transfer complex video information from one machine to another, it is often desirable or even necessary to employ video compression techniques. One significant approach to achieving a high compression ratio is to remove the temporal and spatial redundancy which is present in a video sequence. To remove spatial redundancy, an image can be divided into disjoint blocks of equal size. These blocks are then subjected to a transformation (e.g., Discrete Cosine Transformation or DCT), which decorrelates the data so that it is represented as discrete frequency components. With this representation, the block energy is more compact, hence the coding of each block can be more efficient. Furthermore, to achieve the actual compression, two-dimensional block elements are quantized. At this point, known run-length and Huffman coding schemes can be applied to convert the quantized data into a bit-stream. If the above process is applied to one block independent of any other block, the block is said to be intra-coded. On the other hand, if the block uses information from another block at a different time, then the block is said to be inter-coded. Inter-coding techniques are used to remove temporal redundancy. The basic approach is that a residual block (or error block) is determined based on the difference between the current block and a block in a reference picture. A vector between these two blocks is then determined and is designated as a motion vector. To keep the energy in the residual block as small as possible, block-matching algorithms (BMAs) are used to determine the block in the reference picture with the greatest correlation to the current block. 
With the reference block locally available, the current block is reconstructed using the motion vector and the residual block.
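The inter-coding steps described above (block matching, residual formation, and reconstruction) can be illustrated with a minimal sketch. It assumes a sum-of-absolute-differences (SAD) matching criterion, an exhaustive search window, and frames stored as lists of rows; the function names are hypothetical and are not taken from any standard or from the referenced Verification Model.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, bx, by, n, search):
    """Return the displacement (dx, dy) within +/- `search` pixels that
    minimizes the SAD; candidates falling outside `ref` are skipped."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > w or y + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def residual_block(cur, ref, bx, by, mv, n):
    """Encoder side: current block minus the motion-compensated reference block."""
    dx, dy = mv
    return [[cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i]
             for i in range(n)] for j in range(n)]


def reconstruct_block(ref, bx, by, mv, residual, n):
    """Decoder side: motion-compensated reference block plus the residual."""
    dx, dy = mv
    return [[ref[by + dy + j][bx + dx + i] + residual[j][i]
             for i in range(n)] for j in range(n)]
```

In this sketch the residual is kept exact, so reconstruction is lossless; in an actual coder the residual would additionally be transformed and quantized as described above.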
For the most part, video coding schemes encode each motion vector differentially with respect to its neighbors. The present inventors have observed that a piecewise continuous motion field can reduce the bit rate in this case. Hence, a rate-optimized motion estimation algorithm has been developed. The unique features of this proposal come from two elements: (1) the number of bits used for encoding motion vectors is incorporated into the minimization criterion, and (2) rather than counting the actual number of bits for motion vectors, the number of motion vector bits is estimated using the residues of the neighboring blocks. With these techniques, the bit-rate is lower than in prior encoders using full-search motion-estimation algorithms. In addition, the computational complexity is much lower than in a method in which rate-distortion is optimized. The resulting motion field is a true motion field, hence the subjective image quality is improved as well.
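One possible reading of this criterion is sketched below: the score of a candidate vector is the block's own residual plus a weighted sum of the residuals that the same vector would produce in the neighboring blocks, the latter standing in for the motion vector bit cost. The four-neighbor layout, the single `weight` parameter, the border clamping, and all function names are assumptions made for illustration and are not details taken from this description.

```python
def pix(img, x, y):
    """Pixel fetch with border clamping (an assumption of this sketch)."""
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]


def block_sad(cur, ref, bx, by, mv, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by mv = (dx, dy)."""
    dx, dy = mv
    return sum(
        abs(cur[by + j][bx + i] - pix(ref, bx + dx + i, by + dy + j))
        for j in range(n)
        for i in range(n)
    )


def neighborhood_cost(cur, ref, bx, by, mv, n, weight=0.5):
    """Own residual plus weighted residuals of the four adjacent blocks,
    all evaluated with the same candidate vector."""
    h, w = len(cur), len(cur[0])
    cost = block_sad(cur, ref, bx, by, mv, n)
    for ox, oy in ((-n, 0), (n, 0), (0, -n), (0, n)):
        nx, ny = bx + ox, by + oy
        # Neighbors that do not lie fully inside the current frame are ignored.
        if 0 <= nx <= w - n and 0 <= ny <= h - n:
            cost += weight * block_sad(cur, ref, nx, ny, mv, n)
    return cost


def estimate_vector(cur, ref, bx, by, n, search, weight=0.5):
    """Pick the candidate within +/- `search` pixels with the lowest cost."""
    candidates = [(dx, dy)
                  for dy in range(-search, search + 1)
                  for dx in range(-search, search + 1)]
    return min(candidates,
               key=lambda mv: neighborhood_cost(cur, ref, bx, by, mv, n, weight))
```

With `weight` set to zero the sketch reduces to ordinary minimum-residual block matching; a larger weight favors vectors that also fit the neighboring blocks, which tends to yield a smoother, piecewise continuous motion field.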
If we disregard for a moment the advantages that are achieved in terms of coding quality and bit rate savings, and only concentrate on the improvements in subjective image quality, it can be demonstrated that the resulting true motion field can be used at the decoder, as well, in a variety of other ways. More specifically, it has been found that the true motion field can be used to reconstruct missing data, where the data may be a missing frame and/or a missing field. In terms of applications, this translates into frame-rate up-conversion, error concealment and interlaced-to-progressive scan rate conversion capabilities, making use of the true motion information at the decoder end of the system.
Frame-Rate Up-Conversion. The use of frame-rate up-conversion has drawn considerable attention in recent years. To accomplish acceptable coding results at very low bit-rates, most encoders reduce the temporal resolution, i.e., instead of targeting the full frame rate of 30 frames/sec (fps), the frame rate may be reduced to 10 fps, which would mean that 2 out of every 3 frames are never even considered by the encoder. However, to display the full frame rate at the decoder, a recovery mechanism is needed. The simplest mechanism is to repeat each frame until a new one is received. The problem with that scheme is that the image sequence will appear very discontinuous or jerky, especially in areas where large or complex motion occurs. Another simple mechanism is linear interpolation between coded frames. The problem with this mechanism is that the image sequence will appear blurry in areas of motion, resulting in what are referred to as ghost artifacts.
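The two simple recovery mechanisms described above can be sketched as follows. Frames are assumed to be lists of rows of intensity values, and `factor` is the up-conversion ratio (3 for 10 fps to 30 fps); both function names are hypothetical.

```python
def repeat_frames(coded, factor):
    """Frame repetition: show each coded frame `factor` times."""
    out = []
    for frame in coded:
        out.extend([frame] * factor)
    return out


def linear_interpolate(coded, factor):
    """Linear interpolation: blend each pair of successive coded frames
    pixel by pixel, with no regard to motion."""
    out = []
    for a, b in zip(coded, coded[1:]):
        for k in range(factor):
            # k = 0 reproduces frame a; k > 0 gives the in-between frames.
            out.append([
                [((factor - k) * pa + k * pb) / factor for pa, pb in zip(ra, rb)]
                for ra, rb in zip(a, b)
            ])
    out.append(coded[-1])
    return out
```

For a single bright pixel that moves from one end of a four-pixel row to the other, linear interpolation yields two faint copies at the start and end positions rather than a pixel at an intermediate position, which is the ghost artifact noted above.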
From the above, it appears that motion is the major obstacle to image recovery in this manner. This fact has been observed by a number of prior researchers and it has been shown that motion-compensated interpolation can provide better results. In one approach, up-sampling results are presented using decoded frames at low bit-rates. However, the receiver must perform a separate motion estimation just for the interpolation. In a second approach, an algorithm that considers multiple motion is proposed. However, this method assumes that a uniform translational motion exists between two successive frames. In still a third approach, a motion-compensated interpolation scheme is performed, based on an object-based interpretation of the video. The main advantage of the latter scheme is that the decoded motion and segmentation information is used without refinement. This may be attributed to the fact that the object-based representation is true in the "real" world. However, a proprietary codec used in that approach is not readily available to all users.
The method proposed in the present case is applicable to most video coding standards in that it does not require any proprietary information to be transmitted and it does not require an extra motion estimation computation. The present motion-compensated interpolation scheme utilizes the decoded motion information which is used for inter-coding. Since the current true motion estimation process provides a more accurate representation of the motion within a scene, it becomes possible to more readily reconstruct information at the decoder which needs to be recovered before display. Besides quality, the major advantage of this method over other motion compensated interpolation methods is that significantly less computation is required on the decoder side.
Error Concealment. True motion vector information can also be employed to provide improved error concealment. In particular, post-processing operations at the decoder can be employed to recover damaged or lost video areas based on characteristics of images and video signals.
Interlaced-to-Progressive Scan Conversion. In addition to the above motion-compensated interpolation method, a related method for performing interlaced-to-progressive scan conversion also becomes available. In this scenario, rather than recovering an entire frame, an entire field is recovered. This type of conversion is necessary for cases in which a progressive display is intended to display compressed interlaced video. Known conversion techniques may be grouped into intraframe methods, non-motion-compensated interframe methods, and motion-compensated interframe methods.
Intraframe methods. Simple intraframe techniques interpolate a missing line on the basis of two scanned lines which occur immediately before and after the missing line. One simple example is the "line averaging" which replaces a missing line by averaging the two lines adjacent to it. Some other improved intraframe methods which use more complicated filters or edge information have been proposed by M. H. Lee et al. (See "A New Algorithm for Interlaced to Progressive Scan Conversion Based on Directional Correlations and its IC Design," IEEE Transactions on Consumer Electronics, Vol. 40, No. 2, pp. 119-129, May 1994). However, such intraframe techniques cannot predict information which is lost from the current field, but which appears in neighboring fields.
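The line averaging technique mentioned above can be sketched as follows. The field is assumed to be a list of scanned lines, `parity` states whether those lines are the even (0) or odd (1) lines of the frame, and a missing line at the top or bottom edge is simply copied from its single available neighbor; these conventions and the function name are assumptions of this sketch.

```python
def line_average_deinterlace(field, parity):
    """Build a full frame from one field by averaging adjacent scanned lines."""
    height = 2 * len(field)
    frame = [None] * height
    # Place the scanned lines at their positions in the frame.
    for k, line in enumerate(field):
        frame[2 * k + parity] = list(line)
    # Fill each missing line from the scanned lines directly above and below.
    for y in range(height):
        if frame[y] is not None:
            continue
        above = frame[y - 1] if y - 1 >= 0 else None
        below = frame[y + 1] if y + 1 < height else None
        if above is None:
            frame[y] = list(below)
        elif below is None:
            frame[y] = list(above)
        else:
            frame[y] = [(a + b) / 2 for a, b in zip(above, below)]
    return frame
```

Because only the current field is consulted, detail that falls on a missing line and appears only in a neighboring field cannot be recovered by this approach.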
Interframe techniques take into account the pixels in the previous frame in the interpolation procedure. One simple and widely adopted method is the field staying scheme, which sets I(m, 2n+((t-1) mod 2), t) = I(m, 2n+((t-1) mod 2), t-1), i.e., each missing line of the field at time t is copied from the same line of the field at time t-1. Non-motion-compensated approaches, which apply linear or nonlinear filters, are fine for stationary objects, but they result in severe artifacts for moving objects.
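The field staying scheme can be sketched as follows, under the assumption that the field at time t carries the frame lines of parity t mod 2, so that the missing lines have parity (t-1) mod 2 and are present in the previous field. The function name and the list-of-lines representation are hypothetical.

```python
def field_staying(prev_field, cur_field, t):
    """Combine the field at time t with the field at time t-1 into one frame.

    Assumption: the field at time t holds frame lines 2n + (t mod 2), and the
    previous field holds the complementary lines 2n + ((t-1) mod 2).
    """
    cur_parity = t % 2
    missing_parity = (t - 1) % 2
    frame = [None] * (2 * len(cur_field))
    for n in range(len(cur_field)):
        frame[2 * n + cur_parity] = list(cur_field[n])       # scanned at time t
        frame[2 * n + missing_parity] = list(prev_field[n])  # held over from t-1
    return frame
```

For a static scene the two fields fit together exactly; for a moving object the held-over lines show the object at its earlier position, which produces the artifacts noted above.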
For moving objects, it has been found that motion compensation should be used in order to achieve higher quality. Some motion-compensated de-interlacing techniques have been proposed. For example, it has been shown that motion-compensated de-interlacing methods are better than the intraframe methods and non-motion-compensated interframe methods (see Lee et al., "Video Format Conversions between HDTV Systems," IEEE Transactions on Consumer Electronics, Vol. 39, No. 3, pp. 219-224, August 1993).
The system disclosed by the present inventors utilizes an accurate motion estimation/compensation algorithm so it is classified as a motion-compensated interframe method. The interlaced-to-progressive scan conversion procedure contains two parts: (1) a motion-based compensation, and (2) a generalized sampling theorem. The motion-based compensation essentially determines a set of samples at a time 2t, given samples at 2t-1 and 2t+1. In general, these determined sets of samples will not lie on the image grid at time 2t, since the motion vector between 2t-1 and 2t+1 is arbitrary. Therefore, a generalized sampling theorem is used to compute the missing samples at the grid points given the motion compensated samples and the samples which already exist. Formally, this can be expressed as: first, find {I(m+Δ_x, 2n+Δ_y, 2t)} given {I(m, 2n-1, 2t-1)} and {I(m, 2n+1, 2t+1)}, then find {I(m, 2n+1, 2t)} given {I(m, 2n, 2t)} and {I(m+Δ_x, 2n+Δ_y, 2t)}.
While the invention will be described hereinafter in terms of a preferred embodiment and one or more preferred applications, it will be understood by persons skilled in this art that various modifications may be made without departing from the actual scope of this invention, which is described hereinafter with reference to the drawing.
In accordance with a further aspect of the present invention, a method of image data interpolation comprises decoding true motion vector data associated with blocks of digitally encoded image information, with the true motion vector data being dependent in part on neighboring image block proximity weighting factors. The method further comprises interpolating from the supplied picture information, image sequence signal data corresponding to intermediate image time intervals absent from the supplied picture information, the absent image time intervals corresponding to intermediate image information occurring during the intermediate image time intervals in sequence between supplied image time intervals associated with the supplied picture information. The interpolation comprises the steps of constructing image pixels for image blocks in each intermediate image time interval, based upon corresponding pixels in corresponding blocks in the supplied picture information which occur immediately before and after the intermediate image time interval, by distributing to constructed image pixels a fractional portion of image intensity difference information between the corresponding pixels occurring before and after the intermediate time to produce averaged intensity pixels for the intermediate image time interval. Thereafter, each constructed image pixel in each intermediate time interval is associated with a corresponding true motion vector equal in magnitude to a fractional part of the true motion vector information associated with the block in which the corresponding pixel is located in a reference supplied time interval, the fractional part being determined according to the number of intermediate image time intervals inserted between supplied time intervals. Each constructed, averaged intensity pixel is then associated with a spatial location in the intermediate image time interval according to the fractional part of the corresponding decoded true motion vector.
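The interpolation steps recited above can be illustrated with a simplified sketch. It assumes integer-pixel vectors that point from a block in the later supplied frame to its match in the earlier supplied frame, rounds the scaled vector to the nearest grid position, and fills any position that no block reaches with a plain blend of the two supplied frames. These simplifications, the handling of holes, and the function name are assumptions of this sketch rather than details of the method described.

```python
def mc_interpolate(prev, cur, vectors, n, k, factor):
    """Construct intermediate frame k (1 <= k < factor) between `prev` and `cur`.

    `vectors` maps the top-left corner (bx, by) of an n x n block in `cur`
    to its decoded vector (dx, dy) pointing into `prev`.
    """
    h, w = len(cur), len(cur[0])
    out = [[None] * w for _ in range(h)]
    for (bx, by), (dx, dy) in vectors.items():
        for j in range(n):
            for i in range(n):
                x, y = bx + i, by + j
                px, py = x + dx, y + dy  # corresponding pixel in `prev`
                if not (0 <= px < w and 0 <= py < h):
                    continue
                a, b = prev[py][px], cur[y][x]
                # Distribute a fractional portion of the intensity difference.
                value = a + (b - a) * k / factor
                # Displace by the fraction of the vector still to be travelled.
                tx = x + round(dx * (factor - k) / factor)
                ty = y + round(dy * (factor - k) / factor)
                if 0 <= tx < w and 0 <= ty < h:
                    out[ty][tx] = value
    # Positions reached by no block fall back to a non-compensated blend.
    for y in range(h):
        for x in range(w):
            if out[y][x] is None:
                out[y][x] = prev[y][x] + (cur[y][x] - prev[y][x]) * k / factor
    return out
```

For example, with two intermediate frames inserted between supplied frames (`factor` equal to 3), a block whose vector spans six pixels is placed two pixels further along its trajectory in each successive intermediate frame.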