1. Field of the Invention
The present invention relates to motion dependent video signal interpolation. More particularly, the invention relates to a method for deriving motion vectors for application in the interpolation of a video signal. The invention also relates to an interpolator apparatus for interpolating frames of an input video signal, and to a motion compensated standard video codec (encoder/decoder), such as H.26x or MPEG, for real-time applications with low-cost, high-quality frame interpolation.
2. Description of the Related Art
A video codec normally sacrifices visual quality to meet the bit budget of very low bit rate applications (for example, video communications over Public Switched Telephone Networks (PSTN) and mobile networks) at 28.8/33.6 Kbps or lower bit rates. In practice, two rate control strategies are often used jointly to meet the low channel bandwidth requirements. The first strategy is to assign fewer bits to encode each video frame. The second strategy is to reduce the video frame rate by dropping (not transmitting) some of the original video frames so that the coded frames retain acceptable spatial picture quality. However, a low bit allocation for video frames leads to noticeable spatial-domain artifacts (for example, blocking effects), and a low video frame rate can result in artifacts in the temporal domain (for example, motion jerkiness).
The motion jerkiness effect due to low temporal resolution of the coded picture can be improved with frame interpolation algorithms. For practical use of frame interpolation algorithms, the processing time and complexity are key factors to be considered.
As mentioned above, a low video frame rate often causes motion jerkiness observed in the decoder. One simple and intuitive way to overcome this problem is to increase the frame rate in the decoder. To increase the frame rate, frame interpolation from available transmitted (or decoded) frames is required. A. M. Tekalp, "Digital Video Processing," Prentice Hall, 1995 discusses three possible techniques: (1) frame repetition, (2) frame averaging and (3) motion-compensated frame interpolation (MCI).
Frame repetition simply duplicates the preceding decoded frame as the interpolated frame. Although it is the simplest method to increase the frame rate, motion jerkiness is still observed because frame repetition does not provide transitional motion between the frames. FIG. 10 shows an example of frame repetition wherein an interpolated frame (fti), which is identical to preceding frame (ft1), is placed between frame (ft1) and succeeding frame (ft2).
Frame averaging interpolates frames using the averaged pixel intensity of the preceding and succeeding decoded frames, using a formula such as fti=(ft1+ft2)/2, as shown in FIG. 11. Frame averaging is smoother and increases the Peak Signal-to-Noise Ratio (PSNR) due to better performance on the stationary portion of the frame. However, significant ghost artifacts are observed along the boundary regions of moving objects because of the luminance change. It is obvious that in the low bit rate case, the motion field provides the most useful information.
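The two simple techniques described above can be illustrated with a minimal sketch. The function names and the one-row test frames are assumptions introduced here for illustration only; frames are represented as lists of rows of intensities.

```python
def repeat_frame(f_t1):
    """Frame repetition: the interpolated frame is a copy of the preceding frame."""
    return [row[:] for row in f_t1]


def average_frames(f_t1, f_t2):
    """Frame averaging: fti = (ft1 + ft2) / 2, computed pixel by pixel."""
    return [[(a + b) / 2 for a, b in zip(row1, row2)]
            for row1, row2 in zip(f_t1, f_t2)]


# Hypothetical one-row frames: a bright pixel (200) moves from column 1 to column 3.
f_t1 = [[0, 200, 0, 0]]
f_t2 = [[0, 0, 0, 200]]

repeated = repeat_frame(f_t1)          # object stays at column 1 (no transitional motion)
averaged = average_frames(f_t1, f_t2)  # two half-intensity copies (ghosting)
```

In this toy case the averaged frame contains two copies of the object at half intensity rather than one object at the intermediate position, which is the ghost artifact noted above.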
Motion-compensated interpolation (MCI), a technique of using motion information to interpolate a frame between two transmitted decoded frames, usually provides the best results. MCI was originally developed in the context of frame rate conversion, such as the conversion between different video or TV systems (such as, NTSC⇄PAL and movie⇄television). As shown in FIG. 12, MCI calculates motion vectors representing the trajectories between each pixel in a preceding frame (ft1) and a current frame (ft2) to create an interpolated frame (fti) that is between frames (ft1) and (ft2). A great deal of complexity is involved in the calculation of motion vectors for each pixel. A great amount of work has been done in the field of MCI, and the following references are hereby incorporated by reference:
[1] A. M. Tekalp, "Digital Video Processing," Prentice Hall, Upper Saddle River, N.J., 1995.
[2] M. Bierling and R. Thoma, "Motion Compensating Field Interpolation Using a Hierarchically Structured Displacement Estimator," Signal Processing, pages 387-403, 1986.
[3] R. Thoma and M. Bierling, "Motion Compensating Interpolation Considering Covered and Uncovered Background," Signal Processing: Image Compression 1, pages 191-212, 1989.
[4] M. Bierling and R. Thoma, "Motion Compensating Field Interpolation Method Using a Hierarchically Structured Displacement Estimator," U.S. Pat. No. 4,771,331, September 1988.
[5] C. Cafforio, F. Rocca, and S. Tubaro, "Motion Compensated Image Interpolation," IEEE Trans. Communication, vol. 38, no. 2, pages 215-222, February 1990.
[6] S. Tubaro and F. Rocca, "Motion Estimators and Their Application to Image Interpolation," Motion Analysis and Image Sequence Processing, Kluwer Academic Publishers, 1993.
[7] J. K. Su and R. M. Mersereau, "Motion-Compensated Interpolation of Untransmitted Frames in Compressed Video," 30th Asilomar Conf. on Signals, System and Computers, pages 100-104, November 1996.
[8] B. L. Hinman, "Method and Apparatus for Efficiently Communicating Image Sequence Having Improved Motion Compensation," U.S. Pat. No. 4,727,422, February 1988.
[9] A. Nagata, K. Takahashi and N. Takeguchi, "Moving Image Signal Encoding Apparatus and Decoding Apparatus," U.S. Pat. No. RE35910, September 1998.
[10] E. Collet and M. Kerdranvat, "Method and Apparatus for Motion Interpolated Interpolation," U.S. Pat. No. 5,844,616, December 1998.
[11] A. N. Netravali and J. D. Robbins, "Video Signal Interpolation Using Motion Estimation," U.S. Pat. No. 4,383,272, April 1981.
[12] N. I. Saunders and S. M. Keating, "Motion Compensated Video Signal Processing," U.S. Pat. No. 5,347,312, September 1994.
[13] J. W. Richards and C. H. Gillard, "Standards Conversion of Digital Video Signals," U.S. Pat. No. 5,303,045, April 1994.
[14] B. G. Haskell and A. Puri, "Conditional Motion Compensated Interpolation of Digital Motion Video," U.S. Pat. No. 4,958,226, September 1990.
[15] G. De Haan et al., "Apparatus for Performing Motion-Compensated Picture Signal Interpolation," U.S. Pat. No. 5,534,946, July 1996.
[16] G. De Haan et al., "Motion-Compensated Interpolation," U.S. Pat. No. 5,777,682, July 1998.
Thoma et al. (reference [3]) disclose an MCI method which considers both covered and uncovered backgrounds. They employed hierarchical displacement motion estimation to provide a better displacement field for interpolation. For the frame rate conversion problem as discussed above or in most previous MCI work (references [2-6,11,13,15]), instead of using a block-based motion field, pixel-wise motion estimation is often required to determine the dense motion field in order to provide an accurate motion trajectory for each pixel. As a consequence, the computational complexity of MCI is very high due to the complicated motion estimation process involved and thus is not practical for real-time video communication applications (e.g., videophone and videoconferencing).
In applications such as videophones and videoconferencing, frame interpolation is performed at the decoder of a block-based compression standard such as MPEG and H.26x. Therefore, the motion information is already available to the decoder. However, the motion information from standard video decoders is in the form of a block-based motion field rather than a pixel-based motion field. In order for MCI to use the output of a standard block-based video decoder, an additional motion search during interpolation would be required. The additional motion searches during interpolation would increase the complexity and costs of the system to make it impractical for many applications.
Su et al. (reference [7]) disclose a system utilizing a block-based motion field from a video decoder for frame interpolation. One of their proposed techniques does not require an additional motion search during interpolation. However, their approach is a simple MCI design that does not consider the location of the moving object. Consequently, the covered and uncovered backgrounds cannot be predicted correctly. This approach does not provide satisfactory video quality for many applications.
A similar method to that of Su et al. is found in U.S. Pat. No. 4,727,422 (reference [8]). Both the Su et al. method and the method of U.S. Pat. No. 4,727,422 cause errors in the interpolated frames because there are no correct motion vectors in cases where: (1) there are objects which move in different directions from each other in a block; (2) the background appears from the shade of a moving object (uncovered background), or the background is hidden by a moving object (covered background); (3) the moving object changes in shape; and (4) there is a movement accompanied by rotation.
In U.S. Pat. No. RE35910 (reference [9]), a specially designed error evaluator and coder associated with a frame interpolator is included in the block-based motion-compensated video encoder to evaluate and encode the interpolation error so that an error corrector at the video decoder can use the error information to compensate for the interpolation errors mentioned above. This proposed method is not compatible with the existing H.26x/MPEG video coding standards, since the standard H.26x/MPEG bitstreams have no mechanism for carrying interpolation error information. In this proposed method, the complexity of the video encoder increases while the coding efficiency decreases, since the method requires additional circuitry and bits to evaluate and transmit the interpolation errors. A similar idea is found in U.S. Pat. No. 4,958,226 (reference [14]).
A method disclosed in U.S. Pat. No. 5,844,616 (reference [10]) reduces the hardware complexity of the MCI with sub-pixel accuracy for HDMAC/HDTV systems. The method focuses on the interpolation of pixels on the half-pixel grid using the available full-pixel samples, while the potential problems associated with the attempted use of a block-based MCI format are not addressed.
In U.S. Pat. No. 5,534,946 (reference [15]), an ordered statistical filtering method using more than one motion vector is disclosed in order to refine the motion field so as to eliminate the artifacts arising from discontinuities of the motion vector field in MCI. In U.S. Pat. No. 5,777,682 (reference [16]), a similar approach is proposed for block-based MCI.
A brief overview of MCI will be provided. Symbols and terminology used throughout this application are defined:
p denotes the 2-D pixel Cartesian coordinate in a frame;
ft refers to the frame at temporal reference t;
ft(p) represents the intensity at pixel p of frame ft;
B(p) designates the macroblock (typically consisting of 16×16 pixels) to which pixel p belongs;
N(p) represents the eight nearest neighbor macroblocks around B(p);
NB(p) is equal to B(p)∪N(p);
Vm,n(B(p)) is a block-based motion vector of block B(p) from ftm to ftn;
vm,n(p) denotes the displacement motion vector for pixel p from ftm to ftn, where the displacement motion vector is defined as the inter-image motion of the contents of a respective macroblock.
A macroblock consists of 16×16 pixels.
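The block notation defined above can be made concrete with a short sketch. It assumes macroblocks are indexed by (column, row) on a regular 16×16 grid and that neighbors falling outside the frame are simply omitted; the function names and the border handling are assumptions made here, not details stated in the definitions.

```python
MB_SIZE = 16  # macroblock edge length in pixels, as defined above


def block_of(p):
    """B(p): index (column, row) of the macroblock containing pixel p = (x, y)."""
    x, y = p
    return (x // MB_SIZE, y // MB_SIZE)


def neighbors(block, blocks_wide, blocks_high):
    """N(p): up to eight nearest neighbor macroblocks (fewer at the frame border)."""
    bx, by = block
    result = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue  # skip B(p) itself
            nx, ny = bx + dx, by + dy
            if 0 <= nx < blocks_wide and 0 <= ny < blocks_high:
                result.append((nx, ny))
    return result


def neighborhood(p, blocks_wide, blocks_high):
    """NB(p): the union of B(p) and N(p)."""
    b = block_of(p)
    return [b] + neighbors(b, blocks_wide, blocks_high)


# Hypothetical 176x144 frame, i.e. an 11x9 grid of macroblocks.
interior = neighborhood((20, 35), 11, 9)  # pixel inside block (1, 2)
corner = neighborhood((0, 0), 11, 9)      # pixel inside the top-left block
```

For an interior macroblock NB(p) holds nine blocks; at a frame corner only four remain under the border assumption used here.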
INTRA coding refers to encoding a picture (a field or a frame) or a macroblock without reference to any other picture or macroblock, but the INTRA-coded picture or macroblock of picture pixels can be used as a reference for other pictures and macroblocks.
INTER coding refers to encoding a picture (a field or a frame) or a macroblock with reference to another picture or macroblock of picture pixels. Compared to the INTRA-coded picture or macroblock, the INTER-coded picture or macroblock may be coded with greater efficiency.
Given two consecutive decoded frames, i.e. the preceding frame ft1 and the current frame ft2, where t1 < t2, the goal of frame interpolation is to insert an interpolated frame fti at time ti, wherein t1 < ti < t2. The concept of MCI is to interpolate the frame fti based on knowledge of the location of moving objects and the corresponding motion trajectories among ft1, ft2 and fti.
Standard MCI classifies each pixel inside a frame image into one of four classes: Moving Object (MO), Stationary Background (SB), Covered Background (CB) and Uncovered Background (UB), so that ft = {MOt, SBt, CBt, UBt} (as shown in FIG. 13). These four classes are mutually exclusive. The interpolation ratios Rf = (ti − t1)/(t2 − t1) and Rb = (t2 − ti)/(t2 − t1) stand for the forward (from the preceding decoded frame to the interpolated frame) and the backward (from the current decoded frame to the interpolated frame) interpolation ratios, respectively.
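As a small worked example of the interpolation ratios, the sketch below computes Rf and Rb exactly for integer temporal references. The function name and the use of integer frame indices are assumptions made for illustration.

```python
from fractions import Fraction


def interpolation_ratios(t1, ti, t2):
    """Return (Rf, Rb) for an interpolated frame at ti between frames at t1 and t2."""
    if not (t1 < ti < t2):
        raise ValueError("ti must lie strictly between t1 and t2")
    r_f = Fraction(ti - t1, t2 - t1)  # forward ratio
    r_b = Fraction(t2 - ti, t2 - t1)  # backward ratio
    return r_f, r_b


# Two frames were dropped between t1 = 0 and t2 = 3; interpolate the first one.
r_f, r_b = interpolation_ratios(0, 1, 3)
```

Here Rf = 1/3 and Rb = 2/3. The two ratios always sum to one, so the nearer decoded frame receives the larger weight Rb or Rf in the weighted predictions.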
Interpolation for various classes of objects is performed as follows.
It is assumed that MOt1, MOt2 and the motion vector field {v1,2(p)|p∈MOt2} are all known. Then, the motion trajectory Rb·v1,2(p) can be used with MOt2 to predict the object location MOti on the interpolated frame fti. Once the moving object class MOti is determined, the Uncovered Background UBti and the Covered Background CBti in the interpolated frame can be identified also. From FIG. 13, it is apparent that UBti is determined by finding the background uncovered as the object moves from MOt1 to MOt2. Since the corresponding position of UBti(p) is occupied by MOt1(p) at time t1, UBti(p) can be predicted only from the corresponding background pixel in ft2(p). CBti can be determined in a similar way. Finally, all of the remaining pixels can be classified as SBti. After the class for each pixel p is determined, the pixels of the interpolated frame can be predicted by using the following six equations:
MOti: fti(p) = Rb·ft1(p − Rf·v1,2(p)) + Rf·ft2(p + Rb·v1,2(p)), if bi-directionally predicted  (1)
fti(p) = ft1(p − Rf·v1,2(p)), if forwardly predicted  (2)
fti(p) = ft2(p + Rb·v1,2(p)), if backwardly predicted  (3)
UBti: fti(p) = ft2(p)  (4)
CBti: fti(p) = ft1(p)  (5)
SBti: fti(p) = Rb·ft1(p) + Rf·ft2(p)  (6)
As shown above, there are three possible methods to interpolate the MOti class.
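Equations (1) to (6) can be sketched as a per-pixel prediction routine. This is an illustrative sketch only: the function names, the string class labels, the nearest-pixel sampling and the clamping at the frame border are assumptions introduced here, and the class of each pixel is taken as given.

```python
def sample(frame, x, y):
    """Read the nearest integer pixel, clamped to the frame (an assumption)."""
    height, width = len(frame), len(frame[0])
    xi = min(max(int(round(x)), 0), width - 1)
    yi = min(max(int(round(y)), 0), height - 1)
    return frame[yi][xi]


def predict_pixel(cls, f_t1, f_t2, p, v, r_f, r_b, mode="bi"):
    """Predict fti(p) for a pixel of class cls with motion vector v = v1,2(p)."""
    x, y = p
    vx, vy = v
    if cls == "MO":
        fwd = sample(f_t1, x - r_f * vx, y - r_f * vy)  # equation (2)
        bwd = sample(f_t2, x + r_b * vx, y + r_b * vy)  # equation (3)
        if mode == "forward":
            return fwd
        if mode == "backward":
            return bwd
        return r_b * fwd + r_f * bwd                    # equation (1)
    if cls == "UB":
        return f_t2[y][x]                               # equation (4)
    if cls == "CB":
        return f_t1[y][x]                               # equation (5)
    if cls == "SB":
        return r_b * f_t1[y][x] + r_f * f_t2[y][x]      # equation (6)
    raise ValueError("unknown class: " + cls)


# Hypothetical one-row frames: an object (200) moves from x = 1 to x = 3.
f_t1 = [[10, 200, 10, 10, 10]]
f_t2 = [[10, 10, 10, 200, 10]]
classes = ["SB", "UB", "MO", "CB", "SB"]  # assumed classification at the midpoint
row = [predict_pixel(c, f_t1, f_t2, (x, 0), (2, 0), 0.5, 0.5)
       for x, c in enumerate(classes)]
```

With the interpolated frame midway between the decoded frames, the object appears once at x = 2 with full intensity, and the uncovered and covered positions take their background values from ft2 and ft1 respectively.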
However, there are problems associated with implementing MCI methods of the prior art. One problem associated with prior art MCI methods is the requirement of both a good segmentation design and true motion field information at the video decoder to obtain high-quality interpolation frames.
Another problem with MCI is the occurrence of overlapped pixels and holes in the interpolated object MOti. This problem is caused by both occlusion and resolution.
Occlusion may occur because of two circumstances. First, even if the true motion trajectory were available for each pixel in MOt2, the object is usually not under rigid motion. In other words, MOt1 and MOt2 are not of the same shape. Second, even when the object is under rigid motion, the estimated motion field may not be in parallel within the same object due to poor motion estimation. In either case, the motion trajectory is not a one-to-one mapping from MOt1 to MOt2. Therefore, the interpolated object MOti tends to contain some overlapped pixels and holes because of occlusion.
Resolution also causes overlapped pixels and holes. Considering that frames and motion fields are in an integer or half-pixel resolution, when the motion trajectory is traversed from integer pixel location p of MOt2, the mapping of that pixel in MOti is p+Rb(v1,2(p)), which may no longer match the image grid. Although rounding off to the desired resolution is commonly used, it leads to overlapping pixels and holes.
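The effect can be seen in a one-dimensional sketch that projects each object pixel along p + Rb·v1,2(p), the mapping given above, and rounds the result to the image grid. The motion values, the round-half-up rule and the function names are assumptions chosen for illustration; the non-uniform motion field stands in for a non-rigid or poorly estimated object.

```python
import math
from collections import Counter


def project(pixels, motion, r_b):
    """Count how many object pixels land on each grid position after rounding."""
    hits = Counter()
    for p in pixels:
        target = p + r_b * motion[p]           # possibly fractional position
        hits[math.floor(target + 0.5)] += 1    # round half up to the grid
    return hits


def overlaps_and_holes(hits):
    """Positions hit more than once, and unhit positions inside the object span."""
    lo, hi = min(hits), max(hits)
    overlaps = sorted(q for q, n in hits.items() if n > 1)
    holes = [q for q in range(lo, hi + 1) if q not in hits]
    return overlaps, holes


# Hypothetical object of six pixels with a non-uniform motion field.
motion = {0: 2, 1: 1, 2: 0, 3: 0, 4: 1, 5: 2}
hits = project(range(6), motion, 0.5)
overlaps, holes = overlaps_and_holes(hits)
```

In this example two source pixels land on position 2 while nothing lands on position 4, so the projected object has one overlapped pixel and one hole.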
One solution to correct overlapped pixels involves averaging the intensities of the overlapped pixels. However, averaging intensities does not solve the problem of correcting the holes. Even though a spatial interpolation might be adopted as a way to correct the holes, this would become very complex because the spatial neighborhood of a hole may still contain other holes. Another way of correcting holes is by estimating the displacement motion of a particular hole from the neighboring displacement motion field and then traversing the motion trajectory from an integer pixel location in MOti to the possible fractional pixel location in MOt1 or MOt2. In this particular case, the resolution problem which occurs in the decoded frame can be easily handled by using spatial interpolation (since no hole is contained in the decoded frame), but a more complex system is required.
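The averaging remedy for overlapped pixels, and its failure to address holes, can be sketched as follows. The function name, the one-dimensional layout and the use of None to mark an unfilled position are assumptions made for illustration.

```python
def splat_with_averaging(intensities, targets, width):
    """Accumulate projected intensities and average where several pixels overlap.

    Positions that receive no pixel are returned as None (unfilled).
    """
    sums = [0.0] * width
    counts = [0] * width
    for value, q in zip(intensities, targets):
        if 0 <= q < width:
            sums[q] += value
            counts[q] += 1
    return [sums[q] / counts[q] if counts[q] else None for q in range(width)]


# Hypothetical projection: two pixels overlap at position 2, nothing reaches 3.
line = splat_with_averaging([100, 120, 140, 160], [1, 2, 2, 4], 5)
```

The overlapped position receives the mean of the two contributions (130), while position 3, which lies inside the object span, stays unfilled and still needs one of the hole-correction approaches discussed above.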
Accordingly, due to the above limitations, the prior art has been unable to provide a practical block-based MCI system, which does not require additional motion searches during interpolation, and yet provides acceptable video quality.
An object of the present invention is to provide a fast motion-compensated interpolation (FMCI) method for the decoder of a block-based video codec operating in low bit rates, or for use in frame rate up-conversion or for use in systems where the CPU computing power cannot meet the real-time requirement.
Another object of the present invention is to provide an apparatus for a motion compensated video frame interpolator which utilizes MCI without involving an additional motion search during interpolation, and which utilizes block-level motion vectors provided from a standard video decoder (such as H.26x/MPEG) rather than pixel-level motion vectors.
Still another object of the invention is to provide frame interpolation by a maximal exploitation of the available block motion information without employing an additional motion search in the decoder. Therefore, the complexity of the system can be reduced in comparison with standard MCI used in frame rate conversion. In an embodiment, the MCI prediction unit performs motion vector mapping to calculate UBti, CBti, and SBti (using equations (4) to (6)) and then performs standard MCI prediction.
According to a particular aspect of the invention, there is provided a method of block-based motion-compensated interpolation of a video signal based on blockwise motion vectors and frame information of a plurality of frames being provided by a block-based video decoder. This method comprises (a) performing a segmentation operation on the plurality of frames of the video signal to identify an initial moving object block and background information blocks, wherein the background information blocks are identified as a stationary block (SB), an uncovered block (UB), and a covered block (CB); (b) mapping a motion vector of one of the blockwise motion vectors to provide an output of a mapped moving object block (MO) whose pixels each have the motion vector mapped thereto; (c) classifying the mapped moving object block (MO) obtained in step (b) and the background information blocks obtained in step (a) to identify an interpolated mapped moving object block (MOti) and interpolated background information blocks including an interpolated stationary block (SBti), an interpolated uncovered block (UBti), and an interpolated covered block (CBti); and (d) processing MOti, SBti, UBti, CBti and the frame information from the plurality of frames to generate an interpolated frame relative to the one of said frames. Step (d) may further include performing gap closure of the interpolated moving object block (MOti) to obtain increased solid areas to improve a quality of the interpolated frame.
The segmentation operation may further comprise (i) performing a morphological closure operation by removing holes in the initial moving object block to obtain a morphologically closed segmented moving object block; and (ii) performing pattern block refinement by comparing the morphologically closed segmented moving object block obtained in step (i) with a plurality of pattern blocks, obtaining a pattern block having a closest matching pattern to the morphologically closed segmented moving object block, and replacing the morphologically closed segmented moving object with said pattern block selected in step (ii). The blockwise motion vectors and the frame information received in step (a) may be provided by one of an MPEG and a H.26x video decoder. Each pattern block of the plurality of pattern blocks may be a macroblock comprising 16×16 pixels provided in 16 sub-blocks arranged in a 4×4 matrix, and each one of the sub-blocks may comprise 16 pixels arranged in a 4×4 matrix. The plurality of pattern blocks may comprise 34 patterns. The method may further include providing the mapped moving object block (MO) produced by step (b) with a shape corresponding to a shape of the closest matching pattern block.
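The pattern block refinement of step (ii) can be illustrated with a sketch that treats a segmented macroblock as a 4×4 binary mask of sub-blocks (1 for moving object, 0 for background) and selects the stored pattern that differs from it in the fewest sub-blocks. The four patterns listed, the Hamming-distance similarity measure and all names are hypothetical; they are stand-ins chosen for illustration and are not the 34 patterns or the matching criterion referred to above.

```python
def hamming(mask_a, mask_b):
    """Number of sub-blocks in which two 4x4 binary masks differ."""
    return sum(a != b
               for row_a, row_b in zip(mask_a, mask_b)
               for a, b in zip(row_a, row_b))


def closest_pattern(mask, patterns):
    """Name of the pattern with the smallest Hamming distance to mask."""
    return min(patterns, key=lambda name: hamming(mask, patterns[name]))


# Hypothetical pattern set (illustrative only).
PATTERNS = {
    "full": [[1, 1, 1, 1] for _ in range(4)],
    "empty": [[0, 0, 0, 0] for _ in range(4)],
    "left_half": [[1, 1, 0, 0] for _ in range(4)],
    "top_half": [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]],
}

# A ragged segmentation result that roughly covers the left half of the block.
mask = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
best = closest_pattern(mask, PATTERNS)
refined = PATTERNS[best]  # the pattern block that replaces the segmented block
```

Replacing the ragged mask with the selected pattern gives the moving object block a regular shape, in the spirit of the refinement step described above.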
According to another particular aspect of the invention, there is provided an apparatus for performing block-based motion-compensated frame interpolation of a video signal based on blockwise motion vectors and frame information of a plurality of frames of the video signal. Such apparatus comprises (a) segmentation means for performing a segmentation operation on the plurality of frames of the video signal to identify an initial moving object block and background information blocks for one of the frames, the background information blocks comprising a stationary block (SB), an uncovered block (UB) and a covered block (CB); (b) mapping means for mapping a motion vector of one of the blockwise motion vectors to each pixel of the initial moving object block to provide a mapped moving object block whose pixels each have the motion vector mapped thereto; (c) classification means for processing the mapped moving object block (MO) output from the mapping means and the background information blocks obtained from the segmentation means to identify an interpolated mapped moving object block (MOti) and interpolated background information blocks including an interpolated stationary block (SBti), an interpolated uncovered block (UBti), and an interpolated covered block (CBti); and (d) motion compensated interpolation means for processing MOti, SBti, UBti, CBti and the frame information relating to the plurality of frames to generate an interpolated frame relative to the one of said frames.
The apparatus may further comprise a motion vector replacement unit for comparing the one of the blockwise motion vectors with a set of predetermined criteria to determine whether a value of the one of said blockwise motion vectors requires replacement with a corrected value; a residue map which maps prediction errors obtained from a block-based video decoder and outputs the mapped prediction errors to the motion vector replacement unit; a morphological closure unit for processing the initial moving object block output by said segmentation means to obtain a morphologically closed segmented moving object block; and a template matching unit for processing the morphologically closed segmented moving object block output by the morphological closure unit, wherein the template matching unit compares the morphologically closed segmented moving object block with a plurality of pattern blocks to obtain a most similar pattern block of said plurality of pattern blocks and outputs the most similar pattern block of the plurality of pattern blocks to the motion vector mapping unit in place of the morphologically closed segmented moving object block.
The motion compensated interpolation means may include a gap closure unit for processing gaps in the interpolated moving object block (MOti) to obtain increased solid areas in the interpolated moving object block (MOti) to improve a quality of the interpolated frame. The template matching unit may comprise 34 pattern blocks in storage. Each pattern block of the plurality of pattern blocks may comprise a macroblock of 16×16 pixels provided in 16 sub-blocks arranged in a 4×4 matrix; and each sub-block may comprise 16 pixels arranged in a 4×4 matrix.
The mapped moving object block (MO) output by the mapping means may have a shape corresponding to a shape of the similar pattern block output by the segmentation means.
The blockwise motion vectors and the frame information may be provided by a block-based video decoder comprising one of an MPEG video decoder and a H.26x video decoder.