Motion-compensated frame interpolation (MCFI) that uses the received motion vectors (MVs) has recently been studied as a way to improve temporal resolution by doubling the frame rate at the decoder. MCFI is particularly useful for video applications that operate under low-bandwidth constraints and must reduce the frame rate to improve spatial quality. In MCFI, the skipped frame is often interpolated based on the received motion vector field (MVF) between two consecutive reconstructed frames, denoted by $f_{t-1}$ and $f_{t+1}$, respectively. Based on the assumption that objects move along the motion trajectory, the skipped frame $f_t$ can be interpolated bi-directionally using the following equation:
$$f_t(x, y) = w_f \cdot f_{t-1}\left(x + \tfrac{1}{2} v_x,\; y + \tfrac{1}{2} v_y\right) + w_b \cdot f_{t+1}\left(x - \tfrac{1}{2} v_x,\; y - \tfrac{1}{2} v_y\right) \qquad (1)$$
where $v = (v_x, v_y)$ is the received MVF in the bit stream for reconstructing the frame $f_{t+1}$, and $w_f$ and $w_b$ are the weights for the forward and backward predictions, respectively, which are often set to 0.5. This frame interpolation method is also called the direct MCFI as it assumes that the received MVs can represent true motion and can directly be used. However, MCFI that directly uses the received MVs often suffers from annoying artifacts such as blockiness and ghost effects.
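As an illustration only, equation (1) can be sketched in a few lines of Python. The function name `direct_mcfi`, the block size, the nearest-sample rounding and the border clamping are all assumptions made for this sketch; the text above does not specify them. Frames are plain 2-D lists indexed `[y][x]`, and each block carries one received MV.

```python
import math


def direct_mcfi(prev, nxt, mvf, block=2, w_f=0.5, w_b=0.5):
    """Interpolate the skipped frame f_t from f_{t-1} (prev) and f_{t+1} (nxt).

    prev, nxt: 2-D lists of samples, indexed [y][x].
    mvf: 2-D list of (vx, vy) per block, indexed [block_row][block_col].
    block: block size in samples (hypothetical value for this sketch).
    """
    height, width = len(prev), len(prev[0])

    def sample(frame, x, y):
        # Nearest-sample lookup with border clamping. Both choices are
        # assumptions; equation (1) does not say how fractional or
        # out-of-frame positions are handled.
        xi = min(max(int(math.floor(x + 0.5)), 0), width - 1)
        yi = min(max(int(math.floor(y + 0.5)), 0), height - 1)
        return frame[yi][xi]

    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            vx, vy = mvf[y // block][x // block]
            # Forward term: f_{t-1} at (x + vx/2, y + vy/2).
            fwd = sample(prev, x + 0.5 * vx, y + 0.5 * vy)
            # Backward term: f_{t+1} at (x - vx/2, y - vy/2).
            bwd = sample(nxt, x - 0.5 * vx, y - 0.5 * vy)
            out[y][x] = w_f * fwd + w_b * bwd
    return out


# A bright sample moves from x=2 in f_{t-1} to x=4 in f_{t+1}.
prev_frame = [[0, 0, 100, 0, 0, 0, 0, 0] for _ in range(2)]
next_frame = [[0, 0, 0, 0, 100, 0, 0, 0] for _ in range(2)]

true_mvf = [[(-2, 0)] * 4]   # MV that follows the object
wrong_mvf = [[(0, 0)] * 4]   # MV that ignores the motion

print(direct_mcfi(prev_frame, next_frame, true_mvf)[0])
print(direct_mcfi(prev_frame, next_frame, wrong_mvf)[0])
```

With the MV that follows the object, the sample lands at x=3 at full intensity. With the zero MV, the two references are simply averaged and the object appears twice at half intensity, which is the ghost effect mentioned above.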
In general, it is difficult and costly in terms of coding efficiency for an encoder to capture all the true motions in a video frame using block-based motion estimation. It is also not realistic to assume that all encoders are aware that skipped frames will be interpolated at the decoder. Even though MVs can be re-estimated at the decoder by considering spatial and temporal correlations, the true motion can easily be distorted by coding artifacts such as blockiness and blurriness. MV processing methods that remove outliers using vector median filters or refine MVs using smaller block sizes perform well only when the video has smooth and regular motions. That is, they are based on the assumption that the MVF should be smooth. However, this is usually not true, as a video frame may contain complex motions, especially on the motion boundaries, where the true motion field is not smooth at all. As a result, many irregular motions may appear in the received MVF and dominate the vector median filtering process, which then takes those irregular MVs as the true motion. In addition, since many of the methods only operate on a smaller block size, they often fail to consider the edge continuity and the structure of the objects. When several macroblocks (MBs) in the same neighborhood have irregular MVs due to multiple objects moving in different directions, the structure of the objects usually cannot be maintained. MBs that are intra-coded also make frame interpolation difficult, as their MVs are not available. Some methods use object-based motion estimation and/or interpolation at the decoder to maintain the object structure and minimize the interpolation error. However, their high computational complexity prevents such methods from being used in resource-limited devices such as mobile phones.
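To make the vector median filtering limitation concrete, the sketch below uses the standard definition of a vector median: the vector in a window whose summed Euclidean distance to all other vectors in the window is smallest. The function name `vector_median`, the 3x3 window and the MV values are hypothetical and chosen only for illustration.

```python
import math


def vector_median(mvs):
    """Return the MV in `mvs` with the smallest summed distance to the rest."""
    def total_distance(v):
        return sum(math.hypot(v[0] - u[0], v[1] - u[1]) for u in mvs)
    return min(mvs, key=total_distance)


# Smooth neighborhood: eight consistent MVs and a single outlier.
smooth_window = [(2, 0)] * 8 + [(-7, 5)]

# Motion boundary: the block of interest moves with (2, 0), but five of
# the nine MVs in its window belong to a second object or are irregular.
boundary_window = [(2, 0)] * 4 + [(-6, 1)] * 5

print(vector_median(smooth_window))
print(vector_median(boundary_window))
```

In the smooth window the outlier is rejected and (2, 0) is returned. In the boundary window the majority MV (-6, 1) is returned, so a block whose true motion is (2, 0) would be assigned the wrong vector.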
Therefore, frame interpolation still remains a very challenging problem as the artifacts due to the use of improper MVs can be very noticeable, unless an extremely complex method is used.