When a motion picture needs to be transmitted or displayed at a frame rate different from an original frame rate of the motion picture, or when a perceived depth of a stereoscopic image pair needs to be adjusted, a need arises to “interpolate” between two images. The term “image interpolation” refers to obtaining an image displaying objects, the shape, size, brightness, color, and/or position of which is intermediate between those of corresponding objects of the two known images.
Temporal image interpolation is used in frame rate up-conversion to increase the number of displayed images per second in a video or film sequence. It creates smoother motion in the video, or converts between the different video and film frame rates that are used around the world. Frame rate up-conversion is also used in low bit rate video transmission. In this case, the source video is temporally down-sampled to a low frame rate and transmitted at a low bit rate. At the receiver, the original frame rate is recovered using frame rate up-conversion.
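By way of a non-limiting illustration, the timing of frame rate up-conversion can be sketched as follows. The sketch assumes uniformly spaced frames and a common time origin for the source and output sequences; the function name `interpolation_positions` and its parameters are hypothetical and are not taken from any cited reference. For each output frame, it reports the index of the preceding source frame and the fractional position between that frame and the next one.

```python
from fractions import Fraction


def interpolation_positions(in_rate, out_rate, count):
    """Locate each output frame relative to the source frames.

    Returns a list of (source_index, fraction) pairs, where fraction is
    the position of the output frame between source frame source_index
    and source frame source_index + 1 (0 means it coincides with the
    source frame). Uniform frame spacing is assumed.
    """
    positions = []
    for k in range(count):
        # Output frame k occurs at time k / out_rate, which corresponds
        # to position k * in_rate / out_rate on the source frame axis.
        pos = Fraction(k * in_rate, out_rate)
        index = pos.numerator // pos.denominator
        positions.append((index, pos - index))
    return positions


# Doubling the frame rate places every second output frame halfway
# between two source frames.
doubled = interpolation_positions(24, 48, 4)
```

Exact rational arithmetic is used here only so that the fractional positions are not affected by floating-point rounding.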
View interpolation based on so-called disparity compensation reconstructs intermediate views or images between the left-eye and right-eye views of a stereoscopic still image pair or video. It is used to create multi-view video, adjust perceived depth, and guide graphic and text editing and integration in 3D video.
Motion-compensated or disparity-compensated interpolation creates new images by image interpolation along so-called motion trajectories or disparity trajectories, respectively. The motion or disparity trajectories correspond to movement or displacement of individual objects or individual blocks of pixels in going from one image to another. The quality of the images thus created depends on the accuracy of estimation of the motion or disparity trajectories, as well as on the performance of the image processing methods used to smooth or even out the transitions between objects or blocks of pixels that have been displaced along the motion or disparity trajectories. An array of vectors corresponding to motion or disparity trajectories of individual blocks of pixels in going from one image to another is commonly called a “motion field” or a “disparity map”.
Referring to FIG. 1, a prior-art method of motion-compensated video frame interpolation between a first video frame 110 and a second video frame 130 is illustrated. The method includes steps of dividing the first video frame 110 into rectangular blocks 112A, B, C . . . ; obtaining the motion field by estimating a motion vector 114A, B, C . . . for each of the blocks 112A, B, C . . . in going from the first video frame 110 to the second video frame 130; and creating an interpolated video frame 120 along the estimated motion vectors 114A, B, C . . . . A block-matching algorithm estimates the motion vectors 114A, B, C . . . for each of the blocks 112A, B, C . . . by searching the second video frame 130 for blocks 132A, B, C . . . resembling as closely as possible the blocks 112A, B, C . . . . Once the blocks 132A, B, C . . . are found, the corresponding motion vectors 114A, B, C . . . can be determined. In FIG. 1, only the motion vector 114C is shown entirely, for simplicity. After the motion vectors 114A, B, C . . . for each of the blocks 112A, B, C . . . of the first video frame 110 are estimated, the interpolated video frame 120 can be calculated by shifting the blocks 112A, B, C . . . by a fraction of the corresponding motion vectors 114A, B, C . . . , so as to obtain shifted blocks 122A, B, C . . . . The magnitude of the fraction depends on the time separation (for frame rate up-conversion), or the spatial separation (for view interpolation), between the first video frame 110 and the interpolated video frame 120. By way of example, if a video frame rate needs to be doubled, then the interpolated video frame 120 is interpolated “in the middle”, that is, the blocks 122A, B, C . . . are shifted by one half of a shift represented by the motion vectors 114A, B, C . . . .
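The block search and the fractional shift described above can be sketched as follows. This is a minimal illustration only: the use of the sum of absolute differences (SAD) as the similarity measure, the exhaustive search over a square window, the preference for the zero vector on ties, and the names `sad`, `estimate_motion`, and `scale_vector` are all assumptions made for the sketch and are not specified by the method of FIG. 1. Frames are represented as lists of rows of grayscale values.

```python
def sad(frame_a, frame_b, ay, ax, by, bx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[ay + r][ax + c] - frame_b[by + r][bx + c])
        for r in range(size)
        for c in range(size)
    )


def estimate_motion(first, second, size, search):
    """Estimate one motion vector per block of the first frame.

    Returns a dict mapping the (row, col) of each block's top-left
    corner to the (dy, dx) displacement of its best match in the
    second frame, found by exhaustive search within +/- search pixels.
    """
    height, width = len(first), len(first[0])
    field = {}
    for y in range(0, height - size + 1, size):
        for x in range(0, width - size + 1, size):
            # Start from the zero vector so that ties favor no motion.
            best_cost = sad(first, second, y, x, y, x, size)
            best_vector = (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    ty, tx = y + dy, x + dx
                    # Skip candidates that fall outside the frame.
                    if not (0 <= ty <= height - size and 0 <= tx <= width - size):
                        continue
                    cost = sad(first, second, y, x, ty, tx, size)
                    if cost < best_cost:
                        best_cost, best_vector = cost, (dy, dx)
            field[(y, x)] = best_vector
    return field


def scale_vector(vector, fraction):
    """Scale a motion vector by the interpolation fraction, rounded to whole pixels."""
    dy, dx = vector
    return (round(dy * fraction), round(dx * fraction))
```

For frame rate doubling, `scale_vector(vector, 0.5)` gives the half-length shift applied to each block. Rounding to whole pixels is a simplification; sub-pixel shifts would require resampling, which the sketch omits.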
A motion field formed by the motion vectors 114A, B, C . . . is called unidirectional because all motion vectors 114A, B, C . . . pass through the interpolated video frame 120 in one direction. The motion vectors 114A, B, C . . . are associated with the blocks 112A, B, C . . . that are contiguous in the first video frame 110. After motion compensation, the blocks 122A, B, C . . . may not be contiguous in the interpolated video frame 120, which causes two problems.
The first problem is related to so-called holes and overlaps. For example, an area 125 of the interpolated video frame 120 is an area of overlap of blocks 122C and 122B. In other words, pixels in the overlap area 125 have multiple motion vectors passing through them. Pixels in a hole area 127 have no motion vectors passing through them. The hole area 127 and the overlap area 125 can be created by motion estimation errors or by so-called occlusions. The term “occlusion” refers to the appearance of new objects and disappearance of existing objects when comparing the first video frame 110 and the second video frame 130. By way of example, referring to FIGS. 2A and 2B, occlusions are illustrated by means of frontal and side views, respectively, of a head 200 of a fictitious cartoon character wearing goggles 202. When the head 200 is turned sideways as shown in FIG. 2B, a profile of a nose 204 appears, while a frontal view of the nose disappears as shown at 208. At the same time, one of the eyes also disappears, as shown at 206. The areas 204, 206, and 208 are termed occluded areas. The motion vectors, not shown, in the occluded areas 204, 206, and 208 cannot be properly estimated, because estimating a motion vector requires finding, in the second video frame 130, a block similar to a block of the first video frame 110, and the occluded areas 204, 206, and 208 are dissimilar in the first and the second video frames 110 and 130.
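The way holes and overlaps arise can be illustrated by counting, for every pixel of the interpolated frame, how many shifted blocks land on it. The following is a minimal sketch under stated assumptions: blocks are square, shifts are rounded to whole pixels, and the names `coverage_map` and `holes_and_overlaps` as well as the dictionary layout of the motion field are hypothetical.

```python
def coverage_map(height, width, size, field, fraction):
    """Count how many shifted blocks cover each pixel of the interpolated frame.

    field maps the (row, col) of each block's top-left corner in the
    first frame to its (dy, dx) motion vector; each block is shifted by
    the given fraction of its vector.
    """
    hits = [[0] * width for _ in range(height)]
    for (y, x), (dy, dx) in field.items():
        shifted_y = y + round(dy * fraction)
        shifted_x = x + round(dx * fraction)
        for r in range(size):
            for c in range(size):
                py, px = shifted_y + r, shifted_x + c
                # Pixels shifted outside the frame are discarded.
                if 0 <= py < height and 0 <= px < width:
                    hits[py][px] += 1
    return hits


def holes_and_overlaps(hits):
    """Return the pixel coordinates covered by no block and by several blocks."""
    holes = [(r, c) for r, row in enumerate(hits) for c, n in enumerate(row) if n == 0]
    overlaps = [(r, c) for r, row in enumerate(hits) for c, n in enumerate(row) if n > 1]
    return holes, overlaps


# Two adjacent 2x2 blocks: the left one moves right by two pixels,
# the right one is stationary; the frame is interpolated at the midpoint.
example_field = {(0, 0): (0, 2), (0, 2): (0, 0)}
example_hits = coverage_map(2, 4, 2, example_field, 0.5)
example_holes, example_overlaps = holes_and_overlaps(example_hits)
# example_holes    -> [(0, 0), (1, 0)]
# example_overlaps -> [(0, 2), (1, 2)]
```

In this example the left block vacates column 0, leaving a hole, and moves onto column 2, which the stationary block already occupies, creating an overlap.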
The second problem is the appearance of so-called blocking artifacts. Referring back to FIG. 1, a line 111 present in the first video frame 110 is broken at locations 121 in the interpolated video frame 120 as a result of the block-wise interpolation described above. Broken lines, and blocking visual defects in general, are easily noticed by the human visual system and are therefore highly detrimental.
A number of algorithms have been proposed to handle holes, overlaps, and blocking artifacts. In one approach, image and motion field segmentation is used. For example, Huang et al. in an article entitled “Motion-compensated interpolation for scan rate up-conversion,” Optical Engineering, vol. 35, No. 1, pp. 166-176, January 1996, incorporated herein by reference, disclose an approach based on image and motion field segmentation. In another approach, a depth order of objects is determined and used to handle overlaps. For example, Wang et al. in U.S. Pat. No. 6,625,333, incorporated herein by reference, disclose using the depth order to handle overlaps.
Benois-Pineau et al. in an article entitled “A new method for region-based depth ordering in a video sequence: application to frame interpolation,” Journal of Visual Communication and Image Representation, vol. 13, pp. 363-385, 2002, which is incorporated herein by reference, disclose a median filter to fill hole areas such as the hole area 127 in FIG. 1. Bertalmio et al. in an article “Image inpainting,” Computer Graphics (SIGGRAPH 2000), July 2000, pp. 417-424, which is incorporated herein by reference, disclose so-called “image inpainting” to fill hole areas. Most of these algorithms are quite complex, because the overlaps and holes are usually irregular in shape and size.
To avoid the problems of holes and overlaps, a number of researchers have proposed to use so-called bi-directional motion fields. Referring to FIG. 3, a prior-art bi-directional motion field approach is illustrated by means of the first video frame 110, second video frame 130, and the interpolated video frame 120. In the bi-directional motion field approach, the interpolated video frame 120 is created as an empty frame and is divided into the blocks 122A, B, C . . . , which are initially empty. For each of the blocks 122A, B, C . . . , two motion vectors are estimated, one corresponding to the shift of the blocks 122A, B, C . . . relative to the blocks 112A, B, C . . . of the first video frame 110, and the other corresponding to the shift of the blocks 122A, B, C . . . relative to the blocks 132A, B, C . . . of the second video frame 130. For example, for the block 122C, two motion vectors are estimated, 301C and 303C. The first motion vector 301C provides a magnitude and a direction of displacement of the corresponding block 112C of the first video frame 110 relative to the block 122C, and the second motion vector 303C provides a magnitude and a direction of displacement of the corresponding block 132C of the second video frame 130 relative to the block 122C. When all these vectors are known, the blocks 122A, B, C . . . can be filled with pixels from the blocks 112A, B, C . . . from the first video frame 110, shifted by corresponding motion vectors, as well as with pixels from the blocks 132A, B, C . . . from the second video frame 130, shifted by corresponding motion vectors. The pixel values transferred from the first video frame 110 and the second video frame 130 are then averaged using a weighted averaging method. The bi-directional motion field can be estimated using an algorithm disclosed by Choi et al. in an article entitled “New frame rate up-conversion using bi-directional motion estimation,” IEEE Trans. Consumer Electronics, vol. 46, No. 3, pp. 603-609, August 2000, which is incorporated herein by reference. Alternatively, the bi-directional motion field can be derived from a unidirectional motion field as disclosed by Castagno et al. in an article entitled “A method for motion adaptive frame rate up-conversion,” IEEE Trans. Circuits and Systems for Video Technology, vol. 6, No. 5, pp. 436-445, October 1996, which is incorporated herein by reference.
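The filling of the blocks of the interpolated frame from both neighboring frames can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of Choi et al. or Castagno et al.: the two motion vectors of each block are taken as given, displacements are whole pixels, coordinates outside the frame are clamped to the nearest edge pixel, and a single weight `weight_first` is applied to the whole frame. The function name and parameter names are hypothetical.

```python
def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def interpolate_bidirectional(first, second, size, vectors, weight_first=0.5):
    """Fill the interpolated frame block by block from two frames.

    vectors maps the (row, col) of each block's top-left corner in the
    interpolated frame to a pair (to_first, to_second) of (dy, dx)
    displacements pointing at the matching blocks in the first and
    second frames. Pixels of blocks without vectors are left at 0.0.
    """
    height, width = len(first), len(first[0])
    out = [[0.0] * width for _ in range(height)]
    for (y, x), (to_first, to_second) in vectors.items():
        for r in range(size):
            for c in range(size):
                # Fetch the pixel this position maps to in each frame.
                fy = clamp(y + r + to_first[0], 0, height - 1)
                fx = clamp(x + c + to_first[1], 0, width - 1)
                sy = clamp(y + r + to_second[0], 0, height - 1)
                sx = clamp(x + c + to_second[1], 0, width - 1)
                # Weighted average of the two fetched pixel values.
                out[y + r][x + c] = (
                    weight_first * first[fy][fx]
                    + (1.0 - weight_first) * second[sy][sx]
                )
    return out
```

Because every pixel of the interpolated frame belongs to exactly one block, the output has neither holes nor overlaps by construction. Setting `weight_first` to 1.0 or 0.0 draws the pixels from a single frame only.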
One apparent advantage of the bi-directional approach is that there is no need to handle holes and overlaps, since the blocks in the frame to be interpolated are contiguous. For the same reason, one can use overlapped block motion compensation (OBMC) disclosed by Kang et al. in an article entitled “Motion compensated frame rate up-conversion using extended bilateral motion estimation”, IEEE Trans. Consumer Electronics, vol. 53, No. 4, pp. 1759-1767, November 2007, which is incorporated herein by reference. The OBMC can be used to reduce blocking artifacts in the interpolated image 120 divided into the contiguous blocks 122A, B, C . . . .
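The overlapped-window idea behind OBMC can be illustrated in one dimension. The sketch below is not the method of Kang et al.; the triangular window, the half-block overlap on each side, the normalization by the accumulated weights, and the name `obmc_1d` are assumptions chosen for brevity. Each block is extended into its neighbors, and every output pixel is a weighted blend of the predictions made with the motion vectors of all blocks whose windows cover it.

```python
def obmc_1d(reference, block, vectors):
    """Motion-compensate one row of pixels with overlapped windows.

    reference is the row to fetch pixels from, block is the block
    length, and vectors[i] is the whole-pixel displacement of block i.
    Each block's window extends block // 2 pixels into its neighbors.
    """
    n = len(reference)
    half = block // 2
    length = block + 2 * half
    acc = [0.0] * n    # weighted sum of predictions per pixel
    wsum = [0.0] * n   # sum of window weights per pixel
    for i, v in enumerate(vectors):
        start = i * block - half
        for k in range(length):
            p = start + k
            if not 0 <= p < n:
                continue
            # Triangular window: small at the edges, large in the middle.
            w = min(k + 1, length - k)
            # Clamp the source position to the row boundaries.
            src = min(max(p + v, 0), n - 1)
            acc[p] += w * reference[src]
            wsum[p] += w
    return [a / w for a, w in zip(acc, wsum)]


# Two blocks of two pixels with different displacements.
smoothed = obmc_1d([0, 3, 6, 9], 2, [0, 1])
# smoothed -> [0.0, 4.0, 8.0, 9.0]
```

With the same vectors, non-overlapped block-wise compensation would give 0, 3, 9, 9, with an abrupt step at the block boundary; the overlapped windows spread that transition across the neighboring pixels.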
However, there are two essential performance problems associated with the bi-directional approach. The first problem is the difficulty of accurately estimating the bi-directional motion, since the pixels within the block to be interpolated are not known. The second problem is the lack of information on occlusions. Occluded areas exist in only one of the two existing video frames 110 and 130 and, therefore, should be interpolated using only that one of the video frames 110 and 130.
The prior art is lacking a method of image interpolation that would provide fast and accurate image interpolation while reducing undesirable effects of holes, overlaps, and blocking artifacts in the interpolated image. Accordingly, it is an object of the invention to provide such a method.