Streaming video over the Internet has become more popular, helped by the greater availability of online video content and the increased use of high-bandwidth connections with which to obtain that content. Providers of streaming video often use low frame rates (as well as small frame dimensions and low fidelity) to reduce bitrate and thus make viewing or downloading the video practicable, even over high-bandwidth connections. For example, streaming video often has a frame rate of 15 frames per second [“fps”] or lower. To viewers accustomed to television frame rates of 25 fps, 30 fps, or higher, such streaming video may appear jerky or choppy.
Outside of streaming video applications, it is sometimes necessary to convert video content from one frame rate to another for reasons unrelated to bandwidth limitations. Examples include converting from cinematic 24 fps content to the CCIR-601 video rates (telecine conversion), converting between PAL, NTSC and HDTV rates, and generating frames for slow motion playback.
Traditional methods of rate conversion have used frame or field repetition, such as the commonly used 3:2 pull-down method for telecine conversion. In these methods, the source frame or field nearest to the desired output time-stamp is displayed. For instance, in U.S. Pat. No. 5,929,902 to Kwok, a sequence of frames at 24 fps is converted to 60 fields per second video by producing three video fields for the first frame, two fields for the second frame, and so on, with the fields alternating between odd and even fields. The first field produced could be an odd field from the first frame, the second an even field from the first frame, the third an odd field from the first frame (identical to the first field), the fourth an even field from the second frame, the fifth an odd field from the second frame, and so on.
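The field pattern described above can be sketched as follows. This is a generic illustration of 3:2 pull-down, not the implementation in the Kwok patent; frames are represented only by labels, the function name `pulldown_32` is hypothetical, and the first output field is assumed to be odd.

```python
def pulldown_32(frames):
    """Map 24 fps frames to a 60 fields/s sequence of (frame, parity) pairs.

    Even-indexed source frames contribute three fields and odd-indexed frames
    two, so every pair of frames yields five fields (24 * 5 / 2 = 60).
    Field parity alternates across the whole output sequence.
    """
    fields = []
    parity = "odd"  # assumption: the first field produced is an odd field
    for i, frame in enumerate(frames):
        repeats = 3 if i % 2 == 0 else 2
        for _ in range(repeats):
            fields.append((frame, parity))
            parity = "even" if parity == "odd" else "odd"
    return fields


print(pulldown_32(["A", "B"]))
# [('A', 'odd'), ('A', 'even'), ('A', 'odd'), ('B', 'even'), ('B', 'odd')]
```

The output reproduces the sequence in the text: the third field repeats the first (an odd field of frame A), and frame B begins with an even field.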
It is also possible to use simple temporal filtering to generate a new output frame at the correct time-stamp. This may suffice for low-motion video, but works poorly if there is temporal aliasing of high spatial frequency components in the source sequence. Low frame-rate video content typically contains considerable temporal aliasing, and simple temporal filtering may then produce obvious ghosting artifacts. Ghosting artifacts are an unintended result of blending two images. For example, when a foreground object in one frame and the background at the same location in another frame are blended, a faint version of the foreground object may appear over the background where it should not. Such duplicate or out-of-place objects resemble those produced by the double exposure of still image film.
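The ghosting effect can be reproduced with a minimal sketch of simple temporal filtering. It assumes 1-D "frames" (a single row of pixel intensities) and a linear blend in which each source frame is weighted by its temporal proximity to the output time `t` in [0, 1]; the helper name `temporal_blend` is hypothetical.

```python
def temporal_blend(prev, nxt, t):
    """Simple (non-motion-compensated) temporal filter.

    Each output pixel is a weighted average of the co-located pixels in the
    previous and next frames; t = 0 reproduces prev, t = 1 reproduces nxt.
    """
    return [(1.0 - t) * a + t * b for a, b in zip(prev, nxt)]


# A bright object (1.0) on a dark background (0.0) moves from x = 2 to x = 6.
frame0 = [0, 0, 1, 0, 0, 0, 0, 0]
frame1 = [0, 0, 0, 0, 0, 0, 1, 0]
mid = temporal_blend(frame0, frame1, 0.5)
print(mid)  # [0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.5, 0.0]
```

Instead of a single object at the intermediate position x = 4, the blended frame contains two half-intensity copies at the old and new positions, which is exactly the double-exposure ghosting described above.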
Motion compensated temporal filtering has been used to alleviate this problem. Matching regions from one source frame to another by motion estimation allows a new frame to be synthesized at an intermediate time by temporal filtering of the aligned and positioned source regions. Many techniques for motion compensated temporal filtering have been tried. Because the quality of the resulting frame depends critically on the accuracy of the motion estimation, many approaches to motion estimation have also been proposed. Whatever the merits of previous frame interpolation and motion analysis techniques, however, they do not have the advantages of the techniques and tools of the present invention.
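The general idea of motion compensated temporal filtering can be illustrated with a deliberately simplified sketch. It is not the method of the present invention. It assumes a single global integer translation per frame pair (practical motion estimators work per block or per pixel, usually with sub-pixel precision), estimates that translation by exhaustive search minimizing the sum of absolute differences, and then samples each source frame along the motion trajectory before blending. The function names are hypothetical, and pixels outside the frame are assumed to be dark background.

```python
def sample(frame, i):
    """Return frame[i], or 0.0 (assumed dark background) outside the frame."""
    return frame[i] if 0 <= i < len(frame) else 0.0


def estimate_shift(prev, nxt, max_shift):
    """Exhaustive search for the global integer displacement d that minimizes
    the sum of absolute differences between prev[x] and nxt[x + d]."""
    best_d, best_cost = 0, float("inf")
    for d in range(-max_shift, max_shift + 1):
        cost = sum(abs(p - sample(nxt, x + d)) for x, p in enumerate(prev))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def mc_interpolate(prev, nxt, t, d):
    """Synthesize the frame at time t in [0, 1] along the motion trajectory.

    A point at position p in prev is assumed to lie at p + t*d at time t and at
    p + d in nxt, so output pixel x is fetched from prev[x - t*d] and
    nxt[x + (1 - t)*d] (nearest-neighbour sampling) and then blended.
    """
    out = []
    for x in range(len(prev)):
        a = sample(prev, round(x - t * d))
        b = sample(nxt, round(x + (1 - t) * d))
        out.append((1 - t) * a + t * b)
    return out


frame0 = [0, 0, 1, 0, 0, 0, 0, 0]  # object at x = 2
frame1 = [0, 0, 0, 0, 0, 0, 1, 0]  # object at x = 6
d = estimate_shift(frame0, frame1, max_shift=5)
print(d)                                      # 4
print(mc_interpolate(frame0, frame1, 0.5, d))
# [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

With the two source frames aligned before filtering, the interpolated frame contains one full-intensity object at the intermediate position x = 4 rather than two ghosts. If the estimated displacement were wrong, the aligned samples would not coincide and ghosting would return, which is why the output quality depends so heavily on the accuracy of the motion estimation.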