The recent availability of video frame grabbing hardware has allowed video signals to appear on a variety of new platforms, such as multi-media computers and video printing systems. These new platforms have created a desire among users to view and print video signals in a mode other than the one originally intended, namely as still images. The interlaced video standard, while sufficient for displaying moving pictures at satisfactory quality, is ineffective for displaying stills, since only one half of the information needed to display an image is acquired at a single time. As a result, video must be deinterlaced, i.e., converted to progressive video, before it can be viewed as a sequence of stills. The deinterlacing process, which refers to forming frames from fields, is complicated by the possibility that motion can change the scene contents from field to field. Relative motion between fields can be caused by movement of objects in the scene relative to the camera, or by camera changes such as pan, zoom and jitter, the latter being rather common in hand-held consumer camcorders.
The prior art addresses the problem of deinterlacing an even (or an odd) field by estimating the missing odd (or even) lines. A well-known method is to merge the even and odd fields, i.e., to fill in the missing lines of the odd (even) field with the lines of the neighboring even (odd) field. This simple mechanism causes spatial "judder" artifacts in those image regions that contain moving objects (objects that move within the time interval of two successive fields). Merging, however, provides the best spatial resolution in steady image regions. Another approach to deinterlacing is to concentrate on a single field only (e.g., the odd field) and interpolate the missing lines using spatial interpolation. A simple interpolation technique is vertical linear interpolation, where an average of the available pixel values above and below the missing pixel is assigned to the missing pixel. This method may cause artifacts if the missing pixel lies on an edge whose orientation is not vertical. To overcome these artifacts, a contour-sensitive spatial interpolation method is proposed in M. Isnardi, "Modeling the Television Process," Technical Report No. 515, Massachusetts Institute of Technology, 1986, pages 161 to 163. This method attempts to find the orientation of the image gradient at the missing pixel. Interpolation is then performed using image values that lie along this orientation, so as not to "cross an edge" and cause artifacts.
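The two basic approaches described above, merging and vertical linear interpolation, can be illustrated with a minimal sketch. It assumes a field is stored as a list of pixel rows, and the function names weave and vertical_interpolate, the parity argument, and the replication of the nearest line at the top and bottom borders are illustrative choices that do not come from the cited references.

```python
def weave(even_field, odd_field):
    """Merge two fields into one frame (hypothetical helper).

    The even field supplies frame rows 0, 2, 4, ... and the odd field
    supplies frame rows 1, 3, 5, ...
    """
    frame = []
    for even_row, odd_row in zip(even_field, odd_field):
        frame.append(list(even_row))
        frame.append(list(odd_row))
    return frame


def vertical_interpolate(field, parity):
    """Build a frame from a single field by vertical linear interpolation.

    parity = 0: the field holds the even frame rows.
    parity = 1: the field holds the odd frame rows.
    Each missing pixel is the average of the pixels directly above and
    below it. At the frame border only one neighbor exists, so that line
    is replicated (an assumption made for this sketch).
    """
    n = len(field)
    frame = [None] * (2 * n)
    for i, row in enumerate(field):
        frame[2 * i + parity] = list(row)
    # Missing rows have the opposite parity of the available rows.
    for y in range(1 - parity, 2 * n, 2):
        above = frame[y - 1] if y > 0 else frame[y + 1]
        below = frame[y + 1] if y + 1 < 2 * n else frame[y - 1]
        frame[y] = [(a + b) / 2 for a, b in zip(above, below)]
    return frame
```

Merging keeps every acquired line and therefore the full vertical resolution, but it is only valid where the two fields show the same scene content. Vertical interpolation uses one field only, so it is immune to motion between fields at the cost of vertical detail.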
A method that is potentially more effective is a hybrid method where the deinterlacing process switches, on a pixel-by-pixel basis, between merging and spatial interpolation depending on the dynamics of the missing pixel, so that the advantages of merging in steady regions are fully maintained. A motion detection scheme should be used to classify the missing pixel as a "moving pixel" or "steady pixel".
In U.S. Pat. No. 4,472,732, issued Sep. 18, 1984, Bennett et al. disclose such a method that uses the pixel-by-pixel difference of neighboring fields with the same polarity (e.g., even fields) that follow and precede the field that will be deinterlaced (e.g., an odd field) to perform motion detection, and then switches between merging and vertical interpolation depending on the presence or absence of motion, which is determined by thresholding the difference values. This particular approach may falsely detect "no motion" if the scene is such that the gray levels of the pixels being compared in the two neighboring fields are similar although there is motion in the scene. Such a situation may happen, for instance, in the case of scenes that contain a small object 10 moving against a uniform background 12 in the direction of arrow A as shown in FIG. 1, where fields (k), (k+1), and (k+2) represent successive interlaced video fields. In this case, merging of the fields (k) and (k+1) at a region of interest, denoted as the box 14, will result in artifacts due to a false classification of no motion between fields (k) and (k+2). If a consecutive fourth field, field (k+3) in FIG. 1, is used in motion detection, a comparison of fields (k+1) and (k+3), in addition to the comparison of fields at times (k) and (k+2), may increase the reliability of motion detection. This is evident in the example shown in FIG. 1, where a "moving" decision can be rendered for the region of interest in the frame at time (k+1) as a result of comparing the corresponding image values of the fields at times (k+1) and (k+3). Motion-detection based deinterlacing techniques that utilize four consecutive fields, and switch between spatial interpolation and merging, have been discussed in U.S. Pat. No. 4,785,351, issued to Ishikawa, Nov. 15, 1988, and in U.S. Pat. No. 5,021,870, issued to Motoe et al., Jun. 4, 1991.
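The switching between merging and vertical interpolation, and the benefit of a fourth field, can be sketched as follows. This is an illustration of the general idea only and is not the circuitry disclosed in any of the cited patents. The function name deinterlace_adaptive, the default threshold of 10 gray levels, the assumption that field (k+1) holds the even frame rows, and the use of the lines above and below the missing pixel for the (k+1)/(k+3) comparison are all assumptions made for this sketch.

```python
def deinterlace_adaptive(f_k, f_k1, f_k2, f_k3=None, threshold=10):
    """Deinterlace field (k+1) with per-pixel motion detection (sketch).

    Assumed layout: f_k1 (and f_k3) hold the even frame rows, while f_k and
    f_k2 hold the odd frame rows, i.e. the rows that are missing in f_k1.
    If f_k3 is None, only fields (k) and (k+2) are compared.
    """
    n = len(f_k1)
    width = len(f_k1[0])
    frame = []
    for i in range(n):
        up = i                    # available line above the missing line
        down = min(i + 1, n - 1)  # line below, replicated at the bottom border
        missing = []
        for x in range(width):
            # Same-polarity comparison of fields (k) and (k+2).
            moving = abs(f_k[i][x] - f_k2[i][x]) > threshold
            # Optional second comparison of fields (k+1) and (k+3).
            if not moving and f_k3 is not None:
                moving = (abs(f_k1[up][x] - f_k3[up][x]) > threshold
                          or abs(f_k1[down][x] - f_k3[down][x]) > threshold)
            if moving:
                # "Moving pixel": vertical linear interpolation within (k+1).
                missing.append((f_k1[up][x] + f_k1[down][x]) / 2)
            else:
                # "Steady pixel": merge, taking the line from field (k).
                missing.append(f_k[i][x])
        frame.append(list(f_k1[i]))
        frame.append(missing)
    return frame
```

A small case in the spirit of FIG. 1: if fields (k), (k+2) and (k+3) show only a uniform background of value 0 in the region of interest while field (k+1) contains an object of value 100, the three-field test reports no motion and merging inserts background lines into the object. Passing field (k+3) as well lets the second comparison flag the object pixels as moving, so they are interpolated from field (k+1) alone.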
These techniques that adapt themselves to the presence of motion are not effective in producing a high quality still image in the case of video images that contain dominant motion between fields. In such cases, the above techniques will default to spatial interpolation only, and thus no additional improvement in resolution will be obtained. Video images with dominant motion result, for example, from the motion of hand-held cameras and/or cameras that are panned and zoomed. Since hand-held video cameras are becoming increasingly common in consumer applications, there is a growing interest in a deinterlacing method (e.g., to be used in generating good-quality prints from video) that improves the resolution via motion-compensated temporal interpolation, using information contained in neighboring fields.
A motion-compensated deinterlacing technique that accounts for dominant motion between fields is discussed by Wang et al. in U.S. Pat. No. 5,134,480, issued Jul. 28, 1992. The technique proposed by Wang et al. is a time-recursive method that performs motion estimation and compensation on a local basis (for each pixel) via block matching. Due to the time-recursive nature of the method, a history of the deinterlaced versions of the fields that precede the field of interest is utilized. A quad-tree hierarchy is used in adjusting the block size in order to increase the accuracy of local motion estimation. Deinterlacing is implemented by linearly blending the results of spatial vertical interpolation and motion-compensated interpolation, where motion compensation is performed using either the future field following the field of interest or the recursively deinterlaced version of the previous field.
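The two building blocks named above, block matching and linear blending, can be shown in a much simplified form. The sketch below is not the method of Wang et al.: it has no time recursion and no quad-tree block-size adaptation, and it assumes that both images are full two-dimensional arrays of equal size with the block lying inside the current image. The names sad, best_motion_vector and blend, the exhaustive search, and the weight alpha are hypothetical choices made for illustration.

```python
def sad(cur, ref, y0, x0, dy, dx, size):
    """Sum of absolute differences between a block of cur with its top-left
    corner at (y0, x0) and the block of ref displaced by (dy, dx)."""
    total = 0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            total += abs(cur[y][x] - ref[y + dy][x + dx])
    return total


def best_motion_vector(cur, ref, y0, x0, size, search):
    """Exhaustive block matching: return the displacement (dy, dx) within
    +/- search pixels that minimizes the SAD. Displacements that would move
    the block outside ref are skipped. Ties keep the first candidate found."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= y0 + dy and y0 + dy + size <= height
                      and 0 <= x0 + dx and x0 + dx + size <= width)
            if not inside:
                continue
            cost = sad(cur, ref, y0, x0, dy, dx, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best


def blend(spatial, compensated, alpha):
    """Linear blend of a spatially interpolated value and a
    motion-compensated value; alpha = 1 trusts motion compensation fully."""
    return alpha * compensated + (1 - alpha) * spatial
```

In a deinterlacer of this kind, the motion vector found for a block would be used to fetch the motion-compensated value for a missing pixel from the reference image, and blend would then combine that value with the vertically interpolated one.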
Local motion estimation for every pixel location is in general computationally expensive. Further, it is complicated by the existence of covered/uncovered regions (i.e., occlusions), and by sharply varying motion vectors at the boundaries of objects that move independently of each other. The challenge for robustness in the case of methods that estimate local motion, thus allowing for independent object motions, is not to create artifacts at motion boundaries. This problem is very difficult to solve in a robust manner without an excessive amount of processing, and is especially difficult in creating high-quality stills from video, since small artifacts can be extremely objectionable. When the motion is modeled globally, using models such as affine or perspective, the challenge is not to produce artifacts when the actual motion deviates from the model.