This invention relates to methods and systems for processing interlaced television fields into a progressively scanned format, with an effective increase in vertical resolution and a reduction of artifacts in image sequences.
Interlaced scanning is an efficient method of bandwidth compression for television transmission. However, in addition to the loss of vertical resolution, interlacing results in many well known artifacts. With frame memories becoming widely available, improved definition television (IDTV) can display non-interlaced video by appropriate processing at the receiver without changing the scanning structure of the current television system, such as the NTSC (National Television Systems Committee) system used in the United States. Proper conversion from interlaced to progressive format reduces line flicker and improves the vertical resolution of the displayed images. Such techniques are also important in high definition television (HDTV), since many of the current HDTV proposals either involve transmission of interlaced video, or high spatial frequency information at a reduced temporal rate.
FIG. 1 shows the three-dimensional (3-D) domain of a sampled image sequence x (m,n,t), in which the missing lines are in dotted form. In FIG. 2, the vertical-temporal grid (for a constant value of n) is shown, with circles on the scan lines of an interlaced sequence and crosses on the missing lines.
Interlaced scanning introduces aliasing unless the moving image is properly prefiltered with a low-pass filter. The 3-D frequency responses of three possible such filters are shown in FIG. 3, as a direct consequence of the shape of the quincunx sampling grid (A, B, C, D and X) in FIG. 2. Different reconstruction methods are appropriate for each form of prefiltering. In actual image capturing devices, temporal filtering is due to camera time integration and is performed independently of spatial filtering, resulting in separable prefiltering. If vertical filtering is not strong enough to fully cover the area between two successive field lines, some information is indeed missing between the field lines, which can be estimated using appropriate models for the image sequence source. From psychophysical experiments, applicants have found that the apparent quality of interlaced video is often best when it is derived from progressive video by dropping half the lines without prefiltering. In that case, deinterlacing consists only of predicting the missing lines, because each existing line remains as is.
For discussion, it is assumed that, preceding interlaced scanning, separable vertical-temporal filtering is done as in the dotted part (external cube) of FIG. 3, i.e. as if the pictures were progressively scanned, with a frame rate equal to the interlaced field rate. Equivalently, such interlaced sequences may be produced by properly dropping half the lines from a progressively scanned sequence. Applicants' experiments show that the resulting aliasing, which gives rise to various interlacing artifacts, is reduced by nonlinear interpolation of the missing lines. This resolution enhancement is a consequence of the fact that the sampling theorem can be "defeated" (e.g. some high frequency information between the solid and dotted volumes of FIG. 3 can be predicted) if there is extra knowledge of the signal source. The source model used for this purpose assumes that the video scene contains a number of rigid objects moving in a translational manner. The same concepts can be used to achieve purely spatial or temporal resolution enhancement. Nonlinear interpolation always reduces the energy of the enhancement signal, defined as the difference between the deinterlaced and the actual signals. Some simple temporal models assume that the image scene contains rigid moving objects, and that the components of their displacement vectors from frame to frame are not necessarily integer multiples of the spatial sampling period. This implies that the same object is sampled in many frames. Considering the overlap of all these sampling grids on a specific object, it is obvious that the "virtual" spatial sampling rate for the object is much denser than the one in any individual frame. Assuming, for the moment, that the image sequence is sampled without any antialiasing prefiltering, the above fact implies that one can derive, at least in theory, a high-definition image sequence from the samples of the many frames in which it was captured. 
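The "virtual sampling grid" argument above can be illustrated with a hypothetical 1-D example (Python with NumPy; the signal, frequencies, and half-sample shift are chosen purely for illustration and are not from the source). A pattern containing 0.7 cycles/sample is aliased when sampled at unit rate, but if the object shifts by half a sample between two frames, pooling both frames' samples yields an effective rate of 2, which resolves the ambiguity:

```python
import numpy as np

# Illustrative frequencies: 0.7 cycles/sample aliases to 0.3 at unit rate.
f_high, f_alias = 0.7, 0.3
n = np.arange(16)

frame1 = np.sin(2 * np.pi * f_high * n)          # samples at integer sites
frame2 = np.sin(2 * np.pi * f_high * (n + 0.5))  # same object, grid shifted 1/2 sample

# Within frame 1 alone, the two frequencies are indistinguishable (aliasing):
assert np.allclose(frame1, -np.sin(2 * np.pi * f_alias * n))

# The half-sample-shifted frame breaks the ambiguity, so the two frames
# together carry high-frequency information that neither carries alone:
assert not np.allclose(frame2, -np.sin(2 * np.pi * f_alias * (n + 0.5)))
```

This is the sense in which samples from many frames of a translating object form a denser virtual grid than any single frame provides.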
If there is spatial antialiasing prefiltering, then the blurred frame can be reconstructed from the samples of one frame only, and no additional information is conveyed by the other frames; if there is temporal antialiasing prefiltering, then moving objects will be blurred as well. According to perceptual psychology, the human eye tends to track moving objects; if ideal temporal prefiltering is applied, the blurred moving objects will perceptibly reduce image quality. This argument shows that it is desirable to apply, in video capture, weaker antialiasing prefiltering than is theoretically needed, even for progressively scanned scenes, or to make the amount of such prefiltering depend on motion in a pixel-by-pixel adaptive manner.
In traditional deinterlacing schemes there are various simple ways to predict the pixel X of a missing line, and various local, motion-adaptive, or motion-compensated techniques have been proposed. In FIG. 2, A, B, C and D are the pixels above, below, behind and in front of pixel X. Intraframe techniques can use line doubling, i.e. X=A, linear interpolation, i.e. X=(A+B)/2, or more complicated nonlinear spatial interpolation techniques. Such techniques cannot predict information which is lost from the subsampled current field but appears in neighboring fields, and various artifacts may result.
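The two intraframe techniques above can be sketched as follows (Python with NumPy; the function and argument names, and the field layout, are illustrative rather than taken from the source):

```python
import numpy as np

def deinterlace_intraframe(field, parity, method="average"):
    """Fill the missing lines of one interlaced field by purely spatial
    (intraframe) interpolation.

    field  : 2-D array holding only the lines present in the field.
    parity : 0 if the field carries the even output lines, 1 for odd.
    method : "double" for line doubling (X = A),
             anything else for linear interpolation (X = (A + B) / 2).
    """
    h, w = field.shape
    frame = np.zeros((2 * h, w), dtype=field.dtype)
    frame[parity::2] = field                      # existing lines remain as-is
    for row in range(1 - parity, 2 * h, 2):       # the missing lines
        above = frame[row - 1] if row > 0 else frame[row + 1]          # pixel A
        below = frame[row + 1] if row < 2 * h - 1 else frame[row - 1]  # pixel B
        if method == "double":
            frame[row] = above                    # line doubling: X = A
        else:
            frame[row] = (above.astype(float) + below) / 2  # X = (A + B) / 2
    return frame
```

Note that both methods use only the current field, which is why, as stated above, they cannot recover detail that is visible only in neighboring fields.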
Interframe interpolation can be as simple as X=C, or X=(C+D)/2. This works well for stationary objects, where there is no motion. The nonlinear median filter X=Med(A,B,C) usually results in acceptable visual appearance and is used with success in digital receivers. Even though artifacts still appear, there is an obvious overall improvement over interlaced displays.
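The three-point median filter X=Med(A,B,C) can be sketched in the same style (again illustrative Python; here the previous deinterlaced frame supplies pixel C, one of several possible choices):

```python
import numpy as np

def median_deinterlace(field, prev, parity):
    """Three-point median deinterlacing: each missing pixel X becomes
    Med(A, B, C), where A and B are the lines above and below X in the
    current field and C is the co-sited pixel of the previous frame.

    field  : (h, w) lines present in the current field.
    prev   : (2h, w) previous deinterlaced frame.
    parity : 0 if the field carries the even output lines, 1 for odd.
    """
    h, w = field.shape
    frame = np.zeros((2 * h, w), dtype=field.dtype)
    frame[parity::2] = field                      # existing lines remain as-is
    for row in range(1 - parity, 2 * h, 2):       # the missing lines
        a = frame[row - 1] if row > 0 else frame[row + 1]          # pixel A
        b = frame[row + 1] if row < 2 * h - 1 else frame[row - 1]  # pixel B
        c = prev[row]                                              # pixel C
        frame[row] = np.median(np.stack([a, b, c]), axis=0)
    return frame
```

The median switches implicitly between intra- and interframe behavior: where A, B and C agree (stationary areas) it passes the temporal value through, and where C is an outlier (motion) it falls back on the spatial neighbors.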
If there is motion, then some form of motion compensation must be used. All techniques based on symmetrical matching search for a pair of 2-D blocks centered around the pixel to be predicted; they assume constant velocity and cannot account for acceleration. This does not cause serious visual artifacts when a whole frame is predicted from the two neighboring frames. In the deinterlacing problem, however, unacceptable results are found if objects in the scene undergo accelerating motion.
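A minimal 1-D sketch of symmetrical matching makes the constant-velocity assumption explicit (illustrative Python; all names and parameters are hypothetical, and real schemes match 2-D blocks across fields):

```python
import numpy as np

def symmetric_match(prev_line, next_line, center, block=3, search=2):
    """Predict the pixel at `center` by symmetrical block matching:
    find the displacement d minimizing the mismatch between a block at
    center - d in the previous field and a block at center + d in the
    next field, then average the two matched pixels. The symmetric
    offsets +/- d are exactly the constant-velocity assumption; an
    accelerating object violates it and the match fails."""
    best_d, best_err = 0, np.inf
    half = block // 2
    for d in range(-search, search + 1):
        lo_p, hi_p = center - d - half, center - d + half + 1
        lo_n, hi_n = center + d - half, center + d + half + 1
        if lo_p < 0 or lo_n < 0 or hi_p > len(prev_line) or hi_n > len(next_line):
            continue                               # block falls off the line
        err = np.abs(prev_line[lo_p:hi_p].astype(float)
                     - next_line[lo_n:hi_n]).sum()
        if err < best_err:
            best_err, best_d = err, d
    return (float(prev_line[center - best_d])
            + float(next_line[center + best_d])) / 2
```

On a pattern translating at constant velocity the symmetric search locks onto the true displacement; under acceleration the object no longer sits at mirrored offsets in the two neighboring fields, which is the failure mode noted above.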
It is therefore an object of this invention to provide improved methods and systems for converting interlaced scanned data to progressively scanned format, overcoming one or more shortcomings of such prior approaches.
It is a further object of the invention to provide such methods and systems utilizing time-recursive motion compensation and the next future field of data to derive missing pixel values on a best-matching basis, thus utilizing the history of image values up to the present time.
It is a further object of the invention to provide such systems including provision for attenuating noise propagation by proportional inclusion of interpolated current field data in output frame data.
It is a further object of the invention to provide such systems wherein memory is efficiently utilized through data comparison based on blocks of pixels, with the size of such blocks being decreased for greater accuracy where appropriate.
It is also an object of the invention to provide such systems which may be adapted for color signal processing of both luminance and chrominance values and may also or alternatively utilize reduced data rate techniques for greater accuracy or efficiency.
It is an additional object of the invention to provide block matching and other subsystems useful in deinterlacing systems.