(1) Field of the Invention
The present invention relates to the field of video data de-interlacing, and in particular to a method and apparatus for reconstructing frames from a line-skipped-sequence of fields.
(2) Description of the Related Art
Interlaced scanning was invented in the 1930s as a way to improve the subjective picture quality of TV images without consuming any extra bandwidth. Since then, interlaced scanning has been widely used in many television standards, including the 1080i HDTV broadcast standard. According to an interlaced scanning scheme, an image is divided into two fields that contain odd and even scan lines, respectively. Each field is scanned line by line, from top to bottom.
With interlaced scanning, both fields are captured, transmitted, and displayed in succession. The afterglow of the phosphor of cathode ray tubes (CRTs), in combination with the persistence of vision, results in the two fields being perceived as a continuous image. This allows the viewing of full horizontal detail with half the bandwidth that would be required for a full progressive scan, while maintaining the CRT refresh rate necessary to prevent flicker.
New display technologies including LCD and plasma displays, however, require video signals according to the progressive scanning standard. Progressive or noninterlaced scanning refers to a scanning method where lines of each frame are strictly drawn in sequence. With progressive scan, an image is captured, transmitted and displayed in a path similar to text on a page: line by line, from top to bottom.
In order to be able to display interlaced video on a progressive scan display device, it is thus necessary to convert the one into the other. Moreover, even converting interlaced video with one spatial resolution to interlaced video with another spatial resolution is generally a multi-step process including interlaced-to-progressive conversion, resolution conversion, and progressive-to-interlaced conversion steps. The process of converting interlaced video into progressive scan video is known as de-interlacing. Obviously, the resulting image quality depends critically on the method employed for de-interlacing.
With an ever-increasing number of alternative TV standards and the need to convert back and forth between any of these standards, high-quality de-interlacing algorithms have attracted a lot of attention recently. Consequently, a large variety of different de-interlacing algorithms is known in the art, each of which has its specific advantages and drawbacks.
For a discussion of de-interlacing algorithms, two different sources of interlaced video material have to be distinguished, namely footage generated by a traditional film camera and footage generated by an (interlaced) video camera. In interlaced video material generated via a telecine process from traditional motion-picture footage with 24 images per second, two consecutive fields are pulled down from the same original image. Consequently, in the case of interlaced video generated by a telecine process, the sequence of original images can be reconstructed faithfully by combining pairs of consecutive fields.
On the other hand, interlaced video cameras capture consecutive fields at different points of time. Unless the captured image is perfectly stationary, two consecutive fields will thus contain information about two different images that cannot be combined in a straightforward manner. This is the realm of state-of-the-art de-interlacing algorithms.
The very same problem as with interlaced footage from video cameras arises if the interlaced video sequence has been generated from a full-resolution progressive scan sequence by applying the so-called line-skipping operation, which is illustrated in FIG. 1a. Starting from a sequence of full-resolution images 110, the line-skipping operation may skip every other line in each frame 120. In order to ensure that stationary objects of a scene are uniformly sampled over time, odd and even lines of consecutive frames are skipped alternately. The lines kept by the line-skipping operation thus form a sequence of fields 130, each field representing a down-sampled version of the corresponding frame.
More generally speaking, the line-skipping operation may keep only every Kth line of each frame and discard the other (K−1) lines; cf. FIG. 1b for K=3. To guarantee that non-moving objects of a scene are sampled uniformly over time, the set of lines that is kept depends on the frame number, i.e., after downsampling K consecutive frames of a full-resolution sequence, every pixel position of a frame has been sampled once. The result of this line-skipping operation is a sequence that has a much lower resolution in the vertical direction than in the horizontal direction: frames of size R×C are reduced to fields of size U×C, with U=R/K, so that the vertical resolution is reduced by a factor of K while the horizontal resolution is retained. It is to be noted that the line-skipping operation with K=2 results in a conventional interlaced sequence with two alternating fields.
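The line-skipping operation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure of any particular system: the function name `line_skip` and the phase schedule (the kept line offset simply cycles 0, 1, ..., K−1 with the frame number) are assumptions made for the example, and the frames are plain nested lists whose entries merely label the source frame and row.

```python
def line_skip(frames, k):
    """Keep every k-th line of each frame; the kept offset cycles with the frame number.

    The schedule offset = n % k is an assumption made for this sketch.
    """
    fields = []
    for n, frame in enumerate(frames):
        offset = n % k
        fields.append(frame[offset::k])  # no vertical prefilter is applied
    return fields


# Four frames of size R x C; every pixel is labelled (frame index, row index).
R, C, K = 6, 4, 3
frames = [[[(n, r)] * C for r in range(R)] for n in range(4)]

fields = line_skip(frames, K)

# Row indices that survive in each field.
kept_rows = [[line[0][1] for line in field] for field in fields]
print(kept_rows)  # [[0, 3], [1, 4], [2, 5], [0, 3]]
```

Each field has U=R/K=2 lines, and the first K fields together cover every row index exactly once, which is the uniform-sampling property mentioned in the text.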
The line-skipping operation does not apply any filtering before downsampling in the vertical direction. Consequently, the so-called “line-skipped-sequence” or field sequence is distorted by aliasing. The amount of aliasing in a downsampled signal depends on the signal itself and on the downsampling factor K: the bigger the downsampling factor, the larger the overlap of the replicated signal spectra and the more aliasing artifacts are contained in the sequence.
The aliasing problem is further illustrated in FIG. 2, wherein FIG. 2a is a schematic drawing of the spectrum of the original (band-limited) signal without any aliasing. Converting this signal to an interlaced signal by applying the line-skipping operation with K=2 leads to an overlap of the spectra due to undersampling of the original signal; cf. FIG. 2b. This effect is known as aliasing and leads to overt artifacts such as Moiré patterns. More severe undersampling, for instance by applying the line-skipping operation with K=3 as indicated in FIG. 2c, leads to multiple overlapping spectra and even more pronounced artifacts.
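The effect of unfiltered vertical downsampling can be made concrete with a one-dimensional example. The following sketch, under assumptions chosen purely for illustration (a single image column containing a cosine, frequencies of 0.4 and 0.1 cycles per line, K=2), shows that two clearly different full-resolution patterns become indistinguishable once every other line is dropped.

```python
import math


def column(freq, num_lines):
    """One vertical image column: a cosine with `freq` cycles per line."""
    return [math.cos(2 * math.pi * freq * r) for r in range(num_lines)]


K = 2
high = column(0.4, 16)  # fine vertical detail, still below the full-resolution limit of 0.5
low = column(0.1, 16)   # coarse vertical detail

# Line skipping without a prefilter: keep every K-th line.
field_high = high[::K]
field_low = low[::K]

diff_full = max(abs(a - b) for a, b in zip(high, low))
diff_field = max(abs(a - b) for a, b in zip(field_high, field_low))
print(diff_full > 0.5, diff_field < 1e-9)  # True True
```

After skipping, the 0.4 cycles-per-line pattern is sampled at 0.8 cycles per field line, which exceeds the field's limit of 0.5 and folds back to 0.2 cycles per field line. That is exactly what the 0.1 cycles-per-line pattern produces, so the fine detail masquerades as coarse detail.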
The objective of de-interlacing algorithms is thus to reconstruct the missing lines of the line-skipped sequence, to reduce aliasing, and to increase the vertical resolution of the images. Existing de-interlacing algorithms can be roughly divided into three categories: linear filtering, adaptive (nonlinear) techniques, and motion-compensated algorithms.
De-interlacing algorithms based on linear filtering reconstruct missing pixel data by applying a linear filter that has support in the set of available pixel data, i.e., by substituting missing pixel data by a weighted sum of spatially and/or temporally adjacent pixels. These algorithms are generally easy to implement but are not effective at either reducing aliasing artifacts or increasing vertical resolution.
FIG. 3 illustrates by way of example a simple de-interlacing algorithm based on linear filtering. FIG. 3a illustrates the line-skipping operation performed on a full-resolution sequence 310 containing an object moving in the course of four frames from the lower-left corner to the upper right corner. Down-sampling with K=3 leads to line-skipped sequence 330 still representing the object moving from the lower-left to the upper-right corner.
In FIG. 3b, a field 321 of the line-skipped sequence 330 is up-sampled by substituting missing lines with respective lines of the previous and the next field; cf. the arrows in FIG. 3b. The thus reconstructed frame 311, however, suffers from severe artifacts, which are due to object motion in the scene.
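The substitution of missing lines by lines of neighbouring fields can be sketched as follows. This is only an illustration of the idea, not the patent's implementation: the helper names, the toy frame content (a two-pixel-wide vertical bar), and the assumption that the kept line offset equals the frame number modulo K are all hypothetical choices made for the example.

```python
def line_skip(frames, k):
    """Keep every k-th line; the kept offset is assumed to be n % k for frame n."""
    return [frame[n % k::k] for n, frame in enumerate(frames)]


def merge_fields(fields, n, k):
    """Rebuild frame n by copying lines from the k consecutive fields n-1 .. n-2+k.

    For k = 3 these are the previous, current, and next field.
    No motion compensation is applied.
    """
    rows = len(fields[n]) * k
    frame = [None] * rows
    for m in range(n - 1, n - 1 + k):
        offset = m % k
        for i, line in enumerate(fields[m]):
            frame[offset + i * k] = line
    return frame


def make_frame(x, rows=6, cols=8):
    """A frame showing a 2-pixel-wide vertical bar whose left edge is at column x."""
    return [[1 if x <= c < x + 2 else 0 for c in range(cols)] for _ in range(rows)]


K = 3
static = [make_frame(2) for _ in range(4)]
moving = [make_frame(n) for n in range(4)]  # bar moves one pixel to the right per frame

static_ok = merge_fields(line_skip(static, K), 1, K) == static[1]
moving_ok = merge_fields(line_skip(moving, K), 1, K) == moving[1]
print(static_ok, moving_ok)  # True False
```

For the stationary scene the merged frame is exact. For the moving bar, adjacent lines stem from different instants, so the bar appears at three different horizontal positions within one reconstructed frame.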
In order to prevent motion-related artifacts in the de-interlaced sequence, conventional de-interlacing algorithms employ motion compensation techniques. These algorithms typically comprise two additional steps, namely estimating object motion from a sequence of interlaced fields and compensating the motion by shifting the image content accordingly, prior to merging or interpolating consecutive fields.
FIG. 4a illustrates a simple motion compensated de-interlacing algorithm by way of the example shown in FIG. 3a. In addition to the linear-filtering algorithm illustrated in FIG. 3b, the reference fields at time (n−1) and (n+1), i.e., the previous and the next field of the current field at time n, are shifted horizontally in order to compensate for the object movement. The shifted lines of the two reference fields are then employed to substitute the missing lines of the current field, resulting, in this example, in a perfect reconstruction 411 of the original frame.
Although this algorithm may handle motion along the horizontal direction decently, it will frequently fail for motion in the vertical direction, as illustrated in FIG. 4b for the frame at time (n+1), generating another type of disturbing artifact. It is to be noted that the vertical velocity of the moving object is the same in both cases, namely 1 full-resolution pixel per frame. The precise amount of vertical velocity, however, can only be determined from the full-resolution sequence and not from the interlaced sequence. It is further to be noted that, in this scheme, vertical motion compensation can only be performed in steps of the sub-sampling ratio K, i.e., 3, 6, 9, . . . pixels per frame.
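A minimal sketch of horizontally motion-compensated field merging is given below. It assumes that the horizontal velocity `vx` is already known, whereas a real algorithm has to estimate it from the fields; the helper names, the zero padding at the image border, and the toy bar pattern are likewise assumptions made for the example.

```python
def line_skip(frames, k):
    """Keep every k-th line; the kept offset is assumed to be n % k for frame n."""
    return [frame[n % k::k] for n, frame in enumerate(frames)]


def make_frame(x, rows=6, cols=8):
    """A frame showing a 2-pixel-wide vertical bar whose left edge is at column x."""
    return [[1 if x <= c < x + 2 else 0 for c in range(cols)] for _ in range(rows)]


def shift_line(line, dx, fill=0):
    """Shift a line dx pixels to the right (left if dx < 0), padding with `fill`."""
    cols = len(line)
    return [line[c - dx] if 0 <= c - dx < cols else fill for c in range(cols)]


def mc_merge_fields(fields, n, k, vx):
    """Merge fields n-1 .. n-2+k into frame n after shifting each reference field
    horizontally by the displacement accumulated between its time and time n."""
    rows = len(fields[n]) * k
    frame = [None] * rows
    for m in range(n - 1, n - 1 + k):
        offset = m % k
        dx = (n - m) * vx  # content of field m must move by dx to line up with time n
        for i, line in enumerate(fields[m]):
            frame[offset + i * k] = shift_line(line, dx)
    return frame


K, VX = 3, 1
moving = [make_frame(n * VX) for n in range(4)]  # bar moves VX pixels right per frame
fields = line_skip(moving, K)

compensated = mc_merge_fields(fields, 1, K, VX)
uncompensated = mc_merge_fields(fields, 1, K, 0)
print(compensated == moving[1], uncompensated == moving[1])  # True False
```

With the correct horizontal velocity the reconstruction is exact in this toy case; with `vx=0` the sketch degenerates to plain field merging and reproduces the motion artifacts.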
From P. Delogne et al., “Improved Interpolation, Motion Estimation, and Compensation for Interlaced Pictures”, IEEE Trans. Image Processing, 3:482-491, 1994, an improved de-interlacing algorithm is known that is capable of exactly reconstructing the full-resolution sequence provided that motion vectors are known. According to a generalization of Shannon's sampling theorem, a bandwidth-limited signal with maximum frequency 1/T can be exactly reconstructed from N independent sets of samples, each sample set representing the same signal with a sampling frequency 2/(N T). The above mentioned algorithm exploits this fact by considering lines of a current frame and a previous frame as the two independent sets of samples (N=2), the phase shift between the two sets being determined by the distance of two scan lines and the motion vectors.
FIG. 5a illustrates the de-interlacing algorithm based on the generalized sampling theorem. Filled circles represent existing samples in the interlaced sequence, whereas open circles represent samples to be reconstructed for the current frame. Arrows indicate motion between the field at time (n−1) and the field at time n. Hence, at the current time n, two sets of samples are available, namely the regular interlaced samples, represented by filled circles, and the motion-propagated samples of the previous field, represented by crosses in FIG. 5a. From these two sets of samples, the missing samples at the positions of the open circles may be calculated.
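The geometry of the two sample sets can be illustrated with a short sketch that only tracks vertical sample positions, measured in full-resolution line units. It is not an implementation of the reconstruction filter. The function names, the assumption that field n keeps the lines with offset n mod K, and the use of a single global vertical displacement `vy` per field interval are hypothetical simplifications.

```python
from fractions import Fraction


def sample_sets(n, k, vy, num_lines):
    """Vertical positions of the current samples and of the motion-propagated
    samples of the previous field, in full-resolution line units."""
    current = [Fraction(r) for r in range(n % k, num_lines, k)]
    propagated = [Fraction(r) + vy for r in range((n - 1) % k, num_lines, k)]
    return current, propagated


def phase_offset(n, k, vy):
    """Offset of the propagated set relative to the current sampling grid, in [0, k).

    An offset of 0 means both sets fall on the same positions, so the second
    set adds no new information.
    """
    return ((n - 1) % k + vy - n % k) % k


K, N = 2, 1
for vy in (Fraction(0), Fraction(1, 2), Fraction(1)):
    current, propagated = sample_sets(N, K, vy, 8)
    coincide = bool(set(current) & set(propagated))
    print(f"vy={vy}: phase offset {phase_offset(N, K, vy)}, coincide={coincide}")
```

In this model a vertical displacement of one full-resolution line per field makes the propagated samples land exactly on the current samples for K=2, whereas a displacement of zero or of half a line yields a second set with a distinct phase.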
The step of determining motion vectors is illustrated in FIG. 5b. An object 530 in a current field 520 has moved relative to a previous field 510 of the interlaced sequence. In order to determine the corresponding motion vector, each field is divided into a plurality of blocks, each block consisting of a plurality of pixels 501. A block 550 of the previous field is shifted and compared pixel by pixel to the content of the current field. As a measure for quantifying the match, the mean square error of individual pixel values is conventionally used. The shifted position that yields the best match with the current field is then used to define a motion vector 560 for this block.
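The block-matching step described above can be sketched as an exhaustive search that minimises the mean square error. The function names, the search range, and the synthetic 3×3 test object are assumptions made for this example, and the sketch works at full-pel precision only.

```python
def get_block(img, top, left, size):
    """Extract a size x size block whose top-left corner is at (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def mse(block_a, block_b):
    """Mean square error between two equally sized blocks."""
    count = sum(len(row) for row in block_a)
    total = sum((a - b) ** 2
                for row_a, row_b in zip(block_a, block_b)
                for a, b in zip(row_a, row_b))
    return total / count


def estimate_motion(prev, cur, top, left, size, search):
    """Full-search block matching: shift a block of `prev` over `cur` and
    return the displacement (dy, dx) with the smallest MSE, plus that MSE."""
    ref = get_block(prev, top, left, size)
    rows, cols = len(cur), len(cur[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > rows or l + size > cols:
                continue  # candidate block would leave the image
            err = mse(ref, get_block(cur, t, l, size))
            if err < best_err:
                best, best_err = (dy, dx), err
    return best, best_err


def make_field(top, left, rows=8, cols=8):
    """Synthetic field: a 3x3 object with distinct pixel values on a zero background."""
    img = [[0] * cols for _ in range(rows)]
    for i in range(3):
        for j in range(3):
            img[top + i][left + j] = 1 + 3 * i + j
    return img


prev_field = make_field(2, 1)
cur_field = make_field(3, 3)  # object moved 1 line down and 2 pixels right

vector, error = estimate_motion(prev_field, cur_field, top=2, left=1, size=3, search=2)
print(vector, error)  # (1, 2) 0.0
```

Because the synthetic object is free of noise and moves by whole pixels, the best match has zero error; real fields rarely offer such an exact match.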
For de-interlacing purposes, the motion vectors have to be determined at sub-pel resolution. The above described motion estimation algorithm is thus performed on up-scaled and interpolated versions of the original fields.
Obviously, the performance of this algorithm depends critically on the accuracy of the motion vectors. In a de-interlacing algorithm, however, motion vectors have to be estimated from the aliased low-resolution sequence. Due to this aliasing, motion estimation algorithms fail to estimate sub-pel motion, because a perfect match does not exist. Interpolated samples of two aliasing-distorted fields differ even if there is only translational motion without any noise or occlusion. There is only a perfect match for full-pel shifts. These full-pel shift motion vectors, however, are not of interest to the de-interlacing algorithm, because then the two sets of samples do not fulfill the independence requirement of the generalized sampling theorem, i.e., they contain no additional information. Since motion vectors estimated from interlaced video sequences are inherently inaccurate, motion compensation artifacts are introduced that significantly degrade the quality of the reconstructed video sequences.
We are thus left with a chicken-and-egg situation: on the one hand, accurate motion vectors are required in order to properly de-interlace the signal. On the other hand, motion vectors can only be estimated accurately from a signal free of aliasing artifacts, i.e., the properly de-interlaced signal.