The present invention relates to digital video signal processing, and more particularly to interlaced and progressive video formats.
For video systems, interlaced format is widely used to reduce data rate. FIG. 2A illustrates interlaced video format where t denotes time, x horizontal position, and y vertical position. An interlaced video signal consists of a sequence alternating between the two kinds of fields: the even-parity fields which contain the lines with even line numbers, i.e., y=even (“Top Field”), and the odd-parity fields which contain the lines with odd line numbers (“Bottom Field”). For example, NTSC interlaced television sets display 60 fields per second with each field containing 262-263 lines of which about 240 are active.
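As a minimal sketch of the field structure described above, the following Python splits a progressive frame into its two fields. The function name split_fields and the list-of-rows frame representation are assumptions made for illustration, not part of any cited system.

```python
def split_fields(frame):
    """Split a frame (a list of rows) into its even- and odd-parity fields."""
    top = frame[0::2]     # even line numbers: y = 0, 2, 4, ...
    bottom = frame[1::2]  # odd line numbers:  y = 1, 3, 5, ...
    return top, bottom


# Hypothetical 6-line, 4-pixel-wide frame; each value encodes (line, column).
frame = [[10 * y + x for x in range(4)] for y in range(6)]
top, bottom = split_fields(frame)
```

Each field carries half of the lines, which is where the data-rate reduction of the interlaced format comes from.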
FIG. 2B shows a progressive video format frame, which recent high-quality TV sets use to display video programs. In progressive format, a frame, which contains all of the line data, is displayed all at once. NTSC progressive TV sets display 60 progressive frames per second with about 720×480 active pixels per frame. However, most video programs are broadcast in interlaced format, and thus interlaced-to-progressive conversion (de-interlacing) is required to display a broadcast TV program on a progressive TV set. Essentially, skipped lines in the interlaced fields need to be filled in by interpolation to yield progressive frames.
Generally, de-interlacing methods can be classified into one of five categories: (1) spatial (intra-frame) techniques; (2) temporal (inter-frame) techniques; (3) spatial-temporal techniques; (4) motion-detection based techniques; and (5) motion-compensation based techniques.
The traditional spatial, temporal, and spatial-temporal interpolation schemes usually lead to poor conversion performance. The spatial interpolation does not utilize the full achievable vertical-temporal (V-T) bandwidth in filtering because it ignores the temporal spectrum, which reduces the vertical resolution. The temporal interpolation, however, causes artifacts such as jaggy and feather effects when motion is present. Although the spatial-temporal interpolation can fully utilize the V-T spectrum, it cannot handle motion scenes well.
Thus, motion adaptive techniques of categories (4)-(5) are generally advantageous. The most advanced de-interlacing techniques usually make use of motion estimation and compensation. Motion compensation allows virtual conversion of a moving sequence into a stationary one by interpolating along the motion trajectory. However, this type of technique is of much higher complexity in implementation.
On the other hand, interlaced sequences can be essentially perfectly reconstructed by temporal filtering in the absence of motion, while spatial filtering performs well in the presence of motion. Motion-detection based methods use a motion detector to take advantage of the above-mentioned facts by classifying each pixel as in either a moving or a stationary (still) region. Based on the output of the motion detector, the de-interlacing method then blends between the temporal filtered output and the spatial filtered output.
FIG. 3A is a block diagram of a generic interlaced-to-progressive converter employing motion detection. The converter converts an interlaced input video source to a progressive video format that contains the original interlaced lines plus interpolated lines. The frame buffer stores several interlaced fields. The motion detector detects moving objects in the input fields pixel-by-pixel. The motion detector calculates the level of motion at every pixel location where the pixel data needs to be interpolated. The more pronounced the motion, the higher the motion level. The still-pixel generator and the moving-pixel generator interpolate pixels by assuming, respectively, that the pixel being interpolated is a part of a still object or a moving object. The selector/blender block selects or blends the outputs of the still-pixel generator and the moving-pixel generator using the level of motion. When the motion level is low, the output of the still-pixel generator is selected, or its blending fraction in the interpolated output data becomes high.
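The selector/blender step can be sketched as a linear mix controlled by the motion level. This is only an illustration: the function name blend_pixel, the 0 to 255 motion scale, and the linear blending law are assumptions, since the text does not specify how the blending fraction is derived.

```python
def blend_pixel(still_value, moving_value, motion_level, max_level=255):
    """Mix the still-pixel and moving-pixel estimates for one output pixel.

    motion_level is assumed to lie in [0, max_level]; values outside that
    range are clamped. A low level favors the still-pixel estimate.
    """
    alpha = min(max(motion_level, 0), max_level) / max_level
    return (1.0 - alpha) * still_value + alpha * moving_value


no_motion = blend_pixel(100, 200, 0)      # still-pixel output dominates
full_motion = blend_pixel(100, 200, 255)  # moving-pixel output dominates
```

A hard selector is the special case in which alpha is thresholded to 0 or 1 before mixing.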
The still-pixel generator is realized by an inter-field interpolator. The moving-pixel generator consists of an intra-field interpolator plus an edge-direction detector. The edge-direction detector detects the edge direction (edge vector) at the detection pixel in the pattern of an object in the field and outputs the detected direction to the intra-field interpolator. The intra-field interpolator calculates a pixel value by interpolating pixels along the detected direction (edge vector). (Without a direction, the interpolator could simply interpolate using the two closest pixels in the field.) The spatial resolution of the inter-field interpolator is higher than that of the intra-field interpolator. But when an object which includes the pixel being generated by interpolation is moving, i.e., the motion level is high, the inter-field interpolator causes comb-like artifacts, and so the output of the intra-field interpolator is selected or its blending fraction is set high.
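The two interpolators can be contrasted in a short sketch. The intra-field version follows the parenthetical above (the two closest pixels in the field); the inter-field version assumes a simple average of the co-sited pixels in the preceding and following fields, which is one common choice and is not stated in the text. The function names and the frame-coordinate indexing are illustrative assumptions.

```python
def intra_field_pixel(cur, j, i):
    """Average the pixels directly above and below the missing pixel (same field)."""
    return (cur[j - 1][i] + cur[j + 1][i]) / 2


def inter_field_pixel(prev, nxt, j, i):
    """Average the co-sited pixels of the neighboring opposite-parity fields."""
    return (prev[j][i] + nxt[j][i]) / 2


# Hypothetical data in frame coordinates; None marks lines a field does not carry.
cur = [[10, 20, 30], [None, None, None], [30, 40, 50]]
prev = [[None, None, None], [18, 26, 38], [None, None, None]]
nxt = [[None, None, None], [22, 30, 42], [None, None, None]]

spatial = intra_field_pixel(cur, 1, 1)         # uses lines 0 and 2 of the current field
temporal = inter_field_pixel(prev, nxt, 1, 1)  # uses line 1 of the adjacent fields
```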
Spatial correlations between two lines are used to determine the edge direction (edge vector) and thus the direction of interpolation in typical methods. If a vertical correlation is found, it is interpreted as an edge along the direction of detected correlated regions. Interpolation is then performed along this detected edge to create smooth angular lines, as shown in FIG. 4A. If no correlations are found, a spatial-temporal median filter is usually employed to maintain crisp horizontal edges. Generally, edge-adaptive interpolation can be described as

F(int)(j,i,n) = [F(j+1,i−k,n) + F(j−1,i+k,n)]/2

where i is the column number, j is the line number, n is the field number, and k is the offset which provides the maximum correlation.
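The edge-adaptive interpolation formula can be written directly as code for a single field n. In this sketch the offset k is supplied by the caller; the function name and the sample data are assumptions for illustration.

```python
def edge_adaptive_pixel(field, j, i, k):
    """Return [F(j+1, i-k) + F(j-1, i+k)] / 2 for one field.

    field is indexed as field[line][column]; line j is the missing line.
    """
    return (field[j + 1][i - k] + field[j - 1][i + k]) / 2


# Hypothetical diagonal edge: bright from column 3 on line 0 and from column 1 on line 2.
field = [
    [0, 0, 0, 100, 100, 100],
    [None] * 6,  # missing line to be interpolated
    [0, 100, 100, 100, 100, 100],
]

along_edge = edge_adaptive_pixel(field, 1, 2, 1)  # k = 1 follows the diagonal
vertical = edge_adaptive_pixel(field, 1, 2, 0)    # k = 0 averages across the edge
```

With k = 1 the interpolated pixel takes the value found on both sides of it along the edge, while k = 0 mixes the dark and bright sides and softens the edge.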
Generally, the orientation of an edge is determined by observing the results of several luminance correlations. If the difference between two neighboring pixels is large, it is assumed that there is an edge between them. Conversely, if the difference between two neighboring pixels is small, it is assumed that the two pixels lie on the same side of an edge. The sum of absolute differences (SAD) between two lines within the current field is usually used to denote the amount of correlation. More accurate metrics, such as squared differences, can also be used, but these increase computational complexity.
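A SAD-based search for the offset k might look like the following sketch. The window half-width, the tie-break toward small offsets, and the names sad and best_offset are assumptions; the caller is responsible for keeping all indices inside the lines.

```python
def sad(upper, lower, i, k, half_width=1):
    """SAD between a window on the upper line centered at i+k
    and a window on the lower line centered at i-k."""
    return sum(
        abs(upper[i + k + d] - lower[i - k + d])
        for d in range(-half_width, half_width + 1)
    )


def best_offset(upper, lower, i, max_k, half_width=1):
    """Pick the integer offset in [-max_k, max_k] with the smallest SAD.

    Ties are resolved in favor of the smaller |k| (an assumed policy).
    """
    return min(
        range(-max_k, max_k + 1),
        key=lambda k: (sad(upper, lower, i, k, half_width), abs(k)),
    )


# Hypothetical lines j-1 (upper) and j+1 (lower) with a diagonal edge.
upper = [0, 0, 0, 0, 0, 100, 100, 100, 100, 100]
lower = [0, 0, 0, 100, 100, 100, 100, 100, 100, 100]
k = best_offset(upper, lower, 4, 1)
```

A low SAD means the two windows are highly correlated, so the minimizing offset is taken as the edge vector.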
It is still uncertain whether a zero difference between two neighboring pixels indicates the spatial direction in which the signal is stationary, or is a result of the alias edge detection or motion. A more complicated edge detection technique also checks the edge orientations at the neighboring pixels. However, these techniques usually have poor performance when the actual edge is beyond the search window. For example, as shown in FIG. 4B, if we use a window size of 2×3 (the small rectangle), we are unable to detect the correct edge. In this case, we have to increase our search window to at least 2×11 (the big rectangle) to cover the edge vector in the example.
But increased window size usually leads to significantly increased computational complexity, especially if we look for fractional-pixel resolution edge vectors, such as half-pixel or quarter-pixel resolution edge vectors. Some proposed methods aim to balance computational complexity against accuracy of edge detection. One example is U.S. Pat. No. 6,731,342, in which a 2×15 observation window is used to detect edges. Gradient-based and rule-based techniques are used to roughly detect the range of an edge, and then region matching is employed to accurately calculate the edge. In the region matching stage, a sliding window is used to perform difference comparisons to denote the correlation level. The sliding window moves in ¼-pixel steps, thus resulting in high computational complexity.
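The cost of fractional-pixel search can be illustrated by counting candidate offsets. The sketch below assumes linear interpolation for sampling between pixels and a hypothetical maximum offset of 7; neither is taken from the cited patent.

```python
import math


def sample_line(line, x):
    """Sample a line at fractional column x by linear interpolation (assumed)."""
    x0 = math.floor(x)
    frac = x - x0
    if frac == 0:
        return float(line[x0])
    return (1 - frac) * line[x0] + frac * line[x0 + 1]


def candidate_offsets(max_k, steps_per_pixel):
    """All offsets in [-max_k, +max_k] at a resolution of 1/steps_per_pixel."""
    n = max_k * steps_per_pixel
    return [s / steps_per_pixel for s in range(-n, n + 1)]


line = [0, 100, 50]
quarter = sample_line(line, 0.25)               # value a quarter pixel past column 0
integer_count = len(candidate_offsets(7, 1))    # integer-pixel candidates
quarter_count = len(candidate_offsets(7, 4))    # quarter-pixel candidates
```

Under these assumptions, quarter-pixel resolution nearly quadruples the number of candidates to evaluate, and each candidate also needs interpolated samples.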
When a bigger search window is used, additional protection is usually needed because it is more likely to yield ambiguous edge candidates. A weighted sum of interpolated pixel values based on different edge vectors can be used when multiple directions are found. Generally speaking, the weighting factor depends on statistics, which requires a training process. Uniform weighting factors can be used. Another solution can be found in U.S. Pat. No. 6,133,957, where ambiguity and confidence tests are employed to deal with ambiguous edge detection that results from using big search windows. However, this scheme does not provide fractional-pixel resolution edge vector detection.
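The weighted sum over several candidate edge vectors can be sketched as follows. Uniform weights are the default, as mentioned above; the function name, the argument layout, and the non-uniform weights in the example are hypothetical, and no trained statistics are implied.

```python
def weighted_edge_pixel(upper, lower, i, offsets, weights=None):
    """Weighted sum of edge-directed averages, one per candidate offset k.

    upper is line j-1 and lower is line j+1 of the current field.
    Weights default to uniform and are expected to sum to 1.
    """
    if weights is None:
        weights = [1.0 / len(offsets)] * len(offsets)
    return sum(
        w * (lower[i - k] + upper[i + k]) / 2
        for k, w in zip(offsets, weights)
    )


upper = [0, 0, 0, 100, 100, 100]
lower = [0, 100, 100, 100, 100, 100]

uniform = weighted_edge_pixel(upper, lower, 2, [0, 1])
biased = weighted_edge_pixel(upper, lower, 2, [0, 1], [0.25, 0.75])
```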
In short, these known methods of spatial interpolation for de-interlacing systems have high computational complexity or insufficient performance.