The present invention relates to digital video signal processing, and more particularly to interlace-to-progressive conversion.
For moving picture systems, the interlaced video format is widely used to reduce the data rate. That is, each image frame consists of two fields, each of which contains samples of either the even-numbered (top field) or the odd-numbered (bottom field) lines of the image. In interlaced scan, fields are scanned and displayed sequentially, as shown for a 5×5 block of pixels in FIG. 5A. By taking advantage of the time it takes for an image to fade on a CRT, interlaced video gives the impression of double the actual refresh rate. This helps prevent flicker, which occurs when the monitor's CRT is driven at a refresh rate low enough for the screen's phosphors to lose their excitation between sweeps of the electron gun. Interlaced scan thus achieves a good tradeoff between frame rate and transmission bandwidth requirements. For example, NTSC interlaced television sets display 60 fields per second.
However, for a video display that supports a refresh rate high enough that flicker is not perceivable, progressive scanning is preferable, since interlacing reduces the vertical display resolution and causes twitter effects in pictures with high vertical frequency. In progressive scan, each frame is scanned and displayed in its entirety, as shown in FIG. 5B.
Due to the increased popularity of progressive-scanning applications, such as digital TV (DTV) and the display of TV on PC monitors, there is a need to display video on progressive displays. Thus, the function of converting interlaced video to progressive video, so-called "de-interlacing," is strongly called for. The task of de-interlacing is to convert interlaced fields into progressive frames, each of which represents the same image as the corresponding input field but also contains the samples of the missing lines. This process is illustrated in FIG. 5C, where the dashed lines show the missing lines in the interlaced video.
Mathematically, for a given interlaced input F(j,i,n), the output of de-interlacing, Fo(j,i,n), can be defined as
  Fo(j,i,n) = { F(j,i,n),                       if mod(j,2) = mod(n,2)
              { {circumflex over (F)}(j,i,n),   otherwise                    (1.1)
where j, i, and n are the vertical, horizontal, and temporal indices, respectively; Fo(j,i,n) is the result of de-interlacing; {circumflex over (F)}(j,i,n) is the estimate of the missing pixel generated by the de-interlacing method; and F(j,i,n) is the pixel from the original interlaced field. The existing (even or odd) lines in the original fields are transferred directly to the output frame.
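As a concrete illustration, equation (1.1) can be sketched in Python. The dictionary-based field representation and the line-average fallback estimator below are illustrative assumptions, not part of the described method:

```python
def deinterlace_frame(field_lines, n, height):
    """Sketch of equation (1.1). field_lines maps line index j to a list of
    pixels, for the lines present in field n (those with j % 2 == n % 2)."""
    out = {}
    for j in range(height):
        if j % 2 == n % 2:
            # mod(j,2) == mod(n,2): line exists in field n, copy it directly
            out[j] = list(field_lines[j])
        else:
            # otherwise: estimate the missing line (placeholder line average)
            above = field_lines.get(j - 1, field_lines.get(j + 1))
            below = field_lines.get(j + 1, above)
            out[j] = [(a + b) / 2.0 for a, b in zip(above, below)]
    return out
```

Here the placeholder estimator simply stands in for whichever spatial, temporal, or motion-adaptive interpolator produces {circumflex over (F)}.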
There are various ways to calculate the missing pixel {circumflex over (F)}(j,i,n). Generally, de-interlacing methods can be classified into five categories: (1) spatial (intra-frame) techniques; (2) temporal (inter-frame) techniques; (3) spatial-temporal techniques; (4) motion detection-based techniques, and (5) motion-compensated techniques.
The traditional spatial, temporal, and spatial-temporal interpolation schemes usually lead to poor conversion performance. The spatial interpolation does not fully utilize the achievable vertical-temporal (V-T) bandwidth in filtering because it ignores the temporal spectrum, which reduces the vertical resolution. The temporal interpolation, however, causes artifacts such as jaggy and feather effects when motion is present. Although the spatial-temporal interpolation can fully utilize the V-T spectrum, it cannot handle motion scenes well.
Thus motion-adaptive techniques are generally advantageous. The most advanced de-interlacing techniques usually make use of motion estimation and compensation. Motion compensation allows virtual conversion of a moving sequence into a stationary one by interpolating along the motion trajectory. However, this type of technique has much higher implementation complexity. Another type of motion-adaptive de-interlacing scheme is based on motion detection. In the following, we use the terminology "motion-adaptive de-interlacing" to refer to motion-detection-based de-interlacing. As is well known, interlaced sequences can be essentially perfectly reconstructed by temporal filtering in the absence of motion, while spatial filtering performs well in the presence of motion. Motion-detection-based methods use a motion detector to take advantage of these facts by classifying each pixel into moving and stationary regions. Based on the output of the motion detector, the de-interlacing method then fades between the temporally filtered output and the spatially filtered output. In addition to fading between spatial and temporal outputs, spatial-temporal filtering can be employed when motion is present but relatively insignificant.
In motion-adaptive de-interlacing, motion detection is a key component, as the de-interlacing performance for video with motion is primarily determined by it. The interpolation can be generally presented as in equation (1.2), where α is a real number between 0 and 1 representing a motion detection parameter that indicates the likelihood of motion for a given pixel:
{circumflex over (F)}(x,y,n) = αFmot(x,y,n) + (1−α)Fstat(x,y,n)   (1.2)
The pixels Fstat(x,y,n) are calculated for stationary regions using temporal filtering, and the pixels Fmot(x,y,n) are calculated for moving regions using spatial filtering. FIG. 5D illustrates the essential idea.
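Per pixel, equation (1.2) is a linear fade between the two filter outputs; a minimal sketch follows (the clamping of α into [0, 1] is an added safeguard, not from the text):

```python
def motion_adaptive_blend(f_mot, f_stat, alpha):
    """Sketch of equation (1.2) for one pixel: alpha near 1 favors the
    spatially filtered (moving) estimate, alpha near 0 the temporal one."""
    a = min(max(alpha, 0.0), 1.0)  # clamp the motion likelihood into [0, 1]
    return a * f_mot + (1.0 - a) * f_stat
```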
The general structure of a motion detector is shown in FIG. 5E. A motion image is created by taking the difference between the current field and the field one frame earlier. The motion image is first sent through a low-pass filter (LPF), followed by a rectifier. This filter removes noise, thereby reducing "nervousness" near edges in the event of timing jitter. Another low-pass filter, usually a spatial or spatial-temporal maximum filter, is then employed to improve the consistency of the motion detection, relying on the assumption that objects are usually large compared to a pixel. Lastly, a nonlinear but monotonic transfer function translates the signal into α, which indicates the likelihood of motion for each pixel. The larger the amount of motion, the greater the value of α.
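The chain of FIG. 5E can be sketched on a single scan line as below; the 1-2-1 low-pass kernel, the 3-tap maximum filter, and the m/(m+k) transfer function are illustrative assumptions rather than values given in the text:

```python
def detect_motion(curr, prev_frame, k=16.0):
    """curr and prev_frame are equal-length lists of pixels on one scan line
    taken one frame apart; returns a per-pixel motion likelihood alpha."""
    diff = [c - p for c, p in zip(curr, prev_frame)]          # motion image
    n = len(diff)
    # low-pass filter to suppress noise near edges (borders clamped)
    lp = [(diff[max(i - 1, 0)] + 2 * diff[i] + diff[min(i + 1, n - 1)]) / 4.0
          for i in range(n)]
    rect = [abs(v) for v in lp]                               # rectifier
    # maximum filter: objects are usually large compared to a pixel
    mx = [max(rect[max(i - 1, 0)], rect[i], rect[min(i + 1, n - 1)])
          for i in range(n)]
    # nonlinear but monotonic transfer into [0, 1): more motion, larger alpha
    return [m / (m + k) for m in mx]
```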
Within this general structure, the low-pass filters are not necessarily linear. In addition, more than one detector can be used: working on more than just two fields in the neighborhood of the current field, with a logical or linear combination of the detectors' outputs, may lead to a more reliable indication of motion. Next, we briefly describe the known motion detection techniques used for de-interlacing.
The goal of motion detection is to detect changes at each pixel in a video sequence from one field to the next. This task is straightforward for a progressive sequence: one directly takes the frame difference. However, the frame difference cannot be taken directly for interlaced sequences, because pixels from consecutive fields do not have the same spatial locations. We therefore consider some of the proposed solutions for motion detection in interlaced sequences.
Two-Field Motion Detection
Since motion detection is straightforward for progressive sequences, an intuitive approach is to first convert the fields to frames using a de-interlacing method. Any spatial de-interlacing method can be used for this purpose, such as line repetition, linear interpolation, or edge-adaptive interpolation. The idea is illustrated in FIG. 5F.
In FIG. 5F the pixel marked with ‘X’ is the currently processed pixel, which is first estimated by a spatial deinterlacing method. The absolute difference, which denotes the motion for the pixel currently being processed, is computed between the estimate and the pixel from the previous field. This pixel difference is represented in FIG. 5F by the double arrow symbol.
A more theoretical approach to this problem is to use a phase-correction filter applied spatially to obtain the estimate of the current pixel. One suggested phase-correction filter is a 6-tap filter with coefficients (3, −21, 147, 147, −21, 3) after quantization to 8-bit integers, as shown in FIG. 5F. After the phase-correction filtering, the obtained estimate is compared with the co-sited pixel of the previous field to determine the corresponding motion. Note that the phase-correction filtering here plays a dual role: it is used both for motion detection and for the interpolation in de-interlacing, which makes the overall method computationally efficient. This method needs only two fields of storage plus seven line buffers. The problem with two-field motion detection is that, without perfect reconstruction, some error is inherent in the interpolation step. These errors can make vertical detail in the picture appear to be moving even when it is static. For instance, the pixels above and below pixel 'X' in FIG. 5F do not necessarily have any correlation with the actual value of pixel 'X'. The interpolation may therefore be done incorrectly, and the resulting absolute difference between the interpolated pixel and its corresponding pixel in the previous field will not be an appropriate measure of motion. This error leads to a significant number of false detections. As a result, more regions of the video sequence are de-interlaced spatially than necessary, resulting in lower quality.
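A sketch of the two-field scheme with the quantized phase-correction taps follows; dividing by 256 and clamping at field borders are assumptions not stated in the text:

```python
PHASE_TAPS = [3, -21, 147, 147, -21, 3]  # quantized 6-tap phase-correction filter

def two_field_motion(curr_column, prev_pixel, row):
    """curr_column: pixels of one image column of the current field, stacked
    contiguously; the six taps straddle the missing line at position row."""
    h = len(curr_column)
    acc = 0.0
    for k, tap in enumerate(PHASE_TAPS):
        idx = min(max(row - 2 + k, 0), h - 1)  # clamp at field borders
        acc += tap * curr_column[idx]
    estimate = acc / 256.0                      # undo the 8-bit quantization
    # absolute difference against the co-sited pixel of the previous field
    return abs(estimate - prev_pixel)
```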
Three-Field Motion Detection
The drawback of two-field motion detection stems from the interpolation, which is not accurate. Three-field motion detection is therefore the simplest way to overcome this drawback: it compares only pixels on identical scan lines.
As shown in FIG. 5G, this technique is similar to that of FIG. 5F, except that no interpolation is needed. In a three-field motion detection scheme, the pixel in the previous field is compared to the pixel at the same position in the next field. Since only pixels on existing scan lines are used, the absolute difference can be computed without any interpolation. The absolute difference of these two pixels is computed as an indication of the likelihood of motion of the pixel currently being processed.
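A minimal sketch of the three-field comparison on one existing scan line:

```python
def three_field_motion(prev_line, next_line):
    """prev_line and next_line are the same scan line taken from fields n-1
    and n+1; co-sited pixels make interpolation unnecessary."""
    return [abs(p - q) for p, q in zip(prev_line, next_line)]
```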
While two-field motion detection results in many false positives (detection of motion when none is present), three-field motion detection results in many false negatives (failure to detect when motion is actually present) because it is unable to detect fast-moving regions. The artifacts caused by missed detection are similar to those caused by field repetition.
Four-Field Motion Detection
The mis-detections of the two- and three-field methods stem from insufficient information, since each relies on only a single set of comparisons. The four-field motion detector improves upon them by comparing three sets of pixels rather than one. The two additional pixel differences help protect against the errors in static edge regions caused by two-field motion detection and the errors in fast-motion regions caused by the three-field scheme. The operation is illustrated in FIG. 5H.
Three absolute differences are computed as shown in FIG. 5H, giving a better decision for fast motion. Only pixels on corresponding scan lines are compared, so no interpolation is needed. The output of this motion detector is the maximum of the three pixel differences:
A = |F(x,y,n−1) − F(x,y,n+1)|
B = |F(x,y−1,n) − F(x,y−1,n−2)|
C = |F(x,y+1,n) − F(x,y+1,n−2)|
Pd = max(A, B, C)   (2.1)
While this method performs better than three-field motion detection, it can still miss detections. The error usually happens when the motion detector designates a moving region as stationary. One example is when the two neighboring odd fields are very similar to each other and the two even fields are likewise similar, while the odd fields differ considerably from the even fields. This results in no motion being detected even though the frame actually changes considerably.
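Equation (2.1) translates directly into code; `F` below is an illustrative list of fields indexed as F[n][y][x]:

```python
def four_field_motion(F, x, y, n):
    """Sketch of equation (2.1): three co-sited absolute differences
    combined by a maximum; no interpolation is required."""
    A = abs(F[n - 1][y][x] - F[n + 1][y][x])
    B = abs(F[n][y - 1][x] - F[n - 2][y - 1][x])
    C = abs(F[n][y + 1][x] - F[n - 2][y + 1][x])
    return max(A, B, C)
```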
Five-Field Motion Detection
The Grand Alliance, developers of the United States HDTV standard, suggested a five-field motion detection method for HDTV format conversion. This method improves upon four-field motion detection by reducing the number of missed motion detections, at the cost of one additional field of storage. The proposed method is illustrated in FIG. 5I:
A = |F(x,y,n−1) − F(x,y,n+1)|
B = |F(x,y−1,n) − F(x,y−1,n−2)|
C = |F(x,y+1,n) − F(x,y+1,n−2)|
D = |F(x,y−1,n) − F(x,y−1,n+2)|
E = |F(x,y+1,n) − F(x,y+1,n+2)|
The motion detector outputs the maximum of the following combination of these five pixel differences as the indicator of the likelihood of motion.
Pd = max{A, (B+C)/2, (D+E)/2}   (2.2)
In this maximum operation, the two pairs of pixel differences between same-parity fields (B, C and D, E) are each averaged, so no interpolation is needed in this method. Compared with the previous three methods, five fields are involved; the advantage is that the wider motion coverage improves motion consistency.
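Equation (2.2) extends the four-field sketch with the two forward differences; the F[n][y][x] indexing is again an illustrative assumption:

```python
def five_field_motion(F, x, y, n):
    """Sketch of equation (2.2): backward and forward same-parity pixel
    differences are averaged in pairs before the maximum is taken."""
    A = abs(F[n - 1][y][x] - F[n + 1][y][x])
    B = abs(F[n][y - 1][x] - F[n - 2][y - 1][x])
    C = abs(F[n][y + 1][x] - F[n - 2][y + 1][x])
    D = abs(F[n][y - 1][x] - F[n + 2][y - 1][x])
    E = abs(F[n][y + 1][x] - F[n + 2][y + 1][x])
    return max(A, (B + C) / 2.0, (D + E) / 2.0)
```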
This method has yet another advantage. The previous methods, except for the three-field motion detector, all detect motion in one direction of time; in particular, they are all structured to detect differences between the current frame and previous frames. Therefore, those methods all use the previous field for temporal de-interlacing. The five-field method, in contrast, looks for motion in both the forward and backward directions to some extent. It is then not immediately clear whether the previous field or the next field should be used for temporal interpolation in the motion-adaptive de-interlacing method. Under that circumstance, as suggested by the Grand Alliance, a 4-tap V-T median filter is used for temporal de-interlacing together with the five-field motion detection.
While this method has the best performance of the four methods discussed, it is still possible for some combination of frames to cause the motion detection method to miss areas of very fast and periodic motion. For example, a sequence of fields with some region that flickers between two images once every field over a five-field duration will cause such problems. These types of motion detection methods can only compare even fields with even fields, or odd fields with odd fields; consequently, flicker at that rate cannot possibly be detected. The only type of motion detection that could detect this kind of flicker is a two-field scheme, which has poor performance for the reasons mentioned previously in this section. A motion detection scheme similar to the current one but using seven or nine fields could be implemented to improve performance. The problem is that it eventually becomes unreasonable to look so far into the future or past, and the storage requirement makes doing so impractical.
Hybrid Methods
While motion-adaptive methods do perform very well in general, it is still possible for a combination of frames to cause the motion detector to fail. Hybrid methods, which combine motion detection with a V-T median filter, have been proposed to improve on this. A proposed seven-point median filter is defined by
{circumflex over (F)}(x,y,n) = median(A, B, C, D, E, F, αF(x,y,n−1), β(B+E)/2)   (2.3)
where α (unrelated to the α of (1.2)) and β are integer weights: αA indicates the number of copies of A in the median list of (2.3); for example, 3A means A, A, A. A large value of α increases the probability of field repetition, whereas a large β increases the probability of line averaging at the output. The definitions of the pixels A, B, C, D, E, F, G, and H are presented in FIG. 5J. The motion detector controls the weights of these individual pixels at the input of the median filter.
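The weighted median of equation (2.3) can be sketched with the integer weights realized as repeated entries in the median list; the scalar pixel arguments are illustrative:

```python
import statistics

def weighted_vt_median(A, B, C, D, E, F, temporal_pixel, alpha=1, beta=1):
    """Sketch of equation (2.3): alpha copies of the temporal pixel
    F(x,y,n-1) favor field repetition, beta copies of (B+E)/2 favor line
    averaging; pixel names follow FIG. 5J."""
    samples = [A, B, C, D, E, F]
    samples += [temporal_pixel] * alpha
    samples += [(B + E) / 2.0] * beta
    return statistics.median(samples)
```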
In another method, a three-level hierarchical motion detector classifies motion into three categories: static, slow motion, and fast motion. Based on this classification, one of three defined interpolators is selected: in the case of static images, a temporal FIR filter is used; in the case of slow motion, a so-called weighted hybrid median filter (WHMF) is selected as the interpolator; and in the case of fast motion, a spatial FIR filter is employed. The reconstructed pixel is therefore described by
{circumflex over (F)}(x,y,n) =
  (F(x,y,n−1) + F(x,y,n+1))/2                                  (static)
  median{α0(A+F)/2, α1(B+E)/2, α2(C+D)/2, α3(G+H)/2}           (slow motion)
  α0B + α1E + α2G + α3H                                        (fast motion)   (2.4)
The coefficients are calculated according to Weber's law ("the eye is more sensitive to small luminance differences in dark areas than in bright areas") as below
β0 = (A+F)/|A−F|,  β1 = (B+E)/|B−E|,  β2 = (C+D)/|C−D|,  β3 = (G+H)/|G−H|   (2.5)
with αi = 2 when βi is the minimum of the four β's, and αj = 1 for all j ≠ i. As discussed above, motion detection is a key component of a de-interlacing system, and each of the techniques above has its weaknesses.
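The slow-motion branch of (2.4) together with the weight rule of (2.5) can be sketched as below; the small eps that guards against division by zero for identical pixel pairs is an added assumption:

```python
import statistics

def whmf_slow_motion(A, B, C, D, E, F, G, H, eps=1e-6):
    """Sketch of the weighted hybrid median filter of equations (2.4)-(2.5):
    the pair with the smallest Weber-law ratio beta_i receives the doubled
    weight alpha_i = 2; all other alpha_j stay 1. Names follow FIG. 5J."""
    pairs = [(A, F), (B, E), (C, D), (G, H)]
    betas = [(p + q) / (abs(p - q) + eps) for p, q in pairs]  # equation (2.5)
    alphas = [1] * 4
    alphas[betas.index(min(betas))] = 2      # alpha_i = 2 for the minimum beta
    samples = []
    for a_i, (p, q) in zip(alphas, pairs):
        samples += [(p + q) / 2.0] * a_i     # a_i copies of the pair average
    return statistics.median(samples)
```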