The present invention relates to digital video signal processing, and more particularly to interlace-to-progressive conversion.
For moving picture systems, interlaced video format is widely used to reduce data rate. The interlaced video format is illustrated in FIG. 15a, where t is the time, x horizontal position, and y vertical position. A video signal consists of a sequence of two kinds of fields. One of the two fields contains only lines with even line numbers, i.e., y=even (“Top Field”), and the other field contains those with odd line numbers (“Bottom Field”). For example, NTSC interlaced television sets display 60 fields per second.
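As a minimal sketch of the field structure described above, a progressive frame splits into a top field of even lines and a bottom field of odd lines. The sketch assumes a frame is stored as a list of lines indexed by y. The helper name split_fields and this representation are illustrative only and do not come from the source.

```python
def split_fields(frame):
    """Split a frame (list of lines, indexed by y) into top and bottom fields.

    Hypothetical helper: each field is returned as a dict mapping the frame
    line number y to that line, so the original line numbers are preserved.
    """
    top = {y: line for y, line in enumerate(frame) if y % 2 == 0}     # y even
    bottom = {y: line for y, line in enumerate(frame) if y % 2 == 1}  # y odd
    return top, bottom


frame = [[10, 11], [20, 21], [30, 31], [40, 41]]  # 4 lines, 2 pixels each
top, bottom = split_fields(frame)
print(sorted(top), sorted(bottom))  # [0, 2] [1, 3]
```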
FIG. 15b shows the progressive video format, which recent high-quality TV sets use to display video programs. In the progressive format, a frame containing all of the lines (a progressive frame) is displayed at one time. NTSC progressive TV sets display 60 progressive frames per second. Because most video programs are broadcast in interlaced format, interlace-to-progressive (IP) conversion is required to display them on a progressive TV; that is, the lines skipped in each interlaced field must be filled in by interpolation.
Interlace-to-progressive conversion appears in various systems, as illustrated in FIGS. 16a-16c; FIG. 14a illustrates an interlace-to-progressive converter. The systems may have digital or analog video input or both, which may be encoded, and digital or analog video output or both. In the system of FIG. 16a, the analog input is A/D (analog-to-digital) converted to obtain a digital input signal. The digital video input signal is stored in a memory, and the stored video signal is sent to the IP converter block. An OSD (on-screen display) block may be placed before or after the IP converter, or it may be omitted. The input signal may also be sent to the IP converter directly. In the IP converter, the input signal is digitally processed by a digital circuit or by a processor executing program code. The digital video output of the IP converter is sent out directly or output as analog video through a D/A (digital-to-analog) converter.
The system of FIG. 16b has an encoded video stream input (e.g., MPEG2, MPEG4, H.264, et cetera), which is decoded by the decoder. The decoded video signal is stored in a memory, and the stored video signal is sent to the IP converter block. Again, an OSD block may be placed before or after the IP converter, or it may be omitted. The decoded video data may also be sent to the IP converter directly. In the IP converter, the input signal is digitally processed by a digital circuit or by a processor executing program code. The digital video output of the IP converter is sent out directly or output as analog video through a D/A converter.
The system of FIG. 16c accepts any of encoded video, digital video, or analog video input and provides any of digital video, analog video, or encoded video output. The system has both a decoder and an encoder (e.g., for MPEG2, MPEG4, H.264, et cetera). The encoded video stream is decoded by the decoder, and the decoded video signal is stored in a memory. As in the other systems, the analog input is A/D converted to obtain a digital input signal, which is also stored in a memory. The encoder encodes the input video signal and outputs an encoded video stream. The stored video signal is sent to the IP converter block. The decoded video data or the digital video input signal may also be sent to the IP converter directly. An OSD block may be placed before or after the IP converter, or it may be omitted. In the IP converter, the input signal is digitally processed by a digital circuit or by a processor executing program code. The digital video output of the IP converter is sent out directly or output as analog video through a D/A converter.
FIG. 14a is a block diagram of a generic motion-adaptive IP converter. The converter converts an interlaced input video source to a progressive video format that contains the original interlaced lines plus interpolated lines. The frame buffer stores several interlaced fields. The motion detector detects moving objects in the input fields pixel by pixel: it calculates the amount of motion (Motion_level) at every pixel location where pixel data needs to be interpolated, and the more obvious the motion, the higher Motion_level becomes. The still-pixel and moving-pixel generators interpolate pixels by assuming that the pixel being interpolated is part of a still object and a moving object, respectively. The selector/blender block selects or blends the outputs of the still-pixel and moving-pixel generators according to Motion_level. When Motion_level is low, the output of the still-pixel generator is selected, or its blending fraction in the interpolated output is made high. The still-pixel generator is realized by the inter-field interpolator. The moving-pixel generator consists of the intra-field interpolator and the edge-direction detector. The edge-direction detector detects the edge direction at the detection pixel in the pattern of an object in the field and outputs the detected direction to the intra-field interpolator. The intra-field interpolator calculates a pixel value by interpolating pixels along the detected direction. (Without a direction, the interpolator could simply interpolate using the two closest pixels in the field.) The spatial resolution of the inter-field interpolator is higher than that of the intra-field interpolator.
But when the object containing the pixel being interpolated is moving (i.e., Motion_level is high), the inter-field interpolator causes comb-like artifacts, so the output of the intra-field interpolator is selected or its blending fraction is set high.
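The selector/blender can be sketched as below. The sketch assumes something the source does not state: a linear blend whose moving-pixel fraction rises with Motion_level on a hypothetical 0 to max_level scale. A pure selector would use a step function instead of this ramp. The function name blend is illustrative.

```python
def blend(still_pixel, moving_pixel, motion_level, max_level=255):
    """Mix the still-pixel (inter-field) and moving-pixel (intra-field) outputs.

    Assumption: linear blending; Motion_level is clamped to [0, max_level]
    and mapped to the fraction of the moving-pixel output.
    """
    level = min(max(motion_level, 0), max_level)
    alpha = level / max_level  # fraction of the moving-pixel output
    return (1 - alpha) * still_pixel + alpha * moving_pixel


print(blend(100, 200, 0), blend(100, 200, 255))  # 100.0 200.0
```

Low Motion_level therefore favors the higher-resolution inter-field result, and high Motion_level favors the intra-field result that avoids comb artifacts.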
FIG. 14b shows a schematic diagram of an edge-direction detector. The edge-direction detector consists of the “directional index generator” and the “direction determiner”. The directional index generator generates index values for various edge angles. The direction determiner chooses the most probable direction using the output of the directional index generator. The directional index generator consists of many index generators: one for the edge angle #0, one for the edge angle #1, and so forth.
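One common way to realize a directional index generator and direction determiner is the edge-based line average idea. It is assumed here for illustration and is not necessarily the detector of FIG. 14b. For each candidate offset d, the index is the absolute difference between the pixel on the field line above the missing pixel and the pixel on the line below, taken along direction d. The determiner then picks the smallest index. The function names and the max_offset range are hypothetical.

```python
def detect_edge_direction(upper, lower, x, max_offset=2):
    """Pick the direction offset d with the smallest directional index.

    Hypothetical index for direction d: |upper[x + d] - lower[x - d]|, the
    absolute difference between the pixels on the field lines above and
    below the missing pixel that lie along direction d (d = 0 is vertical).
    The vertical direction is the starting candidate, so it wins whenever
    it ties for the minimum.
    """
    best_d, best_index = 0, abs(upper[x] - lower[x])
    for d in range(-max_offset, max_offset + 1):
        if not (0 <= x + d < len(upper) and 0 <= x - d < len(lower)):
            continue  # this direction would read outside the line
        index = abs(upper[x + d] - lower[x - d])
        if index < best_index:  # strict: ties keep the earlier candidate
            best_d, best_index = d, index
    return best_d


def interpolate_along(upper, lower, x, d):
    """Intra-field interpolation: average the two pixels along direction d."""
    return (upper[x + d] + lower[x - d]) / 2


upper = [0, 0, 0, 255, 255, 255]      # field line above the missing line
lower = [0, 255, 255, 255, 255, 255]  # field line below the missing line
d = detect_edge_direction(upper, lower, 2)
print(d, interpolate_along(upper, lower, 2, d))  # 1 255.0
```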
FIG. 19a shows an example flow of a traditional motion detection method for interlace-to-progressive up-conversion processing. First, let field#1 be the target field to be up-converted to progressive format, and define two input pixel (luminance) data arrays in the two neighboring fields #0 and #2 (both of the opposite parity to field#1), in_y[0][x][y] and in_y[2][x][y], where x=0, . . . , IMAGE_SIZE_X−1 and y=0, 2, 4, . . . , IMAGE_SIZE_Y−2 (top field) or y=1, 3, 5, . . . , IMAGE_SIZE_Y−1 (bottom field). For each pixel at (x,y) in the target field#1, two two-dimensional M×N pixel arrays around the pixel, mot_y[0][i][j] and mot_y[2][i][j] (i=0, . . . , M−1; j=0, . . . , N−1), are extracted from the two neighboring fields of the target field as follows:

$$\mathrm{mot\_y}[k][i][j] = \mathrm{in\_y}[k]\left[x + i - \tfrac{M-1}{2}\right]\left[y + 2j - (N-1)\right] \quad (k = 0, 2;\ i = 0, 1, \ldots, M-1;\ j = 0, 1, \ldots, N-1)$$
Second, compute the M×N array of absolute differences, abs_diff[i][j], from the two extracted M×N pixel arrays, mot_y[0][i][j] and mot_y[2][i][j], as follows:

$$\mathrm{abs\_diff}[i][j] = \left|\mathrm{mot\_y}[0][i][j] - \mathrm{mot\_y}[2][i][j]\right| \quad (i = 0, 1, \ldots, M-1;\ j = 0, 1, \ldots, N-1)$$
Third, each absolute difference value in the M×N array abs_diff[i][j] is multiplied by a weight factor weight_factor[i][j] that is based on the spatial distance of the pixel from the target pixel:

$$\mathrm{weight\_diff}[i][j] = \mathrm{abs\_diff}[i][j] \times \mathrm{weight\_factor}[i][j] \quad (i = 0, \ldots, M-1;\ j = 0, \ldots, N-1)$$

Typically the weight factors are chosen to sum to 1, which normalizes for the size of the M×N array.
Finally, the elements of the weighted M×N array are added together, and the resulting representative value sum_diff_area[x][y] is output as the degree of motion of the target pixel for motion detection processing:

$$\mathrm{sum\_diff\_area}[x][y] = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \mathrm{weight\_diff}[i][j]$$

(x=0, . . . , IMAGE_SIZE_X−1; y=0, 2, 4, . . . , IMAGE_SIZE_Y−2) or (x=0, . . . , IMAGE_SIZE_X−1; y=1, 3, 5, . . . , IMAGE_SIZE_Y−1)
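The four steps above (extraction, absolute difference, weighting, summation) can be sketched in Python as follows. Several choices belong to this sketch rather than the source: the function names, the [x][y] list-of-columns layout with y as the frame line number, and the lack of border handling. M and N are assumed odd. weight_factor is passed in because the text leaves its exact values open.

```python
def extract_window(field, x, y, M=5, N=3):
    """mot_y[i][j] = in_y[x + i - (M-1)/2][y + 2j - (N-1)] around (x, y).

    field is indexed field[x][y] with y the frame line number; rows step by
    2 so they stay on lines the neighbor field actually contains. The window
    is assumed to lie entirely inside the image.
    """
    return [[field[x + i - (M - 1) // 2][y + 2 * j - (N - 1)]
             for j in range(N)]
            for i in range(M)]


def sum_diff_area(field0, field2, x, y, weight_factor):
    """Degree of motion at (x, y) of target field #1 from fields #0 and #2."""
    M, N = len(weight_factor), len(weight_factor[0])
    mot0 = extract_window(field0, x, y, M, N)  # step 1, field #0
    mot2 = extract_window(field2, x, y, M, N)  # step 1, field #2
    total = 0.0
    for i in range(M):
        for j in range(N):
            abs_diff = abs(mot0[i][j] - mot2[i][j])  # step 2
            total += abs_diff * weight_factor[i][j]  # steps 3 and 4
    return total
```

For example, with a hypothetical uniform weight of 1/(M×N) per entry (so the weights sum to 1), a field pair that differs by 30 everywhere gives a degree of motion of 30.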
The value sum_diff_area[x][y] is used in a decision circuit for interpolation processing. If sum_diff_area[x][y] is low, the target pixel at (x,y) is taken to be “still”, and the pixel y′[1][x][y] is produced by temporal interpolation of the two pixels at the same position in field#0 and field#2. For example,

$$y'[1][x][y] = \frac{\mathrm{in\_y}[0][x][y] + \mathrm{in\_y}[2][x][y]}{2}$$

If sum_diff_area[x][y] is high, the target pixel at (x,y) is estimated to be “moving”, and the pixel is interpolated spatially using only the neighboring pixels in the same field. For example,

$$y'[1][x][y] = \frac{\mathrm{in\_y}[1][x][y-1] + \mathrm{in\_y}[1][x][y+1]}{2}$$

For pixel values in the range 0 to 255, a typical decision rule treats sum_diff_area[x][y] ≤ 50 as still and sum_diff_area[x][y] > 50 as moving.
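The still/moving decision and the two example interpolations can be sketched as follows, using the example threshold of 50 from the text. The function name interpolate_pixel and the [x][y] list layout, with y as the frame line number, are assumptions for illustration.

```python
STILL_THRESHOLD = 50  # example decision threshold for 8-bit pixel values


def interpolate_pixel(field0, field1, field2, x, y, sum_diff_area,
                      threshold=STILL_THRESHOLD):
    """Return y'[1][x][y] for a missing pixel of target field #1.

    Still (sum_diff_area <= threshold): temporal average of fields #0 and #2.
    Moving: spatial average of the lines above and below in field #1.
    """
    if sum_diff_area <= threshold:
        return (field0[x][y] + field2[x][y]) / 2
    return (field1[x][y - 1] + field1[x][y + 1]) / 2


f0 = [[10, 10, 10]]   # field #0, one column, lines 0..2
f1 = [[100, 0, 200]]  # field #1; line 1 is the missing line
f2 = [[30, 30, 30]]   # field #2
print(interpolate_pixel(f0, f1, f2, 0, 1, 50),
      interpolate_pixel(f0, f1, f2, 0, 1, 51))  # 20.0 150.0
```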
This basic strategy for choosing the interpolation method yields a progressive-format image prog_y[1][x][y] as follows, when field#1 is a top field:

$$\mathrm{prog\_y}[1][x][y] = \begin{cases} \mathrm{in\_y}[1][x][y] & (y = 0, 2, 4, \ldots, \mathrm{IMAGE\_SIZE\_Y} - 2) \\ y'[1][x][y] & (y = 1, 3, 5, \ldots, \mathrm{IMAGE\_SIZE\_Y} - 1) \end{cases}$$

or, when field#1 is a bottom field:

$$\mathrm{prog\_y}[1][x][y] = \begin{cases} \mathrm{in\_y}[1][x][y] & (y = 1, 3, 5, \ldots, \mathrm{IMAGE\_SIZE\_Y} - 1) \\ y'[1][x][y] & (y = 0, 2, 4, \ldots, \mathrm{IMAGE\_SIZE\_Y} - 2) \end{cases}$$

(x = 0, . . . , IMAGE_SIZE_X−1)

This traditional motion detection scheme has the following weak points:
1) Simply extending this method to more fields for more accurate motion detection increases the memory work space for field buffering, the access bandwidth on the memory bus, and the computational load for calculating the degree of motion, each in proportion to the number of fields stored.
2) It is difficult to detect fast motion and periodic motion in particular in interlaced image sequences.
FIG. 19b shows a typical interlaced image sequence that causes the motion detection failure described in 2). In this example sequence, a rectangular box moves at constant speed from left to right. Let the value of white pixels in all fields be 255 and that of black pixels be 0, and let the size of the extracted arrays be M=5, N=3. Under these conditions, consider the degree of motion computed by the traditional detection method above for a pixel (x0,y0) at the center of target field#1. First, the two two-dimensional M×N pixel arrays mot_y[0][i][j] and mot_y[2][i][j] (i=0, . . . , 4; j=0, 1, 2) are extracted as follows:
$$\mathrm{mot\_y}[0][i][j] = \begin{cases} 255 & (i = 0, 4) \\ 0 & (\text{otherwise}) \end{cases} \quad (j = 0, \ldots, 2)$$

$$\mathrm{mot\_y}[2][i][j] = \begin{cases} 255 & (i = 0, 4) \\ 0 & (\text{otherwise}) \end{cases} \quad (j = 0, \ldots, 2)$$

Note that the left side of the rectangular box in field#0 and the right side of the rectangular box in field#2 overlap each other at this target pixel (x0,y0). Therefore, all absolute differences between the two extracted M×N pixel arrays, abs_diff[i][j], are zero:

$$\mathrm{abs\_diff}[i][j] = \left|\mathrm{mot\_y}[0][i][j] - \mathrm{mot\_y}[2][i][j]\right| = 0 \quad (i = 0, \ldots, 4;\ j = 0, \ldots, 2)$$

Hence the weighted absolute differences weight_diff[i][j] and the sum sum_diff_area[x0][y0] are also all zero:
$$\mathrm{weight\_diff}[i][j] = \mathrm{abs\_diff}[i][j] \times \mathrm{weight\_factor}[i][j] = 0 \quad (i = 0, \ldots, 4;\ j = 0, \ldots, 2)$$

$$\mathrm{sum\_diff\_area}[x0][y0] = \sum_{i=0}^{4} \sum_{j=0}^{2} \mathrm{weight\_diff}[i][j] = 0$$

Because the degree of motion at (x0,y0), sum_diff_area[x0][y0], is zero, the target pixel is treated as “still” and is interpolated temporally as follows:
$$y'[1][x0][y0] = \frac{\mathrm{in\_y}[0][x0][y0] + \mathrm{in\_y}[2][x0][y0]}{2} = \frac{255 + 255}{2} = 255$$

Thus, in this example sequence, interlace-to-progressive processing with the traditional motion detection method causes noticeable noise at the center of the up-converted progressive image.
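The detection loss in this example can be reproduced numerically with the two 5×3 windows given above. The uniform weights are an assumption, since the example does not specify weight_factor. Because every absolute difference is zero, any nonnegative weights give the same result.

```python
M, N = 5, 3
# Windows from the FIG. 19b example: 255 in columns i = 0 and i = 4, else 0.
mot_y0 = [[255 if i in (0, 4) else 0 for j in range(N)] for i in range(M)]
mot_y2 = [[255 if i in (0, 4) else 0 for j in range(N)] for i in range(M)]
weight_factor = [[1.0 / (M * N)] * N for _ in range(M)]  # assumed uniform

sum_diff_area = sum(abs(mot_y0[i][j] - mot_y2[i][j]) * weight_factor[i][j]
                    for i in range(M) for j in range(N))
state = "still" if sum_diff_area <= 50 else "moving"  # threshold from the text
print(sum_diff_area, state)  # 0.0 still
```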
Theatrical movies (film) provide 24 progressive-format frames per second, and the 3-2 pulldown process illustrated in FIG. 18a converts such a sequence of progressive frames into a sequence of interlaced fields at 60 fields per second. Note that some fields are repeated, and that the overall process is periodic with a period of four progressive frames and ten fields (five top fields including one repetition and five bottom fields including one repetition). Accurately detecting whether a received interlaced field sequence was originally generated by 3-2 pulldown can greatly improve the quality of several kinds of subsequent processing of the field sequence, including IP conversion and video compression.
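The field cadence described above can be sketched as follows. Frames alternately contribute three and two fields while the field parity strictly alternates, so four frames yield ten fields with one repeated top field and one repeated bottom field. The function name pulldown_32 and the (frame, parity) tuples are illustrative assumptions. The sketch starts on a top field, which is an arbitrary choice.

```python
def pulldown_32(frames):
    """Map 24 frame/s progressive frames to a 60 field/s interlaced sequence.

    Each output entry is (source_frame, parity), parity "T" (top) or "B"
    (bottom). Even-indexed frames give 3 fields, odd-indexed frames give 2,
    and parity alternates on every field.
    """
    fields = []
    parity = "T"
    for n, frame in enumerate(frames):
        repeat = 3 if n % 2 == 0 else 2
        for _ in range(repeat):
            fields.append((frame, parity))
            parity = "B" if parity == "T" else "T"
    return fields


print(pulldown_32(["A", "B", "C", "D"]))
```

For frames A to D this gives A-top, A-bottom, A-top (repeated), B-bottom, B-top, C-bottom, C-top, C-bottom (repeated), D-top, D-bottom.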
Note that 3-2 pulldown sequences have a repeated field every five fields; therefore 3-2 pulldown sequences can be detected by detecting this regular pattern (called a 3-2 pattern hereinafter). There are, however, several difficulties in accurate detection of the 3-2 pattern in actual environments:
Difficulty #1: Some sequences contain small image segments in a different format. A typical example is 3-2 pulldown film material overlaid with a 60-interlaced-fields-per-second telop for subtitles. Because the major portion of each image is 3-2 pulldown, field differences that are very small, though not zero, appear once every five fields. Handling such a sequence as 3-2 pulldown, however, results in poor quality, especially in the part with the different format.
Difficulty #2: The 3-2 pattern becomes unclear when a 3-2 pulldown sequence is compressed by a lossy method such as MPEG for transmission and/or storage. Transmission noise further obscures the pattern if the transmission is performed in analog format.
Difficulty #3: The 3-2 pattern disappears in case of still sequences.
Difficulty #4: The 3-2 pattern becomes subtle if the area of moving objects is small relative to the whole image area (e.g., moving objects cover ~1% of the image).
FIG. 18b shows the block diagram of a typical 3-2 pulldown detector. It compares field differences directly to detect when the (local) minimum difference occurs, and then uses the result to detect the 3-2 pattern. This approach can detect well-ordered sequences, but it does not address any of the difficulties listed above.
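A minimal sketch of this kind of detector follows, under assumptions not taken from the source. diffs[n] is some difference measure between field n and field n−2, the previous field of the same parity. The hypothetical function detect_32_pattern records the position of the minimum within each group of five fields and reports a 3-2 pattern when that phase repeats. The last usage line shows the weakness the text describes: an all-still sequence (Difficulty #3) produces a false detection.

```python
def detect_32_pattern(diffs, period=5, min_cycles=2):
    """Return the phase of a 3-2 pattern in a field-difference sequence, or None.

    For each consecutive group of `period` values, the index of the minimum
    difference (the phase) is recorded; a 3-2 pattern is reported when the
    last `min_cycles` groups all share the same phase.
    """
    phases = []
    for start in range(0, len(diffs) - period + 1, period):
        group = diffs[start:start + period]
        phases.append(group.index(min(group)))
    if len(phases) >= min_cycles and len(set(phases[-min_cycles:])) == 1:
        return phases[-1]
    return None


print(detect_32_pattern([9, 8, 0, 7, 9, 8, 9, 0, 7, 8]))  # 2: repeat every 5 fields
print(detect_32_pattern([5, 0, 5, 5, 5, 5, 5, 5, 0, 5]))  # None: no regular pattern
print(detect_32_pattern([0] * 10))                        # 0: false hit on a still scene
```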
However, known methods for IP conversion still need improvement.
A general background discussion of interlace-to-progressive methods appears in de Haan et al, Deinterlacing-An Overview, 86 Proceedings of the IEEE 1839 (Sep. 1998).