The invention relates in general to the field of digital video signal processing and in particular to a method and a device for converting video signals of one standard into another.
To achieve an acceptable display of video signals in an image, an image format that is adapted in terms of the temporal or spatial sampling raster to the display used to reproduce the image or to the basic multimedia environment is oftentimes required. To this end, it is often necessary to perform a temporal and/or spatial conversion between different video formats. The temporal conversion may comprise the generation of images with a video frequency that deviates from the video frequency of an original video signal to be converted. The spatial conversion may include the generation of image information of the interlaced lines of a field to generate a frame.
In the development of interpolation methods for moving image or video sequences, the aims are a fluid display of motion and the avoidance of interpolation artifacts and resolution losses.
Methods and equipment for converting video signals are known in the prior art. More specifically, a multiplicity of algorithms exists that provide temporal and/or spatial conversion, and these may be classified as either motion-adaptive or motion-vector-based methods. In each case, either linear or nonlinear interpolation techniques may be employed.
A number of static or linear methods of format conversion also exist. However, these typically suffer from loss of resolution or blurring of motion. A relatively simple technique for temporal format conversion is image repetition, which produces an unsatisfactory reproduction of motion with multiple contours or “jerky” motion sequences due to the temporally incorrect rendition of motion. Linear temporal low-pass filtering results in blurring of motion, and is thus similarly ill-suited for the interpolation of moving regions.
Static linear methods exist which are based on vertical or temporal filtering and are used to achieve a spatial conversion of an interlaced signal (a signal by which sequential interlaced fields are transmitted) to a progressive video signal (a signal that contains frames). This type of conversion is known as proscan conversion.
Since vertical low-pass filtering results in loss of resolution in the vertical axis, while temporal low-pass filtering causes motion blurring in moving image regions, methods have been developed that adaptively cross-fade between temporal filtering in nonmoving image regions and vertical filtering in moving image regions.
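The adaptive cross-fade described above can be sketched for a single missing pixel as follows. This is only an illustration: the motion measure (an absolute difference between the previous and next field, scaled by a hypothetical `motion_threshold`) and the function name are assumptions and are not taken from any particular known method.

```python
def crossfade_pixel(above, below, prev_same, next_same, motion_threshold=32.0):
    """Blend a vertical and a temporal estimate for one missing pixel.

    above, below         -- vertical neighbors in the current field
    prev_same, next_same -- pixels at the same position in the previous/next field
    motion_threshold     -- hypothetical scale for the motion measure (assumption)
    """
    vertical = (above + below) / 2.0          # intrafield (vertical) estimate
    temporal = (prev_same + next_same) / 2.0  # interfield (temporal) estimate
    # Assumed motion measure: field difference, clipped to the range [0, 1].
    k = min(abs(next_same - prev_same) / motion_threshold, 1.0)
    # k = 0 (no motion) selects temporal filtering, k = 1 selects vertical filtering.
    return k * vertical + (1.0 - k) * temporal


still = crossfade_pixel(100, 120, 50, 50)    # no motion: temporal estimate
moving = crossfade_pixel(100, 120, 0, 200)   # strong motion: vertical estimate
```

With no detected motion the temporal estimate is used unchanged, which keeps the full vertical resolution; with strong motion the vertical estimate is used, which avoids motion blur.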
Due to the disadvantages of linear interpolation techniques, methods have been developed that employ nonlinear median filters for interpolation. Median filtering sorts the input values of the filter by size and selects the value located at the center of the sorted sequence, the sequence usually consisting of an odd number of values.
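As a minimal sketch of the median operation just described (the function name `median_odd` is chosen here for illustration only):

```python
def median_odd(values):
    """Return the center element of the sorted inputs (odd count expected)."""
    ordered = sorted(values)
    if len(ordered) % 2 == 0:
        raise ValueError("expected an odd number of input values")
    return ordered[len(ordered) // 2]


# An outlier (250) does not influence the result, unlike a linear average.
result = median_odd([7, 250, 9])
```

Because the output is always one of the input values, a single outlier is rejected rather than averaged in, which is the property the nonlinear methods below rely on.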
A known method for proscan conversion is based on a linear vertical band separation of the video signal into a highs channel and a lows channel, and on the use of complementary median filters in the highs channel and the lows channel. The principle on which use of a spatial band separation is based is the fact that human perception may be described by a two-channel model. In the model, there exists a lows channel with a low spatial but high temporal resolution, and a highs channel with a low temporal but high spatial resolution. A known spatial conversion method of this type utilizes band separation in which a missing intermediate line is generated in a field, the method being illustrated in FIG. 1. FIG. 1 illustrates three sequential fields A1, B1, A2 after band separation into a lows channel and a highs channel, where two sequential fields A1 and B1, or B1 and A2, have mutually shifted image rasters. The lines for which the particular image information is transmitted are identified by boxes. According to the known method, the image information of a pixel (x, y), which lies in an intermediate line, is generated in a field B1 of the lows channel, i.e., in the field at time Tn by median filtering from the adjacent pixels in the vertical axis (x, y−1) and (x, y+1) of the same field, and from the pixel at position (x, y) in the following field A2, i.e., the field at time Tn+1. If P represents the image information of the given pixel (x, y, Tn), then the following equation applies for the image information of the pixel (x, y) in the field B1 of the lows channel:
$$P(x, y, T_n) = \operatorname{Med}\left\{ P(x, y-1, T_n),\; P(x, y+1, T_n),\; P(x, y, T_{n+1}) \right\} \tag{1}$$

where Med represents median filtering. In analogous fashion, the image information of the pixel (x, y) in the intermediate line of field B1 in the highs channel is generated by:
$$P(x, y, T_n) = \operatorname{Med}\left\{ P(x, y-1, T_n),\; P(x, y+1, T_n),\; P(x, y, T_{n+1}) \right\} \tag{2}$$

Thus, for the interpolation of the image information of the pixel (x, y) in the lows channel, image information of pixels from two fields is processed, namely, from fields B1 and A2 in FIG. 1. On the other hand, image information of pixels from three fields, namely, from fields A1, B1 and A2, is used for the interpolation of the image information of the pixel (x, y) in the highs channel which essentially contains the image details.
The filtering for the lows channel is vertically dominant (intrafield-dominant), since two of the three pixels involved are oriented vertically above each other in the same field of the image. The filtering for the highs channel is temporally dominant (interfield-dominant) or raster-dominant since the three pixels involved in filtering derive from three temporally successive fields.
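The vertically dominant filter of equation (1) can be sketched as follows. The data layout is an assumption made for illustration: each field is a list of lines indexed as `field[y][x]` in frame coordinates, with `None` in the lines that the field does not transmit. Only the lows-channel filter of equation (1) is shown.

```python
def med3(a, b, c):
    """3-tap median: the middle one of three values."""
    return sorted((a, b, c))[1]


def interpolate_lows_pixel(field_n, field_next, x, y):
    """Equation (1): median of the vertical neighbors (x, y-1) and (x, y+1)
    in the current field and the pixel (x, y) in the following field."""
    return med3(field_n[y - 1][x], field_n[y + 1][x], field_next[y][x])


# Field at time Tn carries lines 0 and 2; the following field carries line 1.
field_n = [[10, 10], [None, None], [30, 30]]
field_next = [[None, None], [200, 12], [None, None]]

moving = interpolate_lows_pixel(field_n, field_next, x=0, y=1)      # 30
stationary = interpolate_lows_pixel(field_n, field_next, x=1, y=1)  # 12
```

When the pixel from the following field lies outside the range of the two vertical neighbors, as happens with motion, one of the vertical neighbors is selected; otherwise the temporally adjacent pixel passes through.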
The vertically dominant median filter in the lows channel enables a relatively good rendition of motion, for example, in horizontally moving vertical edges. It results, however, in resolution losses in vertically high-frequency (rapidly changing) image regions which are of secondary importance in the lows channel. The raster-dominant median filter used in the highs channel, on the other hand, has the high vertical resolution required in the highs channel, but results in a poor rendition of motion in this channel. Additional methods based on modified or edge-direction-oriented median filters are known.
The method described with subband-based signal processing for spatial conversion, i.e., the generation of intermediate lines, has been extended to a temporal up-conversion from 50 Hz interlaced signals to 100 Hz interlaced signals, i.e., to a conversion method in which fields with a 100 Hz image sequence are generated from fields with an image sequence of 50 Hz. The resulting interpolation scheme of the lows channel for the interpolation of an intermediate field β that lies temporally exactly between two original fields A and B is illustrated in FIG. 2. This static interpolation scheme enables a positionally correct interpolation of moving edges or other large objects within a certain velocity range. The basis for this is the edge shift property of median filters.
A 5-tap median filter used in the highs channel for intermediate image interpolation is illustrated in FIG. 3. It is evident here that, unlike the lows channel, no pixels re-interpolated in the raster are supplied to the median filter so as to preclude any vertical resolution loss that is critical in the highs channel. Every other field of the input sequence is taken over directly as the field for the output image sequence.
Even using the above-described error-tolerant interpolation concept, static interpolation methods typically only permit a correct display of motion in the intermediate image to be achieved up to a certain velocity range which is a function of the interpolation mask size. In addition, loss of resolution may occur for moving image information in the highs channel even with raster-dominant median filters.
For these reasons, methods employing a motion-vector-based interpolation have been developed for the relatively high-quality interpolation of moving image sequences. In these methods, a motion vector (vx, vy) is assigned to each pixel using an appropriate motion estimation method. The vector indicates by how many raster positions in the x-axis and y-axis a given pixel has moved from one image/field to the next image/field. Various motion estimation methods are known for assigning a motion vector to a pixel or group of pixels.
By incorporating this type of motion vector in the interpolation, or by motion-vector-based addressing of the interpolation filters, as FIG. 4 illustrates, a correct interpolation may be performed even in rapidly moving image regions, assuming an error-free estimation of motion and a purely translational motion.
With reference to FIG. 4, the basic concept of motion-vector-based interpolation is to determine a motion vector VAnBn from the positions of a moving pixel in successive images/fields An(T−10ms) and Bn(T+10ms), which vector indicates by how many raster points the pixel has moved, and to interpolate pixels which lie on the path of the motion vector between the positions in the images/fields An(T−10ms) and Bn(T+10ms), based on the image information about the moving pixel and the motion vector, in one or more intermediate images which lie temporally between the successive images/fields An(T−10ms) and Bn(T+10ms). As is the case for static and motion-adaptive interpolation, both linear and nonlinear methods exist for motion-vector-based interpolation.
Assuming that the positions of the moving pixel, i.e., the starting point and end point of the motion vector in the images/fields An(T−10ms) and Bn(T+10ms), and thus the motion vector, are precisely known, it is sufficient in a simple linear method to perform a simple shift or a linear averaging. A subband-based interpolation method is known which performs averaging in the lows channel between the two pixels addressed by the starting point and end point of the motion vector. For the image information of a pixel (x, y) in an intermediate image βn, the following equation applies:

$$P_\beta(x, y, T_0) = \tfrac{1}{2}\left[ A_n(x - v_x/2,\; y - v_y/2,\; T_{-10\,\mathrm{ms}}) + B_n(x + v_x/2,\; y + v_y/2,\; T_{+10\,\mathrm{ms}}) \right] \tag{3}$$

where An(x−vx/2, y−vy/2, T−10ms) denotes the image information of the pixel in the field sequence An at time T−10ms which represents the starting point of the motion vector, and where Bn(x+vx/2, y+vy/2, T+10ms) denotes the image information of the pixel in the field sequence Bn at time T+10ms which represents the end point of the motion vector. The terms vx and vy are the components of the estimated motion vector VAnBn for the pixel (x−vx/2, y−vy/2) in the field An(T−10ms).
The image information for the intermediate image is determined in the highs channel by a vector-based shift. For the image information of the pixel (x, y) in an intermediate image βn, the applicable equation is:

$$P_\beta(x, y, T_0) = A_n(x - v_x/2,\; y - v_y/2,\; T_{-10\,\mathrm{ms}}) \tag{4}$$

or

$$P_\beta(x, y, T_0) = B_n(x + v_x/2,\; y + v_y/2,\; T_{+10\,\mathrm{ms}}). \tag{5}$$
This method has a poor error tolerance, however, in the case of faulty, i.e., incorrectly estimated, motion vectors.
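The linear motion-compensated averaging of equation (3) and the vector-based shift of equation (4) can be sketched as follows. The helper names, the clamping at the image borders, and the restriction to even integer vector components (so that half the vector addresses a whole raster position) are assumptions made for this illustration.

```python
def pixel(field, x, y):
    """Read field[y][x] with coordinates clamped to the borders (assumption)."""
    height, width = len(field), len(field[0])
    yy = max(0, min(y, height - 1))
    xx = max(0, min(x, width - 1))
    return field[yy][xx]


def lows_average(field_a, field_b, x, y, vx, vy):
    """Equation (3): mean of the start and end point of the motion vector."""
    hx, hy = vx // 2, vy // 2   # assumes even vector components
    return 0.5 * (pixel(field_a, x - hx, y - hy) + pixel(field_b, x + hx, y + hy))


def highs_shift(field_a, x, y, vx, vy):
    """Equation (4): vector-based shift from the preceding field."""
    hx, hy = vx // 2, vy // 2   # assumes even vector components
    return pixel(field_a, x - hx, y - hy)


# A single bright pixel moves from x = 2 (field An) to x = 4 (field Bn).
field_a = [[0, 0, 100, 0, 0, 0, 0]]
field_b = [[0, 0, 0, 0, 100, 0, 0]]

correct = lows_average(field_a, field_b, x=3, y=0, vx=2, vy=0)   # 100.0
faulty = lows_average(field_a, field_b, x=2, y=0, vx=0, vy=0)    # 50.0
```

With the correct vector the pixel appears at full amplitude at the intermediate position x = 3. With a faulty zero vector the same filter produces half-amplitude values at the old and new positions, which illustrates the lack of error tolerance.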
For this reason, a nonlinear interpolation method is known based on a 3-tap median filter. Here, the image information for a pixel (x, y) of an intermediate image β interpolated according to FIG. 4 is determined by median filtering of image information for the starting point and end point of the motion vector, and of a mean of the image information from the starting point and end point as follows:
$$P_\beta(x, y, T_0) = \operatorname{Med}\left\{ A_n(x - v_x/2,\; y - v_y/2,\; T_{-10\,\mathrm{ms}}),\;\; \tfrac{1}{2}\left[ A_n(x, y, T_{-10\,\mathrm{ms}}) + B_n(x, y, T_{+10\,\mathrm{ms}}) \right],\;\; B_n(x + v_x/2,\; y + v_y/2,\; T_{+10\,\mathrm{ms}}) \right\} \tag{6}$$
In the case of a correctly estimated vector, the pixels selected in image An and image Bn based on the motion vector (the starting point and end point of the motion vector) are identical, and thus form the output value of the median filter. In the case of faulty estimation of motion, i.e., when the starting point and end point of the motion vector do not contain the same image information, linear averaging results as a fall-back mode, along with the resulting blurring of motion.
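A minimal sketch of the 3-tap median of equation (6) follows. As in the equation, the middle tap is the mean of the pixels at position (x, y) in both fields. The helper `pixel`, the border clamping, and the restriction to even integer vector components are assumptions made for this illustration.

```python
def pixel(field, x, y):
    """Read field[y][x] with coordinates clamped to the borders (assumption)."""
    height, width = len(field), len(field[0])
    yy = max(0, min(y, height - 1))
    xx = max(0, min(x, width - 1))
    return field[yy][xx]


def mc_median3(field_a, field_b, x, y, vx, vy):
    """Equation (6): median of the vector start point, the linear mean at
    (x, y), and the vector end point."""
    hx, hy = vx // 2, vy // 2   # assumes even vector components
    start = pixel(field_a, x - hx, y - hy)
    end = pixel(field_b, x + hx, y + hy)
    mean = 0.5 * (pixel(field_a, x, y) + pixel(field_b, x, y))
    return sorted((start, mean, end))[1]


# A single bright pixel moves from x = 2 (field An) to x = 4 (field Bn).
field_a = [[0, 0, 100, 0, 0, 0, 0]]
field_b = [[0, 0, 0, 0, 100, 0, 0]]

correct = mc_median3(field_a, field_b, x=3, y=0, vx=2, vy=0)   # 100
fallback = mc_median3(field_a, field_b, x=2, y=0, vx=0, vy=0)  # 50.0
```

With the correct vector the start and end points agree and determine the output. With the faulty zero vector they disagree, and the linear mean is selected as the fall-back value.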
The vector-based intermediate image interpolation methods described so far do not have sufficient tolerance for faulty estimations of the motion vector, i.e., for vector errors. An improvement in the image quality of interpolated images is possible by using weighted vector-based median filters. In this approach, a spatial band separation is performed.
An interpolation scheme for the lows channel is illustrated in FIG. 5. A median filter is supplied with image information from multiple pixels which are located around the starting pixel of the motion vector VAnBn in the field An(T−10ms) and around the end pixel of the motion vector VAnBn in the field Bn(T+10ms). The median filter is also supplied with image information, in the form of recursive elements, for already determined adjacent pixels of the intermediate image βn. For purposes of illustration, the median filter in FIG. 5 is supplied with image information from nine pixels of field An (T−10ms), nine pixels of field Bn(T+10ms), and three pixels of intermediate image βn(T0). The image information supplied to the median filter may be variously weighted, the weighting factor indicating how often the image information of a pixel is supplied to the median filter.
The pixels of the lows channel to be interpolated are calculated as follows:
$$P_\beta(x, y, T_\beta) = \operatorname{Med}\left\{ W_{An} \diamond P_{An}(x - v_x/2,\; y - v_y/2,\; T_{An}),\;\; W_{Bn} \diamond P_{Bn}(x + v_x/2,\; y + v_y/2,\; T_{Bn}),\;\; W_{\beta n} \diamond P_{\beta n}(x - 1,\; y,\; T_{\beta n}) \right\} \tag{7}$$

where WAn, WBn and Wβn describe masks around the specific vector-addressed pixels and ⋄ denotes the duplication operator which indicates how often a sampling value is introduced into the filter mask. The pixels in the fields An(T−10ms) and Bn(T+10ms), around which filter masks are positioned, are each shifted relative to the pixel (x, y) to be interpolated in the intermediate image βn(T0) by a fraction of the motion vector. This fraction in FIG. 5 corresponds to half the motion vector, since the intermediate image βn(T0) to be interpolated is located temporally precisely in the center between the fields An(T−10ms) and Bn(T+10ms). If the motion vector in one of the fields indicates an image line which does not exist in the corresponding field, a re-interpolation takes place.
The advantage of vector-based weighted median filters is the fact that they are able to correct faulty motion vectors up to a certain error size. This is especially significant since vector errors cannot be avoided in natural image sequences.
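The duplication operator of a weighted median filter can be sketched as follows: each sample enters the sorted list as many times as its weight indicates. The function name, the input format of (value, weight) pairs, and the choice of the lower median for an even total weight are assumptions made for this illustration; the sample values and weights are made up.

```python
def weighted_median(samples):
    """Weighted median of (value, weight) pairs with positive integer weights."""
    expanded = []
    for value, weight in samples:
        expanded.extend([value] * weight)   # duplication operator
    expanded.sort()
    # Center element; for an even count this picks the lower of the two (assumption).
    return expanded[(len(expanded) - 1) // 2]


# One heavily weighted center sample and two surrounding samples of weight 1.
dominant = weighted_median([(10, 1), (200, 3), (30, 1)])   # 200
uniform = weighted_median([(10, 1), (200, 1), (30, 1)])    # 30
```

A sample whose weight exceeds half of the total weight always determines the output, whereas with uniform weights the filter reduces to an ordinary median.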
The property of correcting vector errors is illustrated based on the model of a horizontally moving ideal vertical edge between bright and dark pixels illustrated in FIG. 6, for which a faulty motion vector has been estimated. FIG. 6 illustrates one line each of fields An, Bn as well as the intermediate image βn, and the respective erroneously estimated vectors.
In the example, it is assumed that the model edge is moving at a velocity of vxreal=4 pixels/field, while the velocity has been erroneously estimated at vxest=0 pixels/field. The result is that the median masks in the previous and following images have been positioned at the same location, whereas the masks should have been displaced relative to each other by 4 pixels.
In the example of FIG. 6, the median filter is supplied with image information from seven pixels of field An that lie within the selection mask, including the pixel x0 which marks the beginning of the dark region after the edge; the mask is illustrated by a bold outline under the image line. The median filter is also supplied with the image information from seven pixels of field Bn which lie within the selection mask outlined in bold above the image line. The positions of these pixels correspond to the positions of the relevant pixels of field An, since the selection mask was not shifted as a result of the incorrect estimation of motion. It is evident that the median filter has been supplied with image information or luminance information from eight dark pixels after the edge, and six bright pixels before the edge. The result of the median filtering of these pixels is a dark pixel, even given uniformly weighted selection masks, and this pixel is interpolated in the intermediate image at position x.
As a result of the median filter, the edge in the intermediate image βn that lies temporally in the center between fields An, Bn is correctly interpolated to a position, despite the faulty motion vectors, which corresponds to half the distance between the edge in field An and the edge in field Bn, as can be verified by the generation of luminance balances between the bright and dark pixels.
A linear interpolation filter, on the other hand, produces a 4 pixel wide region of medium luminance, and thus causes a noticeable blurring of the edge.
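The edge example can be reproduced numerically with a short sketch. The line width, the edge positions, the coding of bright as 1 and dark as 0, and the direction of motion are assumptions chosen for illustration; the seven-pixel masks, the true displacement of 4 pixels, and the faulty vector estimate of 0 follow the example in the text.

```python
from statistics import median_low

BRIGHT, DARK = 1, 0


def make_line(width, edge):
    """Bright before the edge position, dark from the edge position onward."""
    return [DARK if x >= edge else BRIGHT for x in range(width)]


def window(line, center, radius=3):
    """Seven-pixel mask around center (radius 3), clamped at the line borders."""
    last = len(line) - 1
    return [line[max(0, min(x, last))]
            for x in range(center - radius, center + radius + 1)]


def interpolate_median(line_a, line_b, vx_est=0):
    """Uniformly weighted median over both masks, placed by the estimated vector."""
    half = vx_est // 2
    return [median_low(window(line_a, x - half) + window(line_b, x + half))
            for x in range(len(line_a))]


def interpolate_linear(line_a, line_b):
    """Linear averaging of the two fields at the same position."""
    return [(a + b) / 2 for a, b in zip(line_a, line_b)]


line_a = make_line(24, edge=10)   # field An: dark region starts at x0 = 10
line_b = make_line(24, edge=14)   # field Bn: the edge has moved by 4 pixels

median_result = interpolate_median(line_a, line_b, vx_est=0)
linear_result = interpolate_linear(line_a, line_b)

print(median_result.index(DARK))   # 12: edge halfway between 10 and 14
print(linear_result.count(0.5))    # 4: pixels of medium luminance
```

At position 12 the two masks together contain eight dark and six bright pixels, so the median is dark, while at position 11 the counts are reversed. The linear filter instead produces four pixels of medium luminance between positions 10 and 13.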
In addition to the behavior of a correlated video signal, such as edges or areas, the behavior of a non-correlated video signal (such as irregular textures) may also be examined. It may be shown that weighted median filters are able, given proper selection of the filter weights, to preserve the details of the non-correlated image information when the motion is estimated correctly. In the event a vector is erroneously estimated, fine details are extinguished, a result that is preferred for 50 Hz to 100 Hz conversion over a display at an erroneous position.
The masks illustrated in FIG. 8 are employed as the median masks in the lows channel. A star-shaped selection mask is used for the pixels to be selected from fields An and Bn, the starting points and end points of the motion vector VAnBn each being weighted by a factor 5 times greater than the surrounding values (factor 1). The median filter is additionally supplied with image information from a pixel of an intermediate image already determined. A 3-tap median filtering takes place in the image regions uncovered in the lows channel.
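A weighted median can be sketched by counting each sample as many times as its weight indicates. The five-point star layout (one center pixel and four neighbors), the luminance values and the function name below are assumptions made for illustration; FIG. 8 is not reproduced here, and only the 5:1 weighting and the additional intermediate-image pixel come from the text.

```python
def weighted_median(samples, weights):
    """Median of `samples`, where samples[i] counts weights[i] times."""
    expanded = []
    for value, weight in zip(samples, weights):
        expanded.extend([value] * weight)
    expanded.sort()
    # Exact median for an odd total weight; upper median for an even one.
    return expanded[len(expanded) // 2]


# Assumed star-shaped mask: centre pixel first, then four neighbours.
STAR_WEIGHTS = [5, 1, 1, 1, 1]

# Hypothetical luminance values: a fine dark detail on a bright background.
mask_a = [40, 200, 200, 200, 200]   # around the vector start point in An
mask_b = [40, 200, 200, 200, 200]   # around the vector end point in Bn
previous_output = [200]             # one already determined output pixel

samples = mask_a + mask_b + previous_output
weights = STAR_WEIGHTS + STAR_WEIGHTS + [1]

result = weighted_median(samples, weights)
unweighted = weighted_median(samples, [1] * len(samples))
print(result, unweighted)
```

With these assumed masks the total weight is 19, so the median is unambiguous. The heavily weighted vector endpoints carry the fine detail into the output when the vector is correct, while a uniformly weighted median of the same samples would suppress it.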
For the highs channel (FIG. 7), raster-dominant median filters are applied, as was previously the case in the static method described above, the difference being that these are now vector-addressed. This method is illustrated in FIG. 8.
An IC implementation of an intermediate image interpolation method based on weighted median filters is known. The implementation utilizes an IC for format conversion based on a reliability-controlled median filter.
An interpolation method is known that is also based on weighted median filters. However, here there is no separation of highs/lows, and no linear re-interpolation is performed if the vector addresses a line not present in the field. Instead, the median mask of FIG. 9a is applied for the case in which the vector addresses an existing line, otherwise the median mask of FIG. 9b is applied. If the sum total of the filter weights is an even number, the already calculated pixel of the intermediate image located to the left and above the actual position is incorporated into the median filter.
The interpolation method illustrated in FIG. 8 achieves a relatively good interpolation quality for conversion of 50 Hz interlaced signals to 100 Hz interlaced signals, this being attributable to the error correction properties of weighted median filters.
If one considers a format conversion in which the ratio of input image rate to output image rate deviates from the 50:100 ratio above, then other limiting conditions come into play. This is evident in FIG. 10 which illustrates the temporal positions of the input images for different output image rates when compared to a 50 Hz input sequence.
In the conversion to 100 Hz already considered, either the images of the output sequence temporally match an image of the input sequence, or the intermediate images to be interpolated are located in the display temporally precisely in between two input images. The second case produces the vector projection illustrated in FIG. 4 in which the measured or estimated motion vectors VAnBn, starting from a given pixel of the intermediate image, are projected with a projection factor of 0.5 into the previous or following original image, i.e., the image information of a pixel in the intermediate image is interpolated using the pixels located in those regions of the input images which lie at a distance calculated as a motion vector multiplied by 0.5 from the point to be interpolated.
In the case of other image rate ratios, however, intermediate images must be interpolated at temporal positions which are not located precisely in the center between two input images, with the result that projection factors other than 0.5 are produced for the intermediate images to be interpolated.
In addition, it is also evident that considerably more than one intermediate image to be temporally interpolated may lie between two output images which temporally precisely match one input image. Given a frequency of forg=50 Hz for the input sequence, and an output frequency of fint=60 Hz for the interpolated sequence, only every sixth output image, for example, temporally matches a given input image. Since the ratio of interpolated images to original images thus turns out to be significantly greater than for the conversion to fint=100 Hz, the quality requirements to be met by the interpolation method are similarly significantly higher as well.
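In general, all possible projection factors produced by a conversion from forg to fint are determined by the equations:
$$p_{\mathrm{left}} = k \cdot \frac{f_{\mathrm{org}}}{f_{\mathrm{int}}} - \left\lfloor k \cdot \frac{f_{\mathrm{org}}}{f_{\mathrm{int}}} \right\rfloor \tag{8}$$
and
$$p_{\mathrm{right}} = 1 - p_{\mathrm{left}} = 1 - k \cdot \frac{f_{\mathrm{org}}}{f_{\mathrm{int}}} + \left\lfloor k \cdot \frac{f_{\mathrm{org}}}{f_{\mathrm{int}}} \right\rfloor \tag{9}$$
where k=0, 1, 2, . . . , kmax−1 and kmax=fint/gcd(forg, fint), where ⌊k·forg/fint⌋ is the integer part of k·forg/fint, and where gcd denotes the operation for determining the greatest common divisor.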
The projection factor pleft thus denotes temporal distance normalized for the period of the original signal between the intermediate image to be interpolated and the particular previous image/field, where the period represents the temporal distance between two successive images/fields of the original image sequence. In analogous fashion, pright denotes the temporal distance normalized for the period of the original signal between the intermediate image to be interpolated and the particular following image/field. The two projection factors pright and pleft add up to 1.
It is evident that precisely kmax different interpolation phases, or projection factor pairs, each comprising one left and one right projection factor, are produced for an image rate conversion of forg to fint. After these kmax phases, the projection factors repeat cyclically.
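The projection factor pairs of equations (8) and (9) can be enumerated with exact rational arithmetic. The following is a minimal sketch; the function name `projection_factors` is chosen here for illustration only.

```python
from fractions import Fraction
from math import floor, gcd


def projection_factors(f_org, f_int):
    """Return the k_max pairs (p_left, p_right) per equations (8) and (9)."""
    k_max = f_int // gcd(f_org, f_int)
    pairs = []
    for k in range(k_max):
        position = Fraction(k * f_org, f_int)   # k * f_org / f_int
        p_left = position - floor(position)     # equation (8)
        p_right = 1 - p_left                    # equation (9)
        pairs.append((p_left, p_right))
    return pairs


for k, (p_left, p_right) in enumerate(projection_factors(50, 60)):
    print(k, p_left, p_right)
```

For a conversion of 50 Hz to 60 Hz this yields six phases, with k=5 giving pleft=1/6 and pright=5/6. For 50 Hz to 100 Hz only two phases remain, namely (0, 1) and (1/2, 1/2).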
The effect of vector errors for different projection factors varies in magnitude, as will be explained below where these effects of faulty motion vectors are examined in more detail for different projection phases.
For the sake of illustration, it is assumed that for a conversion of 50 Hz to 60 Hz the image ζ1 of a 60-Hz sequence is to be interpolated from fields A3 and B3 of the 50-Hz sequence as illustrated in FIG. 10. The associated projection factors pleft into the previous original image A3 and pright into the following original image B3 are determined to be pleft=⅙ and pright=⅚.
The following discussion is based, for the sake of illustration, on a progressive image display. However, the problems discussed are also found in analogous manner in an interlaced display.
For the sake of illustration, it is assumed that the estimated velocity is vest=(12, 12), thereby producing on the previous image A3 a projected velocity of
$$v_{p,\mathrm{left}} = p_{\mathrm{left}} \cdot v_{\mathrm{est}} = \tfrac{1}{6} \cdot (12, 12) = (2, 2) \tag{10}$$
and on the following image B3 a projected velocity of
$$v_{p,\mathrm{right}} = p_{\mathrm{right}} \cdot v_{\mathrm{est}} = \tfrac{5}{6} \cdot (12, 12) = (10, 10) \tag{11}$$
with which the interpolation filters in the previous and following images A3, B3 are addressed or positioned starting from the pixel to be interpolated. The position of the pixel to be interpolated and the address positions of the filters, which are shifted by vpright or vpleft starting from the point to be interpolated, are displayed in the intermediate image ζ in FIG. 11. The addressing position denotes the position of the pixels in the original images onto which a filter mask for the selection of pixels to filter is placed. Given a correct estimation of the motion vector, the address positions correspond to the starting and end positions of a pixel moving from image A to image B.
If it is now assumed that the estimated motion vector has a maximum error of ±6 in the x axis and y axis, the error regions illustrated as shaded in FIG. 11 are produced after vector projection. The positioning of the filter mask in original image B depends considerably more strongly on vector errors than does the positioning of the filter mask in original image A.
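Because the projection is a simple scaling, the same factors that scale the estimated vector also scale its error. The sketch below works through the numbers of this example; the helper name `project` is hypothetical, and the assumption is that the vector error is projected linearly in the same way as the vector itself.

```python
from fractions import Fraction


def project(vector, factor):
    """Scale every component of a 2-D vector by a projection factor."""
    return tuple(factor * component for component in vector)


p_left, p_right = Fraction(1, 6), Fraction(5, 6)
v_est = (12, 12)      # estimated velocity from the example
max_error = (6, 6)    # assumed maximum vector error per axis

v_left = project(v_est, p_left)        # projected onto image A
v_right = project(v_est, p_right)      # projected onto image B
err_left = project(max_error, p_left)
err_right = project(max_error, p_right)

print("image A: shift", v_left, "error up to +/-", err_left)
print("image B: shift", v_right, "error up to +/-", err_right)
```

Under this assumption the mask in image A can be displaced by at most one pixel per axis, whereas the mask in image B can be displaced by up to five pixels per axis.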
In another example illustrating the effect of a vector error, again in regard to conversion from 50 Hz to 60 Hz, the image ζ1 of FIG. 10 is interpolated from the original images A3 and B3, where the applicable terms for the projection factors are: pleft=⅙ and pright=⅚.
FIG. 12 illustrates the case in which an ideal vertical edge is moving horizontally at a velocity vreal=6 pixels/image. The estimated velocity, however, is vest=0, with the result that filter masks WA in image A and WB in image B are positioned at the same place.
If the same filter masks are used for image A and image B, an interpolation of the moving edge in intermediate image ζ results at a position corresponding to half the distance between the edge in image A and in image B. This position is incorrect since the intermediate image to be interpolated does not lie temporally in the center between images A and B.
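The size of this position error can be estimated with a one-dimensional sketch. The scan lines, the edge position of 10 in image A, the seven-pixel masks and all names below are assumptions made for illustration; the velocity of 6 pixels/image, the zero estimate and pleft=1/6 are taken from the example.

```python
from fractions import Fraction
from statistics import median_low

BRIGHT, DARK = 1, 0
LENGTH = 30
HALF = 3  # assumed seven-pixel mask in each image


def scan_line(edge):
    """Bright pixels before `edge`, dark pixels from `edge` onwards."""
    return [BRIGHT] * edge + [DARK] * (LENGTH - edge)


edge_a, v_real = 10, 6
image_a = scan_line(edge_a)
image_b = scan_line(edge_a + v_real)


def interpolate(x):
    # v_est = 0: identical, unshifted masks W_A and W_B around x.
    window = slice(x - HALF, x + HALF + 1)
    return median_low(image_a[window] + image_b[window])


xs = range(HALF, LENGTH - HALF)
interpolated_edge = min(x for x in xs if interpolate(x) == DARK)
expected_edge = edge_a + Fraction(1, 6) * v_real   # p_left * v_real

print("interpolated edge:", interpolated_edge)
print("temporally correct edge:", expected_edge)
```

With these assumed values the median places the edge at x=13, the midpoint between the two images, while the temporally correct position for pleft=1/6 is x=11, an error of two pixels.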
What is needed is a method and a device for converting video signals that are capable of supplying a correct interpolation of intermediate images even in cases in which the ratio of the frequency of the output image sequence to that of the input image sequence deviates from 100:50, and in particular does not have an integer value.