The invention relates to a technique for estimating motion vectors for a video sequence and, in particular, to a motion estimator for use with a standards converter. The technique has application in any apparatus requiring field or frame rate interpolation, for example, slow motion display apparatus and conversion of film sequences to interlaced video sequences.
Gradient motion estimation is one of three or four fundamental motion estimation techniques and is well known in the literature (references 1 to 18). More correctly called ‘constraint equation based motion estimation’, it is based on a partial differential equation which relates the spatial and temporal image gradients to motion.
Gradient motion estimation is based on the constraint equation relating the image gradients to motion. The constraint equation is a direct consequence of motion in an image. Given an object, ‘object(x, y)’, which moves with a velocity (u, v) then the resulting moving image, I(x, y, t) is defined by Equation 1;
$$I(x, y, t) = \mathrm{object}(x - ut,\; y - vt)$$
This leads directly to the constraint equation, Equation 2;
$$u \cdot \frac{\partial I(x, y, t)}{\partial x} + v \cdot \frac{\partial I(x, y, t)}{\partial y} + \frac{\partial I(x, y, t)}{\partial t} = \frac{\partial\, \mathrm{object}(x, y)}{\partial t} = 0$$
where, provided the moving object does not change with time (perhaps due to changing lighting or distortion) then ∂object/∂t = 0. This equation is, perhaps, more easily understood by considering an example. Assume that vertical motion is zero, the horizontal gradient is +2 grey levels per pixel and the temporal gradient is −10 grey levels per field. Then the constraint equation says that the ratio of horizontal and temporal gradients implies a motion of 5 pixels/field. The relationship between spatial and temporal gradients is summarised by the constraint equation.
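The worked example above can be checked numerically. The following is a minimal sketch, assuming a one-dimensional picture line and a hypothetical object that is a linear ramp of 2 grey levels per pixel; the function names are illustrative only and do not appear elsewhere in this specification.

```python
def object_profile(x):
    """Hypothetical 1-D object: a linear ramp of 2 grey levels per pixel."""
    return 2.0 * x


def moving_image(x, t, u):
    """Equation 1 restricted to one dimension: I(x, t) = object(x - u*t)."""
    return object_profile(x - u * t)


def horizontal_velocity(grad_x, grad_t):
    """Solve u*grad_x + grad_t = 0 for u (vertical motion assumed zero)."""
    if grad_x == 0:
        raise ValueError("no horizontal gradient: velocity is undetermined")
    return -grad_t / grad_x


true_u = 5.0          # pixels per field
x, t = 100, 3         # an arbitrary pixel and field

# Gradients measured by simple differences between neighbouring samples.
grad_x = moving_image(x + 1, t, true_u) - moving_image(x, t, true_u)
grad_t = moving_image(x, t + 1, true_u) - moving_image(x, t, true_u)

estimated_u = horizontal_velocity(grad_x, grad_t)
print(grad_x, grad_t, estimated_u)
```

For a linear ramp the simple differences are exact, so the sketch reproduces the gradients of +2 and −10 quoted above and recovers the motion of 5 pixels/field.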
To use the constraint equation for motion estimation it is first necessary to estimate the image gradients; the spatial and temporal gradients of brightness. In principle these are easily calculated by applying straightforward linear horizontal, vertical and temporal filters to the image sequence. In practice, in the absence of additional processing, this can only really be done for the horizontal gradient. For the vertical gradient, calculation of the brightness gradient is confused by interlace which is typically used for television pictures; pseudo-interlaced signals from film do not suffer from this problem. Interlaced signals only contain alternate picture lines on each field. Effectively this is vertical sub-sampling resulting in vertical aliasing which confuses the vertical gradient estimate. Temporally the situation is even worse; if an object has moved by more than 1 pixel in consecutive fields, pixels in the same spatial location may be totally unrelated. This would render any gradient estimate meaningless. This is why gradient motion estimation cannot, in general, measure velocities greater than 1 pixel per field period (reference 8).
Prefiltering can be applied to the image sequence to avoid the problem of direct measurement of the image gradients. If spatial low pass filtering is applied to the sequence then the effective size of ‘pixels’ is increased. The brightness gradients at a particular spatial location are then related for a wider range of motion speeds. Hence spatial low pass filtering allows higher velocities to be measured, the highest measurable velocity being determined by the degree of filtering applied. Vertical low pass filtering also alleviates the problem of vertical aliasing caused by interlace. Alias components in the image tend to be more prevalent at higher frequencies. Hence, on average, low pass filtering disproportionately removes alias rather than true signal components. The more vertical filtering that is applied the less is the effect of aliasing. There are, however, some signals in which aliasing extends down to zero frequency. Filtering cannot remove all the aliasing from these signals which will therefore result in erroneous vertical gradient estimates and, therefore, incorrect estimates of the motion vector.
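The effect of spatial low pass filtering on the measurable velocity range can be illustrated with a small sketch. It assumes a single picture line containing a step edge that moves 3 pixels between fields, and uses a 9 tap moving-average filter as a stand-in for the prefilter; the filter choice, the picture data and all function names are assumptions made for illustration only.

```python
def box_filter(signal, width):
    """Moving-average low pass filter; windows are truncated at the ends."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out


def step_edge(length, position):
    """A one-line picture: 0 to the left of `position`, 1 from it onwards."""
    return [1.0 if x >= position else 0.0 for x in range(length)]


def velocity_at(field0, field1, x):
    """Gradient estimate u = -grad_t / grad_x at pixel x (None if grad_x is 0)."""
    grad_x = (field0[x + 1] - field0[x - 1]) / 2.0   # central difference
    grad_t = field1[x] - field0[x]                   # field difference
    return None if grad_x == 0 else -grad_t / grad_x


# The edge moves 3 pixels to the right between consecutive fields.
field0, field1 = step_edge(64, 30), step_edge(64, 33)

raw = velocity_at(field0, field1, 31)
filtered = velocity_at(box_filter(field0, 9), box_filter(field1, 9), 31)
print(raw, filtered)
```

Without prefiltering the spatial gradient at the chosen pixel is zero, so no velocity can be formed there. After filtering, the edge becomes a ramp wide enough to cover the displacement and the estimate is approximately 3 pixels/field.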
Prefiltering an image sequence results in blurring. Hence small details in the image become lost. This has two consequences: firstly, the velocity estimate becomes less accurate since there is less detail in the picture and, secondly, small objects cannot be seen in the prefiltered signal. To improve vector accuracy, hierarchical techniques are sometimes used. This involves first calculating an initial, low accuracy, motion vector using heavy prefiltering, then refining this estimate to higher accuracy using less prefiltering. This does, indeed, improve vector accuracy but it does not overcome the other disadvantage of prefiltering, that is, that small objects cannot be seen in the prefiltered signal, hence their velocity cannot be measured. No amount of subsequent vector refinement, using hierarchical techniques, will recover the motion of small objects if they are not measured in the first stage. Prefiltering is only advisable in gradient motion estimation when it is only intended to provide low accuracy motion vectors of large objects.
Once the image gradients have been estimated the constraint equation is used to calculate the corresponding motion vector. Each pixel in the image gives rise to a separate linear equation relating the horizontal and vertical components of the motion vector and the image gradients. The image gradients for a single pixel do not provide enough information to determine the motion vector for that pixel. The gradients for at least two pixels are required. In order to minimise errors in estimating the motion vector it is better to use more than two pixels and find the vector which best fits the data from multiple pixels. Consider taking gradients from 3 pixels. Each pixel restricts the motion vector to a line in velocity space. With two pixels a single, unique, motion vector is determined by the intersection of the 2 lines. With 3 pixels there are 3 lines and, possibly, no unique solution. This is illustrated in FIG. 1. The vectors E1 to E3 are the error from the best fitting vector to the constraint line for each pixel.
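One way to calculate the best fit motion vector for a group of neighboring pixels is to use a least mean square method, that is minimising the sum of the squares of the lengths of the error vectors (E1 to E3 FIG. 1). The least mean square solution for a group of neighbouring pixels is given by the solution of Equation 3;
$$\begin{bmatrix} \sigma_{xx}^2 & \sigma_{xy}^2 \\ \sigma_{xy}^2 & \sigma_{yy}^2 \end{bmatrix} \cdot \begin{bmatrix} u_0 \\ v_0 \end{bmatrix} = -\begin{bmatrix} \sigma_{xt}^2 \\ \sigma_{yt}^2 \end{bmatrix}$$
where
$$\sigma_{xx}^2 = \sum \frac{\partial I}{\partial x} \cdot \frac{\partial I}{\partial x}, \qquad \sigma_{xy}^2 = \sum \frac{\partial I}{\partial x} \cdot \frac{\partial I}{\partial y}$$
etc.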
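where (u0, v0) is the best fit motion vector and the summations are over a suitable region. The (direct) solution of equation 3 is given by Equation 4;
$$\begin{bmatrix} u_0 \\ v_0 \end{bmatrix} = \frac{1}{\sigma_{xx}^2 \sigma_{yy}^2 - \sigma_{xy}^4} \begin{bmatrix} \sigma_{xy}^2 \sigma_{yt}^2 - \sigma_{yy}^2 \sigma_{xt}^2 \\ \sigma_{xy}^2 \sigma_{xt}^2 - \sigma_{xx}^2 \sigma_{yt}^2 \end{bmatrix}$$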
Small regions produce detailed vector fields of low accuracy and vice versa for large regions. There is little point in choosing a region which is smaller than the size of the prefilter since the pixels within such a small region are not independent.
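The least mean square calculation of equations 3 and 4 can be sketched as follows. The sketch assumes that the gradients for a region are already available as (∂I/∂x, ∂I/∂y, ∂I/∂t) triples, one per pixel; the function names and the three-pixel test region are hypothetical and chosen only so that the result is easy to verify.

```python
def gradient_sums(gradients):
    """Accumulate the sums of gradient products over a region.

    `gradients` is an iterable of (Ix, Iy, It) triples, one per pixel.
    Returns (sxx, sxy, syy, sxt, syt), the sigma terms of Equation 3.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for ix, iy, it in gradients:
        sxx += ix * ix
        sxy += ix * iy
        syy += iy * iy
        sxt += ix * it
        syt += iy * it
    return sxx, sxy, syy, sxt, syt


def best_fit_vector(gradients):
    """Direct solution of Equation 3, written out as in Equation 4."""
    sxx, sxy, syy, sxt, syt = gradient_sums(gradients)
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("region does not determine a unique motion vector")
    u0 = (sxy * syt - syy * sxt) / det
    v0 = (sxy * sxt - sxx * syt) / det
    return u0, v0


# Three pixels whose temporal gradients obey the constraint equation
# exactly for a known velocity (no noise in this illustration).
true_u, true_v = 1.5, -0.5
spatial = [(2.0, 0.0), (0.0, 4.0), (1.0, 1.0)]
region = [(ix, iy, -(true_u * ix + true_v * iy)) for ix, iy in spatial]

u0, v0 = best_fit_vector(region)
print(u0, v0)
```

Because the test gradients are noise free, the three constraint lines intersect at a single point and the best fit vector coincides with the velocity used to generate them. With real gradients the lines would not intersect exactly and the same code would return the least mean square compromise.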
Typically, motion estimators generate motion vectors on the same standard as the input image sequence. For motion compensated standards converters, or other systems performing motion compensated temporal interpolation, it is desirable to generate motion vectors on the output image sequence standard. For example when converting between European and American television standards the input image sequence is 625 line 50 Hz (interlaced) and the output standard is 525 line 60 Hz (interlaced). A motion compensated standards converter operating on a European input is required to produce motion vectors on the American output television standard.
It is an object of the present invention to provide a method and apparatus capable of generating motion vectors on an output standard different from the input standard. This is achieved by first calculating image gradients on the input standard and then converting these gradients to the output standard before implementing the rest of the motion estimation process.
The direct implementation of gradient motion estimation, discussed herein in relation to FIGS. 2 and 3, can give wildly erroneous results. Such behaviour is extremely undesirable. These problems occur when there is insufficient information in a region of an image to make an accurate velocity estimate. This would typically arise when the analysis region contained no detail at all or only the edge of an object. In such circumstances it is either not possible to measure velocity or only possible to measure velocity normal to the edge. It is attempting to estimate the complete motion vector, when insufficient information is available, which causes problems. Numerically the problem is caused by the 2 terms in the denominator of equation 4 becoming very similar resulting in a numerically unstable solution for equation 3.
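The numerical difficulty can be seen by evaluating the denominator of equation 4 for different kinds of region. The sketch below uses three small, hypothetical sets of spatial gradients: a plain region, a region containing only a single edge (all gradients parallel) and a detailed region. The data and the function name are illustrative assumptions.

```python
def denominator(gradients):
    """Denominator of Equation 4 for a region of (Ix, Iy) spatial gradients."""
    sxx = sum(ix * ix for ix, iy in gradients)
    sxy = sum(ix * iy for ix, iy in gradients)
    syy = sum(iy * iy for ix, iy in gradients)
    return sxx * syy - sxy * sxy


plain_region = [(0.0, 0.0)] * 4                                   # no detail
edge_region = [(1.0, 2.0), (2.0, 4.0), (0.5, 1.0), (3.0, 6.0)]    # one edge
detailed_region = [(2.0, 0.0), (0.0, 4.0), (1.0, 1.0), (3.0, -1.0)]

print(denominator(plain_region),
      denominator(edge_region),
      denominator(detailed_region))
```

For the plain and single-edge regions the two terms of the denominator cancel, so equation 4 would require a division by zero. In practice noise makes the denominator small rather than exactly zero, which is what produces the wildly erroneous vectors.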
A solution to this problem of gradient motion estimation has been suggested by Martinez (references 11 and 12). The matrix in equation 3 (henceforth denoted ‘M’) may be analysed in terms of its eigenvectors and eigenvalues. There are 2 eigenvectors, one of which points parallel to the predominant edge in the analysis region and the other points normal to that edge. Each eigenvector has an associated eigenvalue which indicates how sharp the image is in the direction of the eigenvector. The eigenvectors and values are defined by Equation 5;
$$M \cdot e_i = \lambda_i e_i, \qquad i \in \{1, 2\}$$
where
$$M = \begin{bmatrix} \sigma_{xx}^2 & \sigma_{xy}^2 \\ \sigma_{xy}^2 & \sigma_{yy}^2 \end{bmatrix}$$
The eigenvectors ei are conventionally defined as having length 1, which convention is adhered to herein.
In plain areas of the image the eigenvectors have essentially random direction (there are no edges) and both eigenvalues are very small (there is no detail). In these circumstances the only sensible vector to assume is zero. In parts of the image which contain only an edge feature the eigenvectors point normal to the edge and parallel to the edge. The eigenvalue corresponding to the normal eigenvector is (relatively) large and the other eigenvalue small. In this circumstance only the motion vector normal to the edge can be measured. In other circumstances, in detailed parts of the image where more information is available, the motion vector may be calculated using Equation 4.
The motion vector may be found, taking into account Martinez' ideas above, by using Equation 6;
$$\begin{bmatrix} u_0 \\ v_0 \end{bmatrix} = -\left( \frac{\lambda_1}{\lambda_1^2 + n_1^2}\, e_1 e_1^t + \frac{\lambda_2}{\lambda_2^2 + n_2^2}\, e_2 e_2^t \right) \cdot \begin{bmatrix} \sigma_{xt}^2 \\ \sigma_{yt}^2 \end{bmatrix}$$
where superscript t represents the transpose operation. Here n1 and n2 are the computational or signal noise involved in calculating λ1 and λ2 respectively. In practice n1≈n2, both being determined by, and approximately equal to, the noise in the coefficients of M. When λ1 and λ2 ≪ n then the calculated motion vector is zero; as is appropriate for a plain region of the image. When λ1 ≫ n and λ2 ≪ n then the calculated motion vector is normal to the predominant edge in that part of the image. Finally if λ1, λ2 ≫ n then equation 6 becomes equivalent to equation 4. As signal noise, and hence n, decreases then equation 6 provides an increasingly more accurate estimate of the motion vectors as would be expected intuitively.
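The behaviour of equation 6 in the three cases just described can be explored with a direct software evaluation. This is a sketch only: it is not the look up table implementation referred to in this specification, it assumes n1 = n2 = n (passed as the hypothetical parameter `noise`), and it obtains the eigenvalues and unit eigenvectors of the 2×2 symmetric matrix M in closed form.

```python
import math


def martinez_vector(sxx, sxy, syy, sxt, syt, noise):
    """Evaluate Equation 6 for one region, with n1 = n2 = noise (assumed)."""
    # Eigenvalues of the symmetric matrix M = [[sxx, sxy], [sxy, syy]].
    mean = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    eigenvalues = (mean + radius, mean - radius)

    # Unit eigenvectors: e1 for the larger eigenvalue, e2 perpendicular to e1.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    e1 = (math.cos(theta), math.sin(theta))
    e2 = (-math.sin(theta), math.cos(theta))

    u0 = v0 = 0.0
    for lam, (ex, ey) in zip(eigenvalues, (e1, e2)):
        denom = lam * lam + noise * noise
        gain = lam / denom if denom > 0 else 0.0
        projection = ex * sxt + ey * syt      # e^t . [sxt, syt]
        u0 -= gain * projection * ex
        v0 -= gain * projection * ey
    return u0, v0


# Detailed region (both eigenvalues much larger than the noise).
detailed = martinez_vector(5.0, 1.0, 17.0, -7.0, 7.0, noise=1e-6)

# Edge region: strong horizontal gradient, almost no vertical detail,
# and a small noise term in the vertical/temporal product.
edge = martinez_vector(4.0, 0.0, 1e-4, -20.0, 0.01, noise=0.1)

# Plain region: no detail at all.
plain = martinez_vector(0.0, 0.0, 0.0, 0.0, 0.0, noise=0.1)

print(detailed, edge, plain)
```

In the detailed case the result agrees with the direct solution of equation 4, approximately (1.5, −0.5) for these illustrative sums. In the edge case only the component normal to the edge, close to 5 pixels/field, survives, whereas equation 4 applied to the same sums would give a vertical component of roughly −100. In the plain case the vector is zero.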
In practice calculating motion vectors using the Martinez technique involves replacing the apparatus of FIG. 3, below, with more complex circuitry. The direct solution of equation 6 would involve daunting computational and hardware complexity. It can, however, be implemented using only two-input, pre-calculated, look up tables and simple arithmetic operations. It is another object of the present invention to provide a streamlined implementation of the Martinez technique.
The invention provides motion vector estimation apparatus for use in video signal processing comprising means for calculating image gradients for each input sampling site of a picture sequence, the image gradients being calculated on the same standard as the input signal, means for converting the image gradients from the first standard to a second, output standard, and means for generating a plurality of motion vectors from the image gradients, the apparatus being arranged to convert the image gradients from the input standard to the output standard before calculation of motion vectors thereby producing motion vectors on the desired output standard. The motion vectors are calculated on the output standard thereby avoiding the difficulties and inaccuracies involved in converting the signals to the output standard after calculation of the motion vectors.
The apparatus may comprise temporal and spatial low pass filters for prefiltering the input video signal. Prefiltering increases the maximum motion speed which can be measured and reduces the deleterious effects of vertical/temporal aliasing.
The means for calculating the image gradients may comprise temporal and spatial (horizontal and vertical) differentiators.
The means for converting the image gradients from the input standard to the output standard comprise vertical/temporal interpolators, for example a linear (polyphase) interpolator such as a bilinear interpolator.
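A bilinear vertical/temporal interpolation of one gradient signal might be sketched as follows. This is a deliberately simplified illustration: it treats the gradient samples as a regular grid indexed by field and line, ignores interlace and active line counts, and maps output sites to input coordinates using only the nominal field rates (50 Hz to 60 Hz) and line counts (625 to 525) mentioned above. The function names, parameters and test data are assumptions and do not describe the interpolator of any particular embodiment.

```python
import math


def bilinear_sample(grid, t_pos, y_pos):
    """Bilinearly interpolate grid[field][line] at a fractional position."""
    t0 = min(int(math.floor(t_pos)), len(grid) - 2)
    y0 = min(int(math.floor(y_pos)), len(grid[0]) - 2)
    ft, fy = t_pos - t0, y_pos - y0
    earlier = (1 - fy) * grid[t0][y0] + fy * grid[t0][y0 + 1]
    later = (1 - fy) * grid[t0 + 1][y0] + fy * grid[t0 + 1][y0 + 1]
    return (1 - ft) * earlier + ft * later


def convert_site(grid, out_field, out_line,
                 in_rate=50, out_rate=60, in_lines=625, out_lines=525):
    """Map an output (field, line) site to input coordinates and interpolate."""
    t_pos = out_field * in_rate / out_rate
    y_pos = out_line * in_lines / out_lines
    return bilinear_sample(grid, t_pos, y_pos)


# Hypothetical gradient data, linear in field and line so that the
# interpolated value can be checked against the generating formula.
gradients_in = [[3.0 * t + 2.0 * y for y in range(30)] for t in range(5)]

value = convert_site(gradients_in, out_field=3, out_line=21)
print(value)
```

Output field 3 falls midway between input fields 2 and 3, and output line 21 coincides with input line 25, so the interpolated gradient equals 3 × 2.5 + 2 × 25 = 57.5 for this test data. Each of the gradient signals would be converted in the same way before the motion vectors are calculated.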
The image gradients corresponding to a plurality of output sampling sites are used to calculate the motion vectors. The motion vectors may be calculated using a least mean square solution for a group of neighbouring output sampling sites.
In an embodiment the apparatus further comprises a multiplier array having as its inputs the image gradients previously calculated and converted to the output standard, and corresponding low pass filters for summing the image gradient products. The means for calculating the motion vectors utilises the sums of the image gradient products corresponding to a group of neighbouring output sampling sites to produce the best fit motion vector for the group of sampling sites. A different group of neighbouring sampling sites may be used to calculate each motion vector. The means for calculating motion vectors determines the best fit motion vector given by equation 4 or equation 6 as herein defined.
In an alternative embodiment the apparatus comprises rectangular to polar coordinate converter means having the spatial image gradients converted to the output standard as its inputs and the motion vectors are determined for a group of output sampling sites based on the angle and magnitude of the image gradients of each sampling site in said group. The motion vectors are calculated on the basis of equation 11 or 13 as herein defined.
The invention also provides a method of motion estimation in video or film signal processing comprising calculating image gradients for each input sampling site of a picture sequence, the image gradients being calculated on a first, input standard, generating a plurality of motion vectors from the image gradients, the image gradients being converted to a second, output standard before generating the motion vectors thereby generating motion vectors on the desired output standard.
The method may comprise a prefiltering step. The input video signal may be prefiltered for example using temporal and spatial lowpass filters.
The image gradients corresponding to a plurality of output sampling sites are used to calculate the motion vectors. The motion vectors may be calculated using a least mean square solution for a group of neighbouring sampling sites.
The step of generating motion vectors may comprise using the sums of the image gradient products corresponding to a group of neighbouring output sampling sites to produce the best fit motion vector for each said group. The motion vectors may be calculated using equation 4 or 6 as defined herein.
In an embodiment the step of generating motion vectors may comprise performing eigen-analyses on the sums of the image gradient products using the spatial image gradients converted to the output standard and assigning two eigenvectors and eigenvalues to each output sampling site. The motion vector for each group of sampling sites is calculated by applying equation 6, as herein defined, to the results of the eigen analyses.
In another embodiment the step of generating motion vectors comprises transforming the spatial image gradient vectors on the output standard from rectangular to polar coordinates and the motion vectors are determined for a group of output sampling sites based on the angle and magnitude of the image gradients of each sampling site in said group. The motion vectors are calculated on the basis of equation 11 or 13 as herein defined.