1. Field of the Invention
The invention relates to a technique for estimating motion vectors for a video or film sequence and may be used for example in a motion estimator for use with a standards converter or a video compression system.
2. Description of the Related Art
Gradient motion estimation is one of three or four fundamental motion estimation techniques and is well known in the literature (references 1 to 18). More correctly called 'constraint equation based motion estimation', it is based on a partial differential equation which relates the spatial and temporal image gradients to motion.
Gradient motion estimation is based on the constraint equation relating the image gradients to motion. The constraint equation is a direct consequence of motion in an image. Given an object, 'object(x, y)', which moves with a velocity (u, v), the resulting moving image, I(x, y, t), is defined by Equation 1;
\[ I(x, y, t) = \mathrm{object}(x - ut,\; y - vt) \]
This leads directly to the constraint equation, Equation 2;
\[ u\,\frac{\partial I(x,y,t)}{\partial x} + v\,\frac{\partial I(x,y,t)}{\partial y} + \frac{\partial I(x,y,t)}{\partial t} = \frac{\partial\,\mathrm{object}(x,y)}{\partial t} = 0 \]
where, provided the moving object does not change with time (perhaps due to changing lighting or distortion), ∂object/∂t = 0. This equation is, perhaps, more easily understood by considering an example. Assume that vertical motion is zero, the horizontal gradient is +2 grey levels per pixel and the temporal gradient is −10 grey levels per field. Then the constraint equation reduces to 2u − 10 = 0, so the ratio of temporal to horizontal gradient implies a motion of 5 pixels/field. The relationship between spatial and temporal gradients is summarised by the constraint equation.
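The worked example above can be checked with a few lines of code. This is only an illustrative sketch; the function name `horizontal_velocity` is hypothetical and the vertical velocity is assumed to be zero, as in the example.

```python
def horizontal_velocity(grad_x, grad_t):
    """Solve u * grad_x + grad_t = 0 for u (vertical motion assumed zero)."""
    if grad_x == 0:
        # With no horizontal gradient the constraint says nothing about u.
        raise ValueError("horizontal gradient is zero; velocity is undetermined")
    return -grad_t / grad_x


# +2 grey levels per pixel horizontally, -10 grey levels per field temporally.
u = horizontal_velocity(2.0, -10.0)
print(u)  # 5.0 pixels per field
```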
To use the constraint equation for motion estimation it is first necessary to estimate the image gradients; the spatial and temporal gradients of brightness. In principle these are easily calculated by applying straightforward linear horizontal, vertical and temporal filters to the image sequence. In practice, in the absence of additional processing, this can only really be done for the horizontal gradient. For the vertical gradient, calculation of the brightness gradient is confused by interlace which is typically used for television pictures; pseudo-interlaced signals from film do not suffer from this problem. Interlaced signals only contain alternate picture lines on each field. Effectively this is vertical sub-sampling resulting in vertical aliasing which confuses the vertical gradient estimate. Temporally the situation is even worse: if an object has moved by more than 1 pixel in consecutive fields, pixels in the same spatial location may be totally unrelated. This would render any gradient estimate meaningless. This is why gradient motion estimation cannot, in general, measure velocities greater than 1 pixel per field period (reference 8).
Prefiltering can be applied to the image sequence to avoid the problem of direct measurement of the image gradients. If spatial low pass filtering is applied to the sequence then the effective size of a 'pixel' is increased. The brightness gradients at a particular spatial location are then related for a wider range of motion speeds. Hence spatial low pass filtering allows higher velocities to be measured, the highest measurable velocity being determined by the degree of filtering applied. Vertical low pass filtering also alleviates the problem of vertical aliasing caused by interlace. Alias components in the image tend to be more prevalent at higher frequencies. Hence, on average, low pass filtering disproportionately removes alias rather than true signal components. The more vertical filtering that is applied, the less is the effect of aliasing. There are, however, some signals in which aliasing extends down to zero frequency. Filtering cannot remove all the aliasing from these signals, which will therefore result in erroneous vertical gradient estimates and, therefore, incorrect estimates of the motion vector.
Prefiltering an image sequence results in blurring. Hence small details in the image become lost. This has two consequences: firstly, the velocity estimate becomes less accurate since there is less detail in the picture and, secondly, small objects cannot be seen in the prefiltered signal. To improve vector accuracy hierarchical techniques are sometimes used. This involves first calculating an initial, low accuracy, motion vector using heavy prefiltering, then refining this estimate to higher accuracy using less prefiltering. This does, indeed, improve vector accuracy but it does not overcome the other disadvantage of prefiltering, that is, that small objects cannot be seen in the prefiltered signal, hence their velocity cannot be measured. No amount of subsequent vector refinement, using hierarchical techniques, will recover the motion of small objects if they are not measured in the first stage. Prefiltering is advisable in gradient motion estimation only when the intention is to provide low accuracy motion vectors of large objects.
Once the image gradients have been estimated the constraint equation is used to calculate the corresponding motion vector. Each pixel in the image gives rise to a separate linear equation relating the horizontal and vertical components of the motion vector and the image gradients. The image gradients for a single pixel do not provide enough information to determine the motion vector for that pixel. The gradients for at least two pixels are required. In order to minimise errors in estimating the motion vector it is better to use more than two pixels and find the vector which best fits the data from multiple pixels. Consider taking gradients from 3 pixels. Each pixel restricts the motion vector to a line in velocity space. With two pixels a single, unique, motion vector is determined by the intersection of the 2 lines. With 3 pixels there are 3 lines and, possibly, no unique solution. This is illustrated in FIG. 1. The vectors E1 to E3 are the error from the best fitting vector to the constraint line for each pixel.
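One way to calculate the best fit motion vector for a group of neighbouring pixels is to use a least mean square method, that is, minimising the sum of the squares of the lengths of the error vectors E1 to E3 in FIG. 1. The least mean square solution for a group of neighbouring pixels is given by the solution of Equation 3;
\[ \begin{bmatrix} \sigma_{xx}^2 & \sigma_{xy}^2 \\ \sigma_{xy}^2 & \sigma_{yy}^2 \end{bmatrix} \cdot \begin{bmatrix} u_0 \\ v_0 \end{bmatrix} = -\begin{bmatrix} \sigma_{xt}^2 \\ \sigma_{yt}^2 \end{bmatrix} \]
where
\[ \sigma_{xx}^2 = \sum \frac{\partial I}{\partial x}\cdot\frac{\partial I}{\partial x}, \qquad \sigma_{xy}^2 = \sum \frac{\partial I}{\partial x}\cdot\frac{\partial I}{\partial y} \qquad \text{etc.} \]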
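where (u0, v0) is the best fit motion vector and the summations are over a suitable region. The (direct) solution of equation 3 is given by Equation 4;
\[ \begin{bmatrix} u_0 \\ v_0 \end{bmatrix} = \frac{1}{\sigma_{xx}^2\,\sigma_{yy}^2 - \sigma_{xy}^4} \begin{bmatrix} \sigma_{xy}^2\,\sigma_{yt}^2 - \sigma_{yy}^2\,\sigma_{xt}^2 \\ \sigma_{xy}^2\,\sigma_{xt}^2 - \sigma_{xx}^2\,\sigma_{yt}^2 \end{bmatrix} \]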
Small regions produce detailed vector fields of low accuracy and vice versa for large regions. There is little point in choosing a region which is smaller than the size of the prefilter since the pixels within such a small region are not independent.
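The direct least mean square solution of equations 3 and 4 can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular apparatus: the function name `least_squares_motion` is hypothetical, NumPy is assumed to be available, and the gradient arrays are synthetic values constructed to satisfy the constraint equation exactly for a motion of (3, −2) pixels per field.

```python
import numpy as np


def least_squares_motion(ix, iy, it):
    """Best-fit (u0, v0) over a region from its gradient samples (Equation 4)."""
    s_xx = np.sum(ix * ix)
    s_xy = np.sum(ix * iy)
    s_yy = np.sum(iy * iy)
    s_xt = np.sum(ix * it)
    s_yt = np.sum(iy * it)
    det = s_xx * s_yy - s_xy ** 2
    if det == 0:
        # No unique solution: plain area or a single edge direction.
        raise ValueError("gradient matrix is singular")
    u0 = (s_xy * s_yt - s_yy * s_xt) / det
    v0 = (s_xy * s_xt - s_xx * s_yt) / det
    return u0, v0


# Synthetic 8x8 region whose gradients obey u*Ix + v*Iy + It = 0 with (u, v) = (3, -2).
rng = np.random.default_rng(0)
ix = rng.normal(size=(8, 8))
iy = rng.normal(size=(8, 8))
it = -(3.0 * ix - 2.0 * iy)

u0, v0 = least_squares_motion(ix, iy, it)
print(round(u0, 6), round(v0, 6))
```

With noise-free gradients the fit recovers the motion used to build the data; with real gradients the size of the summation region trades detail against accuracy, as noted above.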
Typically, motion estimators generate motion vectors on the same standard as the input image sequence. For motion compensated standards converters, or other systems performing motion compensated temporal interpolation, it is desirable to generate motion vectors on the output image sequence standard. For example when converting between European and American television standards the input image sequence is 625 line 50 Hz (interlaced) and the output standard is 525 line 60 Hz (interlaced). A motion compensated standards converter operating on a European input is required to produce motion vectors on the American output television standard.
The direct implementation of gradient motion estimation, discussed herein in relation to FIGS. 2 and 3, can give wildly erroneous results. Such behaviour is extremely undesirable. These problems occur when there is insufficient information in a region of an image to make an accurate velocity estimate. This would typically arise when the analysis region contained no detail at all or only the edge of an object. In such circumstances it is either not possible to measure velocity or only possible to measure velocity normal to the edge. It is attempting to estimate the complete motion vector, when insufficient information is available, which causes problems. Numerically the problem is caused by the 2 terms in the denominator of equation 4 becoming very similar resulting in a numerically unstable solution for equation 3.
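The numerical difficulty can be illustrated with a region that contains only a vertical edge, so that the vertical gradient is zero everywhere. The sketch below assumes NumPy; `gradient_matrix` is a hypothetical helper that builds the matrix of equation 3 from gradient samples.

```python
import numpy as np


def gradient_matrix(ix, iy):
    """2x2 matrix of summed gradient products for a region (left side of Equation 3)."""
    s_xy = np.sum(ix * iy)
    return np.array([[np.sum(ix * ix), s_xy],
                     [s_xy, np.sum(iy * iy)]])


# Region containing only a vertical edge: brightness varies horizontally only.
rng = np.random.default_rng(1)
ix = rng.normal(size=(8, 8))
iy = np.zeros_like(ix)

m = gradient_matrix(ix, iy)
det = m[0, 0] * m[1, 1] - m[0, 1] ** 2   # denominator of Equation 4
print(det)                               # 0.0: Equation 4 cannot be evaluated
print(np.linalg.eigvalsh(m))             # one zero eigenvalue, one large one
```

In real pictures the determinant is rarely exactly zero; noise makes it small and erratic instead, which is what produces the wildly erroneous vectors.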
A solution to this problem of gradient motion estimation has been suggested by Martinez (references 11 and 12). The matrix in equation 3 (henceforth denoted 'M') may be analysed in terms of its eigenvectors and eigenvalues. There are 2 eigenvectors, one of which points parallel to the predominant edge in the analysis region and the other points normal to that edge. Each eigenvector has an associated eigenvalue which indicates how sharp the image is in the direction of the eigenvector. The eigenvectors and values are defined by Equation 5;
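\[ M \cdot e_i = \lambda_i\, e_i, \qquad i \in \{1, 2\} \]
where;
\[ M = \begin{bmatrix} \sigma_{xx}^2 & \sigma_{xy}^2 \\ \sigma_{xy}^2 & \sigma_{yy}^2 \end{bmatrix} \]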
The eigenvectors ei are conventionally defined as having length 1, which convention is adhered to herein.
In plain areas of the image the eigenvectors have essentially random direction (there are no edges) and both eigenvalues are very small (there is no detail). In these circumstances the only sensible vector to assume is zero. In parts of the image which contain only an edge feature the eigenvectors point normal to the edge and parallel to the edge. The eigenvalue corresponding to the normal eigenvector is (relatively) large and the other eigenvalue small. In this circumstance only the motion vector normal to the edge can be measured. In other circumstances, in detailed parts of the image where more information is available, the motion vector may be calculated using Equation 4.
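The motion vector may be found, taking into account Martinez' ideas above, by using Equation 6;
\[ \begin{bmatrix} u_0 \\ v_0 \end{bmatrix} = -\left( \frac{\lambda_1}{\lambda_1^2 + n_1^2}\, e_1 e_1^t + \frac{\lambda_2}{\lambda_2^2 + n_2^2}\, e_2 e_2^t \right) \cdot \begin{bmatrix} \sigma_{xt}^2 \\ \sigma_{yt}^2 \end{bmatrix} \]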
where superscript t represents the transpose operation. Here n1 and n2 are the computational or signal noise involved in calculating λ1 and λ2 respectively. In practice n1 = n2 = n, both being determined by, and approximately equal to, the noise in the coefficients of M. When λ1 and λ2 ≪ n the calculated motion vector is zero, as is appropriate for a plain region of the image. When λ1 ≫ n and λ2 ≪ n the calculated motion vector is normal to the predominant edge in that part of the image. Finally, if λ1, λ2 ≫ n then equation 6 becomes equivalent to equation 4. As signal noise, and hence n, decreases, equation 6 provides an increasingly accurate estimate of the motion vectors, as would be expected intuitively.
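The three regimes of equation 6 can be illustrated numerically. The sketch below is a floating point illustration only, not the lookup table implementation discussed in the text; it assumes NumPy, a single noise value n = n1 = n2, and a hypothetical function name `martinez_motion`.

```python
import numpy as np


def martinez_motion(m, c, n):
    """Evaluate Equation 6 for matrix M, vector c = (sigma_xt^2, sigma_yt^2), noise n > 0."""
    lam, vecs = np.linalg.eigh(m)            # columns of vecs are unit eigenvectors
    gains = lam / (lam ** 2 + n ** 2)        # lambda_i / (lambda_i^2 + n^2)
    weighted = (vecs * gains) @ vecs.T       # sum over i of gains[i] * e_i e_i^t
    return -weighted @ c


true_motion = np.array([3.0, -2.0])

# Plain region: both eigenvalues far below n, so the result is zero.
plain = martinez_motion(np.zeros((2, 2)), np.zeros(2), n=1.0)

# Vertical edge: only the component normal to the edge (horizontal) is recovered.
m_edge = np.array([[4.0, 0.0], [0.0, 0.0]])
edge = martinez_motion(m_edge, -m_edge @ true_motion, n=1e-3)

# Detailed region: both eigenvalues far above n, matching the direct solution.
m_detail = np.array([[4.0, 1.0], [1.0, 3.0]])
detail = martinez_motion(m_detail, -m_detail @ true_motion, n=1e-6)

print(plain, edge, detail)
```

In the edge case the vertical component of the true motion is simply not reported, rather than being replaced by an unstable value.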
In practice calculating motion vectors using the Martinez technique involves replacing the apparatus of FIG. 3, below, with more complex circuitry. The direct solution of equation 6 would involve daunting computational and hardware complexity. It can, however, be implemented using only two-input, pre-calculated, look up tables and simple arithmetic operations.
A block diagram of a direct implementation of gradient motion estimation is shown in FIGS. 2 and 3.
The apparatus shown schematically in FIG. 2 performs filtering and calculation of gradient products and their summations. The apparatus of FIG. 3 generates motion vectors from the sums of gradient products produced by the apparatus of FIG. 2. The horizontal and vertical low pass filters (10, 12) in FIG. 2 perform spatial prefiltering as discussed above. The cut-off frequencies of 1/32nd band horizontally and 1/16th band vertically allow motion speeds up to (at least) 32 pixels per field to be measured. Different cut-off frequencies could be used if a different range of speeds is required. The image gradients are calculated by three differentiating filters (16, 17, 18).
The vertical/temporal interpolation filters (20) convert the image gradients, measured on the input standard, to the output standard. Typically the vertical/temporal interpolators (20) are bilinear interpolators or other polyphase linear interpolators. Thus the output motion vectors are also on the output standard. The interpolation filters (20) are a novel feature which facilitates interfacing the motion estimator to a motion compensated temporal interpolator. Temporal low pass filtering is normally performed as part of (all 3 of) the interpolation filters (20). The temporal filter (14) has been re-positioned in the processing path so that only one rather than three filters are required. Note that the filters (10, 12, 14) prior to the multiplier array can be implemented in any order because they are linear filters. The summation of gradient products, specified in equation 3, is implemented by the low pass filters (24) following the multiplier array (22). Typically these filters (24) would be (spatial) running average filters, which give equal weight to each tap within their region of support. Other low pass filters could also be used at the expense of more complex hardware. The size of these filters (24) determines the size of the neighbourhood used to calculate the best fitting motion vector. Examples of filter coefficients which may be used can be found in the example.
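The multiplier array (22) followed by equal-weight summation filters (24) can be sketched in software as below. This is a simplified illustration, assuming NumPy 1.20 or later for `sliding_window_view`; the function names and the square window are assumptions made for the example. A running sum is used in place of a running average, since the common scale factor cancels in equation 4.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def running_sum(a, size):
    """Equal-weight sum over every size x size window (fully covered windows only)."""
    return sliding_window_view(a, (size, size)).sum(axis=(-2, -1))


def gradient_product_sums(ix, iy, it, size):
    """Form the five gradient products and sum each over a local neighbourhood."""
    products = {
        "xx": ix * ix,
        "xy": ix * iy,
        "yy": iy * iy,
        "xt": ix * it,
        "yt": iy * it,
    }
    return {name: running_sum(p, size) for name, p in products.items()}


# Constant gradients make the expected sums easy to verify by hand.
ix = np.full((6, 6), 2.0)
iy = np.full((6, 6), 3.0)
it = np.full((6, 6), 1.0)
sums = gradient_product_sums(ix, iy, it, size=3)
print(sums["xx"].shape, sums["xx"][0, 0], sums["xy"][0, 0])
```

Each output position then holds the five sums needed to solve for one motion vector, and the window size plays the role of the filter size discussed above.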
A block diagram of apparatus capable of implementing equation 6, and which replaces that of FIG. 3, is shown in FIGS. 4 and 5.
Each of the 'eigen analysis' blocks (30), in FIG. 4, performs the analysis for one of the two eigenvectors. The output of the eigen-analysis is a vector (with x and y components) equal to
\[ s_i = e_i \sqrt{\frac{\lambda_i}{\lambda_i^2 + n^2}} \]
These 's' vectors are combined with the vector (σxt², σyt²) (denoted c in FIG. 4), according to equation 6, to give the motion vector according to the Martinez technique.
The eigen analysis, illustrated in FIG. 5, has been carefully structured so that it can be implemented using lookup tables with no more than 2 inputs. This has been done since lookup tables with 3 or more inputs would be impracticably large using today's technology. The implementation of FIG. 5 is based on first normalising the matrix M by dividing all its elements by (σxx² + σyy²). This yields a new matrix, N, with the same eigenvectors (e1 and e2) and different (but related) eigenvalues (χ1 and χ2). The relationship between M, N and their eigenvectors and values is given by Equation 7;
\[ N = \frac{1}{\sigma_{xx}^2 + \sigma_{yy}^2}\, M = \begin{bmatrix} \dfrac{\sigma_{xx}^2}{\sigma_{xx}^2 + \sigma_{yy}^2} & \dfrac{\sigma_{xy}^2}{\sigma_{xx}^2 + \sigma_{yy}^2} \\[2ex] \dfrac{\sigma_{xy}^2}{\sigma_{xx}^2 + \sigma_{yy}^2} & \dfrac{\sigma_{yy}^2}{\sigma_{xx}^2 + \sigma_{yy}^2} \end{bmatrix} \]
\[ M \cdot e_i = \lambda_i\, e_i \]
\[ N \cdot e_i = \chi_i\, e_i \]
\[ \lambda_i = (\sigma_{xx}^2 + \sigma_{yy}^2)\, \chi_i \]
\[ n^2 = (\sigma_{xx}^2 + \sigma_{yy}^2)\, nz \]
Matrix N is simpler than M as it contains only two independent values, since the principal diagonal elements (N1,1, N2,2) sum to unity and the minor diagonal elements (N1,2, N2,1) are identical. The principal diagonal elements may be coded as (σxx² − σyy²)/(σxx² + σyy²) since Equation 8;
\[ N_{1,1} = \frac{1}{2}\left( 1 + \frac{\sigma_{xx}^2 - \sigma_{yy}^2}{\sigma_{xx}^2 + \sigma_{yy}^2} \right), \qquad N_{2,2} = \frac{1}{2}\left( 1 - \frac{\sigma_{xx}^2 - \sigma_{yy}^2}{\sigma_{xx}^2 + \sigma_{yy}^2} \right) \]
Hence lookup tables 1 and 2 have all the information they require to find the eigenvalues and vectors of N using standard techniques. It is therefore straightforward to precalculate the contents of these lookup tables. Lookup table 3 simply implements the square root function. The key features of the apparatus shown in FIG. 5 are that the eigen analysis is performed on the normalised matrix, N, using 2 input lookup tables (1 and 2) and the eigenvalue analysis (from table 2) is re-scaled to the correct value using the output of table 3.
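The relationships between M and the normalised matrix N can be confirmed numerically. The following sketch assumes NumPy and uses a hypothetical helper `normalise`; it checks the properties stated above with floating point arithmetic and does not model the lookup tables themselves.

```python
import numpy as np


def normalise(m):
    """Divide M by (sigma_xx^2 + sigma_yy^2); return N and the scale factor."""
    scale = m[0, 0] + m[1, 1]
    return m / scale, scale


m = np.array([[4.0, 1.0], [1.0, 3.0]])   # example values of M
n_mat, scale = normalise(m)

lam, vecs_m = np.linalg.eigh(m)          # eigenvalues lambda_i of M
chi, vecs_n = np.linalg.eigh(n_mat)      # eigenvalues chi_i of N

# Equation 8: the diagonal of N is fixed by one ratio.
ratio = (m[0, 0] - m[1, 1]) / scale
n11 = 0.5 * (1.0 + ratio)
n22 = 0.5 * (1.0 - ratio)

print(lam, scale * chi)                  # the same values
print(n_mat[0, 0], n11, n_mat[1, 1], n22)
```

Because N is fully described by the ratio above and its off-diagonal element, two-input tables suffice to index its eigen analysis.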
The gradient motion estimator described above is undesirably complex. The motion estimator is robust to images containing limited information but FIGS. 4 and 5 show the considerable complexity involved. The situation is made worse by the fact that many of the signals have a very wide dynamic range making the functional blocks illustrated much more difficult to implement.
It is an object of the present invention to provide a technique for estimating motion which yields considerable simplifications without sacrificing performance. This is achieved by normalising the basic constraint equation (equation 2) to control the dynamic range of the signals. As well as reducing dynamic range this also makes other simplifications possible.
The invention provides a motion vector estimation apparatus for use in video signal processing comprising means for calculating a plurality of image gradients, means for calculating an angle (θ) corresponding to the orientation of the spatial image gradient vector and the motion speed (vn) in the direction of the spatial image gradient vector from the temporal and spatial image gradients, and means for generating motion vectors from a plurality of values of θ and vn.
The means for calculating the image gradients may comprise temporal and spatial differentiators.
The means for calculating the values of θ and vn may comprise a rectangular to polar coordinate converter.
The means for generating motion vectors preferably calculates the best fitting motion vector for a region of the picture based on the constraint equations corresponding to a plurality of image gradients.
In an embodiment the means for generating motion vectors comprises three two-input look up tables containing precalculated values of matrix Z as herein defined in Equation 14. Alternatively the lookup tables contain precalculated values of φ⁻¹ as herein defined.
The invention also provides a method of motion estimation in video or film signal processing comprising calculating a plurality of temporal and spatial image gradients, calculating from the image gradients an angle θ corresponding to the orientation of the spatial image gradient vector and the motion speed (vn) in the direction of the spatial image gradient vector, and generating motion vectors from a plurality of pairs of values of θ and vn.
The step of generating motion vectors may comprise calculating the best fitting motion vector for a region of the picture based on the normalised constraint equations corresponding to a plurality of image gradients.
The motion vectors may be calculated on the basis of equation 11 or 13 as herein defined.
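The conversion from image gradients to pairs of θ and vn, and the fitting of a motion vector to several such pairs, can be sketched as follows. This is an illustration under explicit assumptions: it uses the textbook normal-flow relation vn = −(∂I/∂t)/|∇I| without any noise term, and an ordinary least squares fit from NumPy, neither of which is claimed to be the normalisation or the equations 11 and 13 defined later in the specification. The function names are hypothetical.

```python
import numpy as np


def polar_constraints(ix, iy, it):
    """Convert gradients to (theta, vn); assumes non-zero spatial gradients."""
    magnitude = np.hypot(ix, iy)          # length of the spatial gradient vector
    theta = np.arctan2(iy, ix)            # orientation of the spatial gradient
    vn = -it / magnitude                  # speed along the gradient direction
    return theta, vn


def fit_motion(theta, vn):
    """Least squares (u, v) satisfying u*cos(theta) + v*sin(theta) = vn."""
    a = np.column_stack([np.cos(theta), np.sin(theta)])
    solution, *_ = np.linalg.lstsq(a, vn, rcond=None)
    return solution


# Three pixels with different gradient directions, consistent with motion (5, -1).
ix = np.array([2.0, 0.0, 1.0])
iy = np.array([0.0, 1.0, 1.0])
it = -(5.0 * ix - 1.0 * iy)

theta, vn = polar_constraints(ix, iy, it)
motion = fit_motion(theta, vn)
print(theta, vn, motion)
```

Expressed this way each pixel contributes a constraint with unit-length coefficients (cos θ, sin θ), so the quantities involved stay within a bounded range regardless of picture contrast.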