The present invention relates to compression of motion video signals, and more particularly to fractional pixel motion estimation of video signals for simplifying the determination of a best motion vector for each pixel of the video signals during video compression.
Video and many medical images are received as sequences of two-dimensional image frames or fields. To transmit such images as digital signals some form of compression is required. Three basic types of redundancy are exploited in a video compression process: temporal redundancy, spatial redundancy and amplitude redundancy. Interframe coding techniques make use of the redundancy between successive frames (temporal redundancy). In these techniques the information defining elements of a picture, i.e., pixels, are estimated by interpolation or prediction using information from related locations in preceding and/or succeeding versions of the picture, as exemplified in U.S. Pat. No. 4,383,272 issued May 10, 1983 to Netravali et al entitled "Video Signal Interpolation Using Motion Estimation." A typical compression encoder is shown in FIG. 1 where a video signal is input to a preprocessor and then into a motion estimator. The motion estimator delays the video signal, to compensate for the processing delays for the motion vector generation process, before providing the video signal to an encoder loop where compression is performed. The compression is performed using a motion vector generated by the motion estimator, which is multiplexed with the compressed video signal at the output of the encoder for transmission.
The interpolation between frames in the encoder is performed by first estimating the motion trajectory, i.e., motion vector or displacement vector, of each pixel. If an estimate of such displacement is available, then more efficient prediction may be performed in the encoder by relating to elements in a previous frame that are appropriately spatially displaced. These displacement vectors are used to project each pixel along its trajectory, resulting in the motion compensated prediction or interpolation. Once the motion vectors are determined, the encoder loop determines the differences between consecutive motion compensated frames, and those differences that exceed a predetermined threshold form the compressed video signal.
Most motion estimation in interframe coding assumes (i) objects move in translation, i.e., zoom and rotation are not considered, (ii) illumination is spatially and temporally uniform, and (iii) occlusion of one object by another and uncovered background are not considered. In practice motion vectors are estimated for blocks of pixels so that the displacements are piecewise constant. Block matching is used to estimate the motion vector associated with each block of pixels in a current coding frame or field, assuming that the object displacement is constant within a small two-dimensional block of pixels. In these methods the motion vector for each block in the current frame or field is estimated by searching through a larger search window in a previous frame/field and/or succeeding frame/field for a best match using correlation or matching techniques. The motion estimator compares a block of pixels in the current frame with a block in the previous or future frame by computing a distortion function, such as shown in FIG. 2. Each block in the current frame is compared to displaced blocks at different locations in the previous or future frame within a search window, and the displacement vector that gives the minimum value of the distortion function is selected as being the best representation of the motion for that block.
Using the notation (row, column) to represent a position in a picture, for a block of $M \times N$ pixels at $(m,n)$ the distortion function $D_{(m,n)}(i,j)$ for a displacement of $(i,j)$ may be given as

$$D_{(m,n)}(i,j) \triangleq \sum_{k=1}^{M} \sum_{l=1}^{N} f\big(v(m+k,\,n+l) - u(m+k-i,\,n+l-j)\big)$$
where $u(\cdot,\cdot)$ is the previous or future image, $v(\cdot,\cdot)$ is the current image, and $f(x)$ is a given positive and increasing function of $x$. In general the candidate displacement vector $(i,j)$ is restricted to a preselected $[-p_1, p_2] \times [-q_1, q_2]$ region, or search window. Some useful choices for $f(x)$ are $|x|$ and $x^2$. Minimizing $D_{(m,n)}(\cdot,\cdot)$ over the candidate displacements $(i,j)$ for a given $(m,n)$ gives the displacement vector for the block at $(m,n)$.
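As an illustration only, the block matching search described above can be sketched in Python. The function names `sad_distortion` and `full_search` are hypothetical, frames are assumed to be lists of rows of integer pixel values, the choice f(x) = |x| is one of the options named in the text, and the offsets k and l run from 0 rather than 1 so that (m, n) is the top-left pixel of the block. Skipping candidates whose displaced block falls outside the reference frame is also an assumption; the text does not say how picture borders are handled.

```python
def sad_distortion(cur, ref, m, n, M, N, i, j):
    """Distortion D_(m,n)(i,j) with f(x) = |x| (sum of absolute differences).

    cur is the current image v, ref is the previous or future image u.
    Offsets k, l are 0-based here (an assumption of this sketch).
    """
    total = 0
    for k in range(M):
        for l in range(N):
            total += abs(cur[m + k][n + l] - ref[m + k - i][n + l - j])
    return total


def full_search(cur, ref, m, n, M, N, p1, p2, q1, q2):
    """Exhaustive search over the window [-p1, p2] x [-q1, q2].

    Returns (minimum distortion, (i, j)), or None if no candidate
    block lies completely inside the reference frame.
    """
    rows, cols = len(ref), len(ref[0])
    best = None
    for i in range(-p1, p2 + 1):
        for j in range(-q1, q2 + 1):
            top, left = m - i, n - j
            # Skip candidates whose displaced block leaves the frame.
            if top < 0 or left < 0 or top + M > rows or left + N > cols:
                continue
            d = sad_distortion(cur, ref, m, n, M, N, i, j)
            if best is None or d < best[0]:
                best = (d, (i, j))
    return best
```

The search evaluates every integer displacement in the window, so its cost grows with the window area times the block area, which is the expense that motivates cheaper refinement schemes for fractional pixel accuracy.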
If i and j are both integers, minimization of the distortion function gives the motion vectors to an integer accuracy, or a full pixel. Fractional pixel accuracy motion vectors usually give better motion compensated prediction than the full pixel motion vectors. Fractional pixel accuracy motion vectors may be obtained by computing $u(\cdot,\cdot)$ at fractional pixel grid locations through spatial interpolation. However obtaining the fractional pixel accuracy motion vectors is computationally very expensive. Netravali et al, as described in the article entitled "A Codec for HDTV", IEEE Trans. Consumer Electronics, vol. 38, pp. 325-340, Aug. 1992, use a simple scheme to approximate the half pixel motion vectors independently horizontally and vertically using the distortion function computed at integer pixel locations. Let $D_{(m,n)}(i,j)$ be minimum for $(i,j) = (i_0,j_0)$ (integer pixel accuracy). A parabola is fit to the three points around the minimum, and the resulting equation is solved to find the position of the minimum of the curve. The process of computing the fractional pixel accuracy motion vector $(i'_0,j'_0)$ for a block at $(m,n)$ simplifies to solving

$$i'_0 = \begin{cases} i_0 - \tfrac{1}{2}, & 3D_{(m,n)}(i_0+1,j_0) - 2D_{(m,n)}(i_0,j_0) - D_{(m,n)}(i_0-1,j_0) < 0 \\ i_0 + \tfrac{1}{2}, & 3D_{(m,n)}(i_0-1,j_0) - 2D_{(m,n)}(i_0,j_0) - D_{(m,n)}(i_0+1,j_0) < 0 \\ i_0, & \text{otherwise} \end{cases}$$

and

$$j'_0 = \begin{cases} j_0 - \tfrac{1}{2}, & 3D_{(m,n)}(i_0,j_0+1) - 2D_{(m,n)}(i_0,j_0) - D_{(m,n)}(i_0,j_0-1) < 0 \\ j_0 + \tfrac{1}{2}, & 3D_{(m,n)}(i_0,j_0-1) - 2D_{(m,n)}(i_0,j_0) - D_{(m,n)}(i_0,j_0+1) < 0 \\ j_0, & \text{otherwise} \end{cases}$$
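The parabola fit mentioned above can be illustrated with a minimal Python sketch. It is not taken from the cited article: the function name `parabola_vertex_offset` is hypothetical, and the sketch only computes the continuous position of the vertex of the parabola through three equally spaced distortion samples, measured from the middle sample along the sampled axis. It does not reproduce the half pixel threshold tests, and the handling of a flat or non-convex triple is an assumption.

```python
def parabola_vertex_offset(d_minus, d_zero, d_plus):
    """Vertex position of the parabola through (-1, d_minus),
    (0, d_zero) and (1, d_plus), relative to the middle sample.

    Writing the parabola as a*x**2 + b*x + c gives
        a = (d_minus - 2*d_zero + d_plus) / 2
        b = (d_plus - d_minus) / 2
    and the vertex lies at x = -b / (2*a).
    """
    curvature = d_minus - 2 * d_zero + d_plus  # equals 2*a
    if curvature <= 0:
        # Flat or non-convex samples: no usable minimum (assumption:
        # report no refinement in that case).
        return 0.0
    return 0.5 * (d_minus - d_plus) / curvature
```

When the middle sample is the smallest of the three, the returned offset always lies between -1/2 and +1/2, so the refined position never moves past the midpoint to a neighboring integer location. The function would be applied once along each axis, using the distortion values on either side of the integer minimum.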
This fractional pixel motion estimation is performed by a motion vector refinement generator as shown in FIG. 2. FIG. 3 shows a block of pixels in a frame about the minimum integer pixel, and the fractional pixel locations that surround that pixel for which the distortion function is determined by interpolation from the distortion function for the surrounding pixels.
What is desired is an improved fractional pixel motion estimation of video signals that provides greater accuracy.