The present invention relates to the field of video processing, and more particularly to motion field modeling and estimation of video content using a motion transform.
Motion field modeling and estimation is important to computer vision and image processing. Accurate and efficient motion field estimation is meaningful for general video processing and applications, such as motion compensation coding of digital TV, noise reduction for video sequences, frame rate conversion and target tracking. Motion field estimation is also important for computer vision and human vision, such as for the recovery of 3-D motion and the structure of moving objects, and image registration.
An example of where motion field estimation is particularly useful is in MPEG video data compression. One of the main techniques for achieving high compression ratios relies on accurately determining which blocks of each frame are in motion. Only the data describing the motion of those blocks is encoded in the video stream between frames. This results in memory and bandwidth savings.
Motion fields are typically represented as motion vector fields that are a pixel-by-pixel map of image motion from one image frame to the next image frame. Each pixel in the frame has a motion vector that defines a matching pixel in the next or previous frame. The combination of these motion vectors is the motion vector field. Storage requirements for vector fields may be large. There is a need for an apparatus and method that can efficiently model and estimate a motion vector field thereby reducing the memory requirements for storing the motion vector field.
To provide a better understanding of motion vector fields, a brief review of prior art that may lead to a motion vector field follows.
FIG. 1 depicts a video frame. Each rectangle portion corresponds to a respectively different image component which is preferably a pixel or group of pixels. The pixels may be referenced by x and y values respectively. Each pixel may have a value that is preferably represented by an intensity value E(x,y,t) in the image plane at time t. The horizontal location of the pixel is represented by 'x' and is preferably numbered between 1 and a maximum value illustrated in this example as 'a'. The vertical location of the pixel is represented by 'y' and is preferably numbered between 1 and a maximum value as illustrated here as 'b'. Time is represented as 't'. The exemplary image data used by the apparatus and methods described have pixels with random values. The image is shown having contrasting central and surrounding parts for clarity in the description.
FIG. 2 illustrates how a video sequence may be made from a series of successive video frames. Each frame is shown sequentially as time 't' increases. In the present invention, motion is preferably analyzed between a series of adjacent frames.
If there is no motion between two successive frames, a motion vector field 300 such as that shown in FIG. 3 may be generated. In this motion vector field, all vector elements are zero, indicating no motion in the image.
As shown in FIG. 4A, a central area 404 moves to the position of a central area 402, as indicated by the broken-line box in a field of observation 400 between a current frame and a next frame. When a method according to the present invention is used to generate a motion vector field from the frames, one containing the area 404 and the other containing the area 402, a motion vector field such as that shown in FIG. 4B is generated. A motion vector for each pixel in the area indicates that the pixel has moved in the direction of the motion.
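By way of a non-limiting illustration, the motion vector fields of FIG. 3 and FIG. 4B can be sketched as a pair of arrays holding the horizontal and vertical displacement of every pixel. The frame size, block position, and two-pixel shift below are hypothetical values chosen only for the example, and `block_motion_field` is an illustrative helper, not part of the described apparatus.

```python
import numpy as np

def block_motion_field(height, width, top, left, size, dx, dy):
    """Dense motion field for one square block that shifts by (dx, dy).

    Returns two (height, width) arrays: u holds horizontal displacements and
    v holds vertical displacements. Pixels outside the block keep zero motion.
    """
    u = np.zeros((height, width))
    v = np.zeros((height, width))
    u[top:top + size, left:left + size] = dx
    v[top:top + size, left:left + size] = dy
    return u, v

# No motion at all: every vector is zero, as in FIG. 3.
u0, v0 = block_motion_field(8, 8, top=0, left=0, size=0, dx=0, dy=0)

# A 3x3 central block moving two pixels to the right, in the spirit of FIG. 4B.
u, v = block_motion_field(8, 8, top=2, left=2, size=3, dx=2, dy=0)
moving_pixels = int(np.count_nonzero(u))
```

Even in this small case the field stores two values for each of the 64 pixels, although only nine pixels actually move.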
Although the techniques described herein could easily be applied to image components other than frames, such as image fields or portions of image frames, the description below refers only to image frames so as to avoid confusion in terminology with the fields of motion vectors.
Motion estimation is defined as finding the motion vectors v(x) = [u(x), v(x)]^T, ∀x, from one image to another, where x = [x, y]^T denotes the pixel location. A constant intensity constraint I_1(x) = I_2(x + v(x)), ∀x, is based on the assumption that each pixel in one image moves to another position in the other image without changing its intensity. This constant intensity constraint by itself forms an underconstrained system, since it supplies only one equation per pixel for the two unknown components u(x) and v(x), and therefore the motion vectors cannot be determined from it alone.
Much work has been done to find additional constraints which are suitable for modeling the true motion field. Optical flow algorithms often assume the smoothness of the motion field and occasionally deal with motion discontinuities. Active-mesh based approaches reduce the number of unknowns by tracking only a set of feature (or nodal) points based on a neighboring image structure or a mesh structure. A dense motion field may then be interpolated from the nodal points' movements.
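The constant intensity constraint and its ambiguity can be illustrated with a short sketch. The helper `intensity_mismatch`, the integer-valued displacements, and the wrap-around treatment of the image borders are assumptions made only to keep the example small; they are not taken from the description.

```python
import numpy as np

def intensity_mismatch(frame1, frame2, u, v):
    """Total |I1(x) - I2(x + v(x))| over all pixels.

    u and v are integer horizontal and vertical displacements (scalars or
    per-pixel arrays). Coordinates wrap around at the borders, which is an
    assumption made only to keep the sketch short.
    """
    h, w = frame1.shape
    ys, xs = np.mgrid[0:h, 0:w]
    return float(np.abs(frame1 - frame2[(ys + v) % h, (xs + u) % w]).sum())

rng = np.random.default_rng(0)
frame1 = rng.random((6, 6))
# Second frame: the same content moved down by 1 pixel and right by 2 pixels.
frame2 = np.roll(frame1, shift=(1, 2), axis=(0, 1))

true_motion = intensity_mismatch(frame1, frame2, u=2, v=1)   # constraint holds
wrong_motion = intensity_mismatch(frame1, frame2, u=0, v=0)  # constraint violated

# On a featureless image every candidate motion satisfies the constraint,
# which is one way to see that the constraint alone is underconstrained.
flat = np.ones((6, 6))
ambiguous = [intensity_mismatch(flat, flat, u=a, v=b)
             for a, b in [(0, 0), (3, 1), (5, 4)]]
```

The textured frames single out the true displacement, whereas the flat frame accepts all three candidate displacements equally, so additional constraints are needed to select one motion field.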
Another category is the parametric or model-based approach which assumes that a motion field may be described by a single or multiple motion model(s) or geometric transformation(s) by using a relatively small number of parameters. Under the umbrella of parametric methods, the present invention uses a motion transform, in which the motion field is represented in the transform domain and is treated as the unknown signal to be estimated. Note that this approach is different from motion estimation using the phase-correlation method as described in a paper by J. Fleet et al. entitled "Computation of component image velocity from local phase information", Int'l J. Comput. Vis., 5:77-104, 1990, or spatio-temporal frequency domain analysis as described in a paper by C. Lien et al. entitled "Complex-subband transform for subband-based motion estimation/compensation and coding", IEEE Trans. on Image Processing, 6(5):694-702, 1997, in which the transform is performed on the image intensity field. An advantage of using a motion transform is that the motion transform may model any motion field, including motion discontinuities, provided that the full spectrum in the transform domain is considered. A motion transform offers a great generality for motion modeling since the estimated motion surface does not need to be restricted to a planar (e.g., affine) or a polynomial surface (e.g., pseudo-perspective, biquadratic, or any other second or higher-order polynomial model). Moreover, the motion transform offers the flexibility to choose/remove certain time-frequency components in order to accommodate the underlying motion field. Very often, a small number of selected transform coefficients may be effective to describe the motion or warping between frames, which may provide an economic means for motion-compensated video coding.
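As a rough illustration of how a small number of transform coefficients may describe a motion field, the following sketch represents one motion component in an orthonormal DCT-II basis and keeps only its lowest-frequency coefficients. The DCT is used here as one possible motion transform; the ramp-shaped field, the 16x16 frame size, and the helper names `dct_matrix` and `compress_field` are assumptions introduced for the example.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k is the k-th cosine basis function."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def compress_field(field, keep):
    """Keep only the keep x keep lowest-frequency DCT coefficients of a field."""
    ch, cw = dct_matrix(field.shape[0]), dct_matrix(field.shape[1])
    coeffs = ch @ field @ cw.T              # forward 2-D transform
    kept = np.zeros_like(coeffs)
    kept[:keep, :keep] = coeffs[:keep, :keep]
    return kept, ch.T @ kept @ cw           # retained coefficients, rebuilt field

# Hypothetical horizontal motion component: a zoom-like ramp from -1 to +1.
ys, xs = np.mgrid[0:16, 0:16]
u = (xs - 7.5) / 7.5
kept, u_hat = compress_field(u, keep=4)
max_error = float(np.abs(u - u_hat).max())

# A uniform translation needs only the single lowest-frequency coefficient.
_, shift_hat = compress_field(np.full((16, 16), 2.0), keep=1)
```

In this sketch at most 16 retained coefficients stand in for 256 per-pixel values, and retaining more coefficients would allow sharper motion boundaries to be represented.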
Motion estimation results obtained using the DCT or DFT for motion modeling are quite comparable to those of a wavelet-based approach proposed by Wu et al. in a paper entitled "Optical flow estimation using wavelet motion model", ICCV '98, 1998, in which a wavelet function as described in a paper by Cai et al. entitled "Adaptive multiresolution collocation methods for initial boundary value problems of nonlinear PDEs", SIAM J. Numer. Anal., 33(3):937-970, June 1996, is adopted to model the motion field. The DCT is especially attractive because of its simplicity, efficiency, and greater flexibility.
One advantage of the invention is in more accurately and efficiently processing consecutive video frames to determine the motion of objects in video frames and output a representation of that motion as an image motion vector field, wherein each component of the image vector field represents a pixel or group of pixels of a frame.
Another advantage of this invention is that it can model any motion field including motion discontinuities.
Yet a further advantage of this invention is that it offers the flexibility of dynamically choosing the significant time-frequency components used to model the underlying motion.
To achieve the foregoing and other advantages, in accordance with the invention as embodied and broadly described herein, an apparatus is provided for generating an image motion vector field which describes a motion of individual image components of a first image frame and corresponding image components of a second image frame in a sequence of image frames. The apparatus comprises a first frame memory for receiving the first image frame; a second frame memory for receiving the second image frame; and an optical flow calculator configured for generating the image motion vector field by iteratively comparing a predicted image with the second image frame, the predicted image being produced based upon the first image frame and image gradients generated according to a motion estimate that is produced according to a transform function using estimated transform coefficients. The estimated transform coefficients are determined based upon a previously determined image gradient.
In yet a further aspect of the invention, the optical flow calculator further includes a coefficient estimator configured to generate the estimated transform coefficients by solving a linear coefficient equation using the image gradients and a plurality of individual image components, wherein the transform coefficients are unknown values in the coefficient equation.
In yet a further aspect of the invention, the optical flow calculator further includes a motion estimator configured to generate a motion estimate from the transform coefficients using an inverse transform equation.
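The roles of the coefficient estimator and the motion estimator can be sketched as follows. The form of the coefficient equation is not spelled out above, so this example assumes the standard linearized brightness relation ix*u + iy*v + it = 0, with u and v expanded in a few low-frequency cosine basis images and solved by least squares. All function names, the unnormalized basis, and the synthetic gradients are hypothetical.

```python
import numpy as np

def cosine_basis(h, w, k):
    """(h*w, k*k) matrix whose columns are low-frequency 2-D cosine basis images."""
    def rows(n):
        i = np.arange(n)
        return np.array([np.cos(np.pi * (2 * i + 1) * f / (2 * n)) for f in range(k)])
    rows_y, rows_x = rows(h), rows(w)
    cols = [np.outer(rows_y[p], rows_x[q]).ravel() for p in range(k) for q in range(k)]
    return np.stack(cols, axis=1)

def estimate_coefficients(ix, iy, it, basis):
    """Least-squares solve of ix*u + iy*v + it = 0 with u = basis @ a, v = basis @ b."""
    system = np.hstack([ix.ravel()[:, None] * basis, iy.ravel()[:, None] * basis])
    solution, *_ = np.linalg.lstsq(system, -it.ravel(), rcond=None)
    n = basis.shape[1]
    return solution[:n], solution[n:]

def motion_from_coefficients(a, b, basis, shape):
    """Inverse transform: turn coefficient vectors back into dense u and v fields."""
    return (basis @ a).reshape(shape), (basis @ b).reshape(shape)

# Synthetic check with made-up gradients and a known set of coefficients.
rng = np.random.default_rng(1)
shape = (8, 8)
basis = cosine_basis(*shape, k=2)
a_true, b_true = rng.normal(size=4), rng.normal(size=4)
ix, iy = rng.normal(size=shape), rng.normal(size=shape)
u_true, v_true = motion_from_coefficients(a_true, b_true, basis, shape)
it = -(ix * u_true + iy * v_true)           # temporal gradient consistent with the model

a_est, b_est = estimate_coefficients(ix, iy, it, basis)
u_est, v_est = motion_from_coefficients(a_est, b_est, basis, shape)
```

The unknowns here are the eight coefficients rather than the 128 per-pixel motion components, which is what makes the otherwise underconstrained problem solvable in this sketch.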
In yet a further aspect of the invention, the optical flow calculator further includes a coefficient updater configured to generate image gradients from the motion estimates.
In a further aspect of the invention, a method for generating an image motion vector field comprises the steps of receiving a first image frame having individual image components; receiving a second image frame having corresponding image components; initializing an image gradient; and generating the image motion vector field. The step of generating the image motion vector field further comprises iteratively: estimating transform coefficients from the individual image components and the image gradient according to a transform coefficient function; calculating a motion field according to the estimated transform coefficients; calculating image gradients according to the motion field; generating a predicted image frame according to the image gradients and the first image frame; calculating a residual error by taking a difference between the predicted image and the second image frame; and determining whether the residual error is less than a predetermined threshold and, accordingly, whether the predicted image has converged. If the predicted image has converged, the iterations end and the image motion vector field is output.
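The iterative steps above may be sketched, under simplifying assumptions, for one-dimensional frames. In this sketch the motion is modeled by three DCT coefficients, the predicted frame is obtained by resampling the first frame with linear interpolation, and each coefficient update is a least-squares (Gauss-Newton style) step. These choices, the synthetic frames, and all names are assumptions made for illustration and are not asserted to be the claimed method.

```python
import numpy as np

def dct_basis(n, k):
    """(n, k) matrix of the k lowest-frequency orthonormal DCT-II basis vectors."""
    i = np.arange(n)[:, None]
    f = np.arange(k)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * f / (2 * n))
    basis[:, 0] = np.sqrt(1.0 / n)
    return basis

def rms(values):
    return float(np.sqrt(np.mean(values ** 2)))

def estimate_motion_1d(frame1, frame2, k=3, threshold=1e-3, max_iters=20):
    """Iteratively fit motion v so that frame2[x] is close to frame1[x - v[x]]."""
    x = np.arange(frame1.size, dtype=float)
    basis = dct_basis(frame1.size, k)
    coeffs = np.zeros(k)                    # transform coefficients (the unknowns)
    grad1 = np.gradient(frame1)
    grad = grad1.copy()                     # initialized image gradient
    residual = frame2 - frame1              # prediction error for zero motion
    history = [rms(residual)]
    for _ in range(max_iters):
        # Estimate a coefficient update from the current gradient and residual.
        step, *_ = np.linalg.lstsq(-grad[:, None] * basis, residual, rcond=None)
        coeffs = coeffs + step
        motion = basis @ coeffs                         # inverse transform
        grad = np.interp(x - motion, x, grad1)          # gradients along the motion
        predicted = np.interp(x - motion, x, frame1)    # predicted second frame
        residual = frame2 - predicted
        history.append(rms(residual))
        if history[-1] < threshold:                     # converged
            break
    return basis @ coeffs, history

def bumps(x):
    """Synthetic scene: two smooth bumps."""
    return np.exp(-((x - 20.0) / 6.0) ** 2) + np.exp(-((x - 44.0) / 6.0) ** 2)

grid = np.arange(64, dtype=float)
frame1, frame2 = bumps(grid), bumps(grid - 1.5)   # scene shifted by 1.5 samples
motion, history = estimate_motion_1d(frame1, frame2)
```

The returned `history` records the residual error at each pass, so the effect of the threshold and of the number of retained coefficients `k` can be inspected directly.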
Additional objects, advantages and novel features of the invention will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.