The invention relates generally to the field of video processing, and more particularly to an apparatus and method of estimating optical flow, including motion estimation and analysis of video and multimedia content.
Determining optical flow or image motion is important to computer vision and image processing. Accurate and efficient motion field estimation is meaningful for general video processing and applications, such as motion compensation coding of digital TV, noise reduction for video sequences, frame rate conversion and target tracking. Motion field estimation is also important for computer vision and human vision, such as for the recovery of 3-D motion and the structure of moving objects, and image registration.
An example of where motion field estimation is particularly useful is in MPEG video data compression. One of the main techniques to achieve high compression relies on accurately determining blocks of each frame that are in motion. Data describing the motion for only those blocks in the video determined to be in motion are encoded in the video stream between frames. This results in memory and bandwidth savings.
Optical flow is typically represented as a motion vector field that is a pixel-by-pixel map of image motion from one image frame to the next image frame. Each pixel in the frame has a motion vector that defines a matching pixel in the next or previous frame. The combination of these motion vectors is the motion vector field.
Although the techniques described herein could easily be applied to image components other than frames, such as image fields or portions of image frames, the description below refers only to image frames so as to avoid confusion in terminology with the fields of motion vectors.
Estimating motion vector fields is inherently difficult, because many different motion vector fields may be used to describe a single image sequence.
One simple approach is to assume that a block of pixels moves with the same kind of motion such as constant translation or an affine (planar) motion. This kind of block matching approach frequently fails to produce a good estimation of motion because the motions of pixels outside of the block are disregarded. Thus, such a motion model may be incorrect for describing the true motion of pixels within a block when the block size is large and may be significantly affected by noise when the block size is small.
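The block matching approach described above can be sketched as follows. This is a minimal illustration only, assuming pure translation, a sum-of-absolute-differences cost and an exhaustive search window; the names `sad`, `match_block` and `make_frame` are hypothetical and do not come from the cited art.

```python
def sad(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
    return total


def match_block(frame_a, frame_b, x, y, size, search):
    """Return the (u, v) translation that best maps the block at (x, y)
    in frame_a onto frame_b, searching +/- `search` pixels."""
    height, width = len(frame_b), len(frame_b[0])
    best = (0, 0)
    best_cost = sad(frame_a, frame_b, x, y, x, y, size)
    for v in range(-search, search + 1):
        for u in range(-search, search + 1):
            bx, by = x + u, y + v
            # Skip candidate blocks that fall outside the frame.
            if bx < 0 or by < 0 or bx + size > width or by + size > height:
                continue
            cost = sad(frame_a, frame_b, x, y, bx, by, size)
            if cost < best_cost:
                best_cost, best = cost, (u, v)
    return best


def make_frame(px, py):
    """8x8 test frame holding a bright 2x2 patch at (px, py)."""
    frame = [[0] * 8 for _ in range(8)]
    for dy in range(2):
        for dx in range(2):
            frame[py + dy][px + dx] = 255
    return frame


# The patch moves 2 pixels right and 1 pixel down between the frames.
frame1 = make_frame(2, 3)
frame2 = make_frame(4, 4)
motion = match_block(frame1, frame2, 2, 3, size=2, search=3)
```

Every pixel in the block receives the same vector, which is why the result degrades when the block straddles objects that move differently.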
Conventional approaches to the problem of estimating motion vector fields typically require simultaneously solving equations having several thousand unknown quantities. Numerous techniques, based on gradients, correlation, spatiotemporal energy functions, and feature matching functions have been proposed. These techniques have relied upon local image features such as the intensity of individual pixels and on more global features such as edges and object boundaries.
Two processes have been proposed which have successfully solved two problems in motion vector estimation: motion vector discontinuity and occlusion. The first of these processes is the "line process" described in a paper by J. Konrad et al. entitled "Bayesian Estimation of Motion Vector Fields," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, pp. 910-927, September 1992. The second process is the "occlusion process" described in a paper by R. Depommier et al. entitled "Motion Estimation with Detection of Occlusion Areas," IEEE International Conference on Acoustics and Speech Signal Processing, pp. III 269-272, 1992. Although successful, these processes substantially increase the number of unknowns that need to be estimated and also introduce other parameters particular to the line and/or occlusion processes.
Global formulations over the complete motion field have been proposed to deal with deficiencies of the block matching techniques. One such formulation is proposed by B. Horn and B. Schunck in a paper entitled "Determining Optical Flow," Artificial Intelligence, vol. 17, pp. 185-203, 1981. According to this proposal, motion vectors are estimated by minimizing the error of a motion constraint equation and the error of motion smoothness over the entire image. In this formulation, the motion constraint equation is derived from the assumption that the image intensity is constant along the motion trajectory. In other words, the first derivative of the 3D intensity function with respect to time is zero (i.e. dE/dt=0), where E(x,y,t) is the image intensity over space and time. Any departure from the assumed smooth motion is measured as the square of the magnitude of the gradient of the motion vectors. While this approach improves the handling of general types of motion, such as elastic motion, the motion vector fields tend to blur at places where the motion is not continuous (i.e. at motion boundaries). This approach is probably the most popular method due to its simplicity and reasonable performance. In practice, however, it has long been overlooked that the motion constraint is correct only in an infinitesimal neighborhood around the observation point.
In a paper by E. Hildreth, entitled "Computations Underlying the Measurement of Visual Motion," Artificial Intelligence, vol. 23, pp. 309-354, 1984, a partial solution to the problem of handling motion boundaries is proposed. According to this proposal, the motion vector field is assumed to be smooth only along a contour but not across it. This proposal overcomes the blurring problem. Because, however, motion vectors at points not lying along contours cannot be obtained, this technique cannot propagate motion information across contours, such as those due to textures, which do not correspond to motion boundaries. These types of contours are common in real-world images.
A technique which combines the line process with Markov random field modeling and stochastic relaxation has been proposed by S. Geman et al. in a paper entitled "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 721-741, November 1984. The described technique was used for restoring degraded images. In this context, a line process is a boolean field that marks the image intensity boundaries. Other researchers have adapted this idea to overcome the blurring problem of an estimated motion vector field by modifying the line process to indicate motion boundaries. An example of this technique is contained in the above referenced paper by J. Konrad et al. One drawback of this method is that one additional unknown must be introduced for every two adjoining pixels in order to implement the line process. These additional unknowns greatly increase the computational overhead of any algorithm which employs this method.
Occlusion, by definition, means that part of the image cannot find a matching part in another image which corresponds to the same part of the scene. That part of the image was occluded from one image frame to the next. Occlusion appears quite often in real-world images when, for example, one object moves in front of another object, an object moves toward the camera, or an object rotates. If only two frames are used, there may be difficulty in obtaining a good estimate of motions with occlusion because, for at least some parts of one image, there is no corresponding image part in the other image.
One simple solution to this problem is to use three image frames, a target frame, and the frames occurring immediately before and immediately after the target frame. In most cases of real-world images, a matching portion for image parts in the middle frame can be found in either the preceding or succeeding frame. The above referenced paper by Depommier et al. proposes a combination of the line process, as set forth in the Konrad et al. paper, with an occlusion process to detect occlusion areas using three frames. One drawback of this combination, however, is that this combination requires even more unknowns and parameters to produce the model than the line process alone.
U.S. Pat. No. 5,471,252 to Iu, dated Nov. 28, 1999, describes a method and apparatus for estimating motion vector fields, the contents of which are incorporated by reference herein. However, the present invention efficiently generates a more accurate resultant vector motion field and is better at handling large motions and discontinuities. Improvement in the estimates and improved ability to cope with larger motion is desirable. Such an improvement to estimations would preferably migrate the observation point gradually according to new motion estimates, execute faster, have better convergence, and be capable of handling a larger degree of frame to frame motion than in the current art. There is also a need to more efficiently handle motion discontinuities in the flow field during iteration processing to more effectively reduce motion compensated intensity differences and motion estimation errors.
A review of motion vector field techniques is presented, followed by a formulation of the problem of determining optical flow. Then a prior art optical flow algorithm by Horn and Schunck is examined.
To understand the operation of the proposed invention and the nature of the problem it addresses, it is helpful to review the conventional technique for defining motion vector fields using smoothness assumptions.
When a camera moves relative to objects being imaged, there are corresponding changes in the image. Disregarding, for the moment, the occlusion of areas and newly exposed areas, for every point of an image at time t, there exists a corresponding point in another image captured at a different time t'. Every such pair of points may be connected by a respective straight line to yield a set of motion vectors, and the projection of this set of vectors onto the image plane defines a displacement field (motion vector field). The purpose of motion vector field estimation is to estimate such a motion vector field from an observed image sequence. This motion vector field may then be used for various types of image processing, for example, computer vision, the motion compensated coding of moving images, noise reduction of moving images and frame-rate conversion.
Reference is now made to FIGS. 1 through 5 to describe the operation of the process. In FIG. 1 a video frame is depicted. Each rectangular portion corresponds to a respectively different image component which is preferably a pixel or group of pixels. The pixels may be referenced by x and y values respectively. Each pixel may have a value that is preferably represented by an intensity value E(x,y,t) in the image plane at time t. The horizontal location of the pixel is represented by 'x' and is preferably numbered between 1 and a maximum value illustrated in this example as 'a'. The vertical location of the pixel is represented by 'y' and is preferably numbered between 1 and a maximum value illustrated here as 'b'. Time is represented as 't'. The image in FIG. 1 is shown having contrasting central and surrounding parts for clarity in the description.
FIG. 2 illustrates how a video sequence may be made from a series of successive video frames. Each frame is shown sequentially as time 't' increases. In the present invention, motion is preferably analyzed between a series of adjacent frames.
If there is no motion between two successive frames, a motion vector field 300 such as that shown in FIG. 3 is generated. In this motion vector field, all vector elements are zero, indicating no motion in the image.
As shown in FIG. 4A, a central image area 404 moves to the position of a central image area 402, as indicated by the broken-line box in a field of observation 400 between a current frame and a next frame. FIG. 4B shows a motion vector field generated from the frames, one containing the area 404 and the other containing the area 402. A motion vector for each pixel in the area indicates that the pixel has moved in the direction of the motion.
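The motion vector fields of FIG. 3 and FIG. 4B can be pictured as a per-pixel grid of (u, v) pairs. The following is a minimal sketch under that assumption; the function names, the grid size and the region coordinates are hypothetical and chosen only for illustration.

```python
def zero_field(width, height):
    """Motion vector field for two identical frames: every vector is (0, 0)."""
    return [[(0, 0) for _ in range(width)] for _ in range(height)]


def field_for_moving_region(width, height, region, displacement):
    """Field in which only the pixels inside `region` move.

    region is (x0, y0, x1, y1) with exclusive upper bounds;
    displacement is the (u, v) vector shared by those pixels."""
    x0, y0, x1, y1 = region
    field = zero_field(width, height)
    for y in range(y0, y1):
        for x in range(x0, x1):
            field[y][x] = displacement
    return field


still = zero_field(6, 4)
# A 2x2 central area moving one pixel to the right.
moving = field_for_moving_region(6, 4, (2, 1, 4, 3), (1, 0))
```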
The following is a mathematical derivation for generating data values representing a motion vector field from other data values representing individual picture elements (pixels) of two or more images.
Let E(x,y,t) denote the image brightness of the point (x,y) in the image plane at time t. Consider the displacement of a patch of the brightness pattern to be u and v in the x- and y-directions, respectively, within time T. Without considering lighting condition change, shading effect, and occlusion in the scene, assume the brightness of the patch remains constant, i.e.,
$$E(x+u,\, y+v,\, t+T) - E(x,y,t) = 0 \qquad (1)$$
It is well known that this constraint by itself is not sufficient to solve for (u,v). Assuming a smooth flow field, the problem of determining the optical flow u and v is preferably formulated as minimizing the following objective function.
$$\varepsilon = \iint \left(\varepsilon_c^2 + \alpha^2 \varepsilon_s^2\right) dx\, dy \qquad (2)$$
where
$$\varepsilon_c = E(x+u,\, y+v,\, t+T) - E(x,y,t) \qquad (3)$$
$$\varepsilon_s^2 = u_x^2 + u_y^2 + v_x^2 + v_y^2 \qquad (4)$$
$\varepsilon_c$ and $\varepsilon_s^2$ are the constant intensity and the global smoothness constraints, respectively, with a relative weighting factor $\alpha^2$. Here $u_x$, $u_y$, $v_x$, and $v_y$ denote the partial derivatives of u and v with respect to x and y. Applying the Euler equation from the calculus of variations, we have
$$\alpha^2 \nabla^2 u = \varepsilon_c\, \frac{\partial}{\partial u} E(x+u,\, y+v,\, t+T)$$
$$\alpha^2 \nabla^2 v = \varepsilon_c\, \frac{\partial}{\partial v} E(x+u,\, y+v,\, t+T) \qquad (5)$$
where $\nabla^2 u$ and $\nabla^2 v$ are the Laplacians of u and v, respectively, and may be approximated by $\nabla^2 u \approx \kappa(\bar{u} - u)$ and $\nabla^2 v \approx \kappa(\bar{v} - v)$, where $\bar{u}$ and $\bar{v}$ denote the local averages of u and v, respectively.
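The Laplacian approximation can be checked numerically. The sketch below assumes a plain four-neighbour average on a unit grid, for which the matching constant is κ = 4; the text does not specify the averaging kernel, so both the kernel and the function names are assumptions made for illustration.

```python
def local_average(field, x, y):
    """Average of the four direct neighbours of (x, y) (interior pixels only)."""
    return (field[y][x - 1] + field[y][x + 1]
            + field[y - 1][x] + field[y + 1][x]) / 4.0


def laplacian_estimate(field, x, y, kappa=4.0):
    """Approximate the Laplacian as kappa * (local average - centre value)."""
    return kappa * (local_average(field, x, y) - field[y][x])


# Test field u(x, y) = x^2, whose exact Laplacian is 2 everywhere.
u = [[float(x * x) for x in range(5)] for _ in range(5)]
estimate = laplacian_estimate(u, 2, 2)
```

With this kernel the expression reduces to the familiar five-point stencil, so the estimate for the quadratic test field equals the exact value.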
Horn and Schunck take the Taylor's series expansion of E(x+u,y+v,t+T) around the point (x,y,t).
$$E(x+u,\, y+v,\, t+T) = E(x,y,t) + E_x u + E_y v + E_t T + \epsilon \qquad (6)$$
where $E_x$, $E_y$ and $E_t$ are the partial derivatives of E with respect to x, y and t at (x,y,t), respectively. The term $\epsilon$ contains the second and higher order terms in u, v, and T. Eliminating $\epsilon$ and combining Eq. 1 with Eq. 6 leads to the well-known prior art optical flow constraint shown in equation 7.
$$\varepsilon_b = E_x u + E_y v + E_t T = 0 \qquad (7)$$
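The optical flow constraint of equation 7 can be evaluated on sampled frames as a quick check. The sketch below assumes central differences for the spatial derivatives, a frame difference for the temporal derivative and T = 1; the function names and the ramp test image are hypothetical.

```python
def gradients(frame1, frame2, x, y):
    """Estimate (Ex, Ey, Et) at an interior pixel of frame1."""
    ex = (frame1[y][x + 1] - frame1[y][x - 1]) / 2.0   # dE/dx
    ey = (frame1[y + 1][x] - frame1[y - 1][x]) / 2.0   # dE/dy
    et = float(frame2[y][x] - frame1[y][x])            # dE/dt per frame
    return ex, ey, et


def constraint_residual(frame1, frame2, x, y, u, v, T=1.0):
    """Left-hand side of the constraint: Ex*u + Ey*v + Et*T."""
    ex, ey, et = gradients(frame1, frame2, x, y)
    return ex * u + ey * v + et * T


# Intensity ramp E = 2x + 3y that shifts one pixel to the right.
frame_a = [[2.0 * x + 3.0 * y for x in range(5)] for y in range(5)]
frame_b = [[2.0 * (x - 1) + 3.0 * y for x in range(5)] for y in range(5)]

true_residual = constraint_residual(frame_a, frame_b, 2, 2, u=1.0, v=0.0)
still_residual = constraint_residual(frame_a, frame_b, 2, 2, u=0.0, v=0.0)
```

The true motion satisfies the constraint exactly here because the ramp has no higher order terms. Note that any (u, v) with 2u + 3v = 2 also gives a zero residual, which illustrates why the constraint alone does not determine the flow.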
This well-known prior art optical flow constraint is often derived directly from dE/dt=0 by applying the chain rule. Additionally, by substituting Eq. 6 into Eq. 5, the optical flow may be obtained iteratively as shown in equation 8, in which T is taken as 1.
$$u^{n+1} = \bar{u}^n - E_x\, \frac{E_x \bar{u}^n + E_y \bar{v}^n + E_t}{\beta + E_x^2 + E_y^2}$$
$$v^{n+1} = \bar{v}^n - E_y\, \frac{E_x \bar{u}^n + E_y \bar{v}^n + E_t}{\beta + E_x^2 + E_y^2} \qquad (8)$$
where $\beta = \kappa \alpha^2$.
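The iterative update of equation 8 can be sketched as follows. This is an illustrative reading of the prior art iteration, not a reference implementation: it assumes T = 1, a four-neighbour average with clamped borders (so κ = 4), simple finite differences for the derivatives, and hypothetical function names.

```python
def diff(values, i):
    """Central difference in the interior, one-sided at the two ends."""
    lo, hi = max(i - 1, 0), min(i + 1, len(values) - 1)
    return (values[hi] - values[lo]) / (hi - lo)


def neighbour_average(field, x, y):
    """Four-neighbour average with indices clamped at the borders."""
    h, w = len(field), len(field[0])
    return (field[y][max(x - 1, 0)] + field[y][min(x + 1, w - 1)]
            + field[max(y - 1, 0)][x] + field[min(y + 1, h - 1)][x]) / 4.0


def horn_schunck(frame1, frame2, alpha=1.0, kappa=4.0, iterations=100):
    h, w = len(frame1), len(frame1[0])
    beta = kappa * alpha ** 2
    # Derivatives are computed once, at the fixed observation point (x, y, t).
    ex = [[diff(frame1[y], x) for x in range(w)] for y in range(h)]
    ey = [[diff([frame1[j][x] for j in range(h)], y) for x in range(w)]
          for y in range(h)]
    et = [[float(frame2[y][x] - frame1[y][x]) for x in range(w)]
          for y in range(h)]
    u = [[0.0] * w for _ in range(h)]
    v = [[0.0] * w for _ in range(h)]
    for _ in range(iterations):
        new_u = [[0.0] * w for _ in range(h)]
        new_v = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                u_bar = neighbour_average(u, x, y)
                v_bar = neighbour_average(v, x, y)
                ratio = ((ex[y][x] * u_bar + ey[y][x] * v_bar + et[y][x])
                         / (beta + ex[y][x] ** 2 + ey[y][x] ** 2))
                new_u[y][x] = u_bar - ex[y][x] * ratio
                new_v[y][x] = v_bar - ey[y][x] * ratio
        u, v = new_u, new_v
    return u, v


# Intensity ramp E = 2x + 3y that shifts one pixel to the right.
frame1 = [[2.0 * x + 3.0 * y for x in range(6)] for y in range(6)]
frame2 = [[2.0 * (x - 1) + 3.0 * y for x in range(6)] for y in range(6)]
u, v = horn_schunck(frame1, frame2)
```

On this ramp the gradient direction is the same at every pixel, so the iteration settles on the shortest vector that satisfies the constraint, (4/13, 6/13), rather than the true shift (1, 0). Richer image structure is needed for the smoothness term to recover the full motion.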
For the majority of images where the higher order derivatives of the intensity function are not all zero, this first-order approximation may no longer lead to good motion estimates when the motion is large. For this reason the approach taken by Horn and Schunck only works well for a very small degree of motion.
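The growth of the first-order approximation error with motion size can be seen in a one-dimensional example. The sketch below assumes the intensity profile E(x) = x², chosen only because its second derivative is non-zero; the function name is hypothetical.

```python
def linearization_error(intensity, derivative, x, u):
    """Gap between the exact intensity change for displacement u
    and its first-order (Taylor) estimate."""
    exact = intensity(x + u) - intensity(x)
    first_order = derivative(x) * u
    return abs(exact - first_order)


def profile(s):
    return s * s          # E(x) = x^2


def slope(s):
    return 2.0 * s        # dE/dx


small = linearization_error(profile, slope, 3.0, 0.1)   # sub-pixel motion
large = linearization_error(profile, slope, 3.0, 5.0)   # five-pixel motion
```

For this profile the neglected term is exactly u², so the error rises from about 0.01 for the sub-pixel motion to 25 for the five-pixel motion.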
Improvement is desired in which the approximation error of system equations becomes smaller and smaller as the updating process continues. Consequently, a new approach is needed to provide better and faster convergence, as well as capability to handle larger motion.
One advantage of the invention is in more accurately and efficiently processing consecutive video frames to determine the motion of objects in video frames and output a representation of that motion as an image motion vector field, wherein each component of the image vector field represents a pixel or group of pixels of a frame.
Another advantage of this invention is in migrating an observation point gradually, according to current motion estimates.
Yet a further advantage of this invention is its capability of handling motion discontinuities.
To achieve the foregoing and other advantages, in accordance with the invention as embodied and broadly described herein, an apparatus is provided for generating an image motion vector field which describes a motion of individual image components of a first image frame and corresponding image components of a second image frame in a sequence of image frames. The apparatus comprises a first frame memory for receiving the first image frame, a second frame memory for receiving the second image frame, and an optical flow calculator. The optical flow calculator is configured for generating an image motion vector field by iteratively comparing a predicted image with the second image frame. The predicted image is generated according to a current motion estimate in proximity to an observation point, wherein the observation point migrates according to a previous motion estimate and is used by the calculator to generate a current motion estimate.
In a further aspect of the invention, the optical flow calculator includes a gradient estimator configured to define a gradient function which approximates an image gradient value by performing an averaging function on motion estimates at a plurality of individual image component locations determined by the current motion estimate.
In yet a further aspect of the invention, the optical flow calculator includes a motion estimator configured to define a motion estimate function which approximates a velocity adjustment value by applying a motion function to gradient estimates at a plurality of individual gradient estimate locations determined by the current motion estimate.
In yet a further aspect of the invention, the optical flow calculator includes a motion estimate averager configured to define a motion estimate averaging function which averages motion values at a plurality of motion estimate component locations determined by the current motion estimate.
In yet a further aspect of the invention, the averager applies a weighting function component to control outlier rejection of individual motion estimates being averaged.
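One way to picture a weighting function that rejects outliers is a hard threshold on the distance between each neighbouring estimate and the estimate at the centre pixel. The text does not state the form of the weighting function, so the 0/1 weights, the threshold and the name `robust_average` below are assumptions made purely for illustration.

```python
def robust_average(center, neighbours, threshold):
    """Average neighbouring (u, v) estimates, giving zero weight to any
    estimate farther than `threshold` from the centre estimate."""
    limit = threshold ** 2
    kept = [n for n in neighbours
            if (n[0] - center[0]) ** 2 + (n[1] - center[1]) ** 2 <= limit]
    if not kept:
        return center     # every neighbour rejected: keep the centre estimate
    count = len(kept)
    return (sum(n[0] for n in kept) / count, sum(n[1] for n in kept) / count)


center = (1.0, 0.0)
# Three neighbours agree with the centre; the fourth lies across a motion boundary.
neighbours = [(1.0, 0.0), (1.5, 0.0), (0.5, 0.0), (-5.0, 4.0)]

robust = robust_average(center, neighbours, threshold=1.0)
plain = (sum(n[0] for n in neighbours) / 4, sum(n[1] for n in neighbours) / 4)
```

The plain average is pulled to (-0.5, 1.0) by the single outlier, while the thresholded average stays at (1.0, 0.0), which is the behaviour that keeps a motion boundary from being blurred.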
In yet a further aspect of the invention, an image motion vector field is generated by a method comprising the steps of receiving a first image frame having individual image components with intensity values, receiving a second image frame having corresponding image components with intensity values, initializing motion estimates, generating image motion vectors, and outputting the image motion vector field. The image motion vectors are iteratively generated by estimating image gradients in proximity to observation points, generating updated motion estimates that become the motion estimates, averaging the motion estimates, migrating the observation points according to the motion estimates, generating a predicted image frame according to the motion estimates and the first image frame, calculating residual errors by taking a difference between the predicted image and the second image frame, determining whether the residual error for each image component is less than a predetermined threshold and, accordingly, whether the motion estimate for the image component has converged, and ending the iterations for each motion estimate that has converged.
The present invention calculates optical flow, resulting in improved estimates over the prior art and an ability to cope with larger motion. In the present invention, the observation point of the Taylor's series expansion of the intensity function preferably moves in accordance with the current motion estimates. Hence, the constant intensity constraint is better satisfied as the iteration process continues. A local outlier rejection mechanism, whose observation point also preferably moves in accordance with the current motion estimates, may then be utilized to handle motion discontinuities in the flow field, significantly reducing the motion compensated mean squared error (MSE) and effectively improving the estimates near motion boundaries. Local outlier rejection provides an effective means of sharpening the motion boundaries, which are often blurred by the global smoothness constraint. Compared with the prior art, this new concept offers faster and better convergence and the capability to handle a larger degree of motion, effectively reducing the motion compensated intensity difference and motion error. One skilled in the art will easily recognize that this concept may be readily adapted to many optical flow methods and apparatus.
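The iterative steps listed above can be sketched in one dimension. This is only one possible reading under several simplifying assumptions, all of them hypothetical: the frames are continuous intensity functions standing in for interpolated images, the residual of Eq. 3 is used in place of an explicit predicted frame, the averaging uses the two adjacent pixels, and the update has the same form as the prior art iteration but with the gradient and residual re-evaluated at the migrated observation point.

```python
def estimate_flow_1d(frame1, frame2, pixels, beta=4.0, tol=1e-6,
                     max_iter=60, h=0.5):
    """frame1 and frame2 are callables returning the intensity at a
    real-valued position; pixels lists the positions to be estimated."""
    n = len(pixels)
    flow = [0.0] * n          # initial motion estimates
    done = [False] * n        # per-pixel convergence flags
    for _ in range(max_iter):
        previous = flow[:]
        for i, x in enumerate(pixels):
            if done[i]:
                continue      # iterations have ended for this pixel
            # Average the neighbouring estimates (indices clamped at the ends).
            u_bar = 0.5 * (previous[max(i - 1, 0)] + previous[min(i + 1, n - 1)])
            p = x + u_bar     # observation point migrated by the estimate
            residual = frame2(p) - frame1(x)
            if abs(residual) < tol:
                flow[i], done[i] = u_bar, True
                continue
            # Gradient estimated in proximity to the migrated observation point.
            grad = (frame2(p + h) - frame2(p - h)) / (2.0 * h)
            flow[i] = u_bar - grad * residual / (beta + grad * grad)
        if all(done):
            break
    return flow


def first(x):
    return x * x                      # intensity profile of the first frame


def second(x):
    return (x - 3.0) * (x - 3.0)      # same profile shifted by 3 pixels


flow = estimate_flow_1d(first, second, pixels=list(range(10, 30)))
```

In this example the first pass, which still observes at the unshifted point, overshoots the 3-pixel motion (about 3.57 at x = 10). The later passes re-linearize around the migrated point and shrink the residual until every pixel falls below the threshold.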
Additional advantages and novel features of the invention will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following or may be learned by practice of the invention. The advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.