Determining an optical flow field between two images, particularly for sequences of video frames and/or fields, is a frequently encountered step in many high-value video processing tasks such as coding, frame rate conversion, and noise reduction. Known methods for calculating optical flow encounter several stumbling blocks. Many methods suffer from low accuracy, in that motion vectors may not reflect the actual motion; others lack precision, in that motion vectors are limited to a precision on the order of a single pixel or a particular fraction of a pixel in a limited region; still others suffer from a lack of density, in that single motion vectors may be available only for entire regions or blocks of an image instead of on a per-pixel basis. Additionally, widely varying computational and memory bandwidth costs accrue to many, if not all, of these methods.
Existing methods may be broadly sorted into three categories: (1) block-based matching; (2) phase-based estimation; and (3) gradient-based estimation. Block-based matching methods are frequently used in video coding and other real-time tasks due to their relatively low complexity and intuitive nature. However, block-based matching methods are limited in dynamic range by the extent of the block search, limited in precision by the granularity of the block search and by the accuracy of the interpolator used to sample pixel values at the sub-pixel level, and limited in accuracy by what is known as the “aperture problem.” The aperture problem occurs when a block matching method estimates a wrong motion vector because the candidate blocks do not differ sufficiently, for example when they contain no texture or no edge differences along a gradient, so that the resulting motion vector lies at a local minimum instead of the global minimum. Prior art block-based optical flow and motion-estimation methods suffer from the aperture problem, which is further exacerbated in block-based methods that attempt to reduce search complexity by using multi-scale or other techniques to reduce the search depth and breadth from that of an exhaustive search. Many block-based methods circumvent the aperture problem by not requiring absolute (or even coarse) accuracy of an optical flow estimation. Such methods code only the residual differences left over after applying an optical flow field in a motion-compensation step between two frames under observation. As a result, motion-compensated block-based methods have found widespread application in the field of video coding, at the expense of reduced accuracy.
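The ambiguity described above can be illustrated with a minimal sketch of an exhaustive block search using a sum-of-absolute-differences (SAD) cost. This is not taken from any of the cited methods; the function names, the 16x16 synthetic images, the block size, and the search radius are all assumptions chosen for illustration. On a randomly textured image the search recovers a single displacement, whereas on an image containing only a vertical edge every displacement along the edge matches equally well.

```python
import random


def sad(ref, cur, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in ref and a displaced block in cur."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(ref[by + y][bx + x] - cur[by + y + dy][bx + x + dx])
    return total


def block_search(ref, cur, bx, by, size, radius):
    """Exhaustive search; returns (lowest cost, all displacements (dx, dy) reaching it)."""
    best_cost, best = None, []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cost = sad(ref, cur, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_cost, best = cost, [(dx, dy)]
            elif cost == best_cost:
                best.append((dx, dy))
    return best_cost, best


def shift(img, dx, dy):
    """Circularly shift an image by (dx, dy) pixels (illustrative helper)."""
    h, w = len(img), len(img[0])
    return [[img[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


def make_textured(n, seed=0):
    """Synthetic image with random texture (hypothetical test pattern)."""
    rng = random.Random(seed)
    return [[rng.randrange(256) for _ in range(n)] for _ in range(n)]


def make_vertical_edge(n):
    """Synthetic image containing a single vertical edge and no other texture."""
    return [[0 if x < n // 2 else 100 for x in range(n)] for _ in range(n)]


if __name__ == "__main__":
    for name, ref in (("textured", make_textured(16)), ("edge", make_vertical_edge(16))):
        cur = shift(ref, 2, 1)  # true motion is (dx, dy) = (2, 1)
        print(name, block_search(ref, cur, 6, 6, 4, 2))
```

In the edge case the horizontal component of the motion is recovered, but the vertical component is undetermined, so a practical search that keeps only one of the tied candidates may report a wrong vector.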
Phase-based motion estimation techniques have been employed for computing relatively accurate, precise, and substantially noise-immune optical flow, such as the phase-based motion estimation method described in “The Engineer's Guide to Motion Compensation,” by John Watkinson, 1994: Snell & Wilcox Ltd., pp. 23-38. However, phase-based motion estimation is performed in the frequency domain and acts upon the phase information computed therein, and therefore requires input images to be transformed to the 2D frequency domain, a very computationally expensive process for video. In an attempt to improve computational efficiency, certain other phase-based motion estimation processes have shown incremental improvement over processes based on Fourier-based phase calculations by changing the type of transform from global to local, such as Gabor-based, orientation-based transform filtering. Unfortunately, these techniques still involve applying relatively large filter banks sequentially to each pixel, resulting in a high memory bandwidth requirement and only a modest decrease in overall computational complexity.
Gradient-based estimation has been employed in several offline and real-time applications, including object segmentation for FLIR (Forward-Looking Infra-Red) target acquisition/rejection as taught in U.S. Pat. No. 5,627,905, and the calculation of temporal-interpolated video frames (“tween frames”) for slow-motion and frame-rate conversion effects, as taught in Thanakorn and Sakchaicharoenkul, “MCFI-based animation tweening algorithm for 2D parametric motion flow/optical flow,” Machine Graphics & Vision International Journal, v. 15 n. 1, p. 29-49, January 2006. The classic optical flow methods taught in B. K. P. Horn and B. G. Schunck, “Determining optical flow,” Artificial Intelligence, vol. 17, pp. 185-203, 1981 (hereinafter “Horn and Schunck”), and Lucas, B. D. and Kanade T., 1981, “An iterative image registration technique with an application to stereo vision,” Proceedings of Image Understanding Workshop, pp. 121-130 (hereinafter “Lucas and Kanade”), produce dense optical flow fields on a per-pixel basis, but cannot reliably generate motion vectors with magnitudes greater than a single pixel, and suffer from inconsistent flow fields in the presence of severe noise, object occlusions, and complex non-translational motion. The method and system described in U.S. Pat. No. 5,680,487 overcome the single-pixel limitation by using a multi-scale method, but are not robust in the presence of noise and/or occlusions. Further improvements to gradient-based estimation are taught in U.S. Pat. No. 6,345,106, where an eigen-system analysis of the structure tensor associated with every pixel is applied to each pixel's surrounding gradient region to ascertain the mathematical stability of the estimation process. This allows the calculation of a confidence value that may be used to selectively accept or reject calculated results, thereby increasing robustness to noise and to the aperture problem.
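The idea of an eigenvalue-based confidence value can be sketched as follows. This is a generic illustration of structure tensor analysis, not the specific procedure of any cited patent; the function names, the window size, and the threshold parameter are assumptions, and simple central differences are used for the gradients only to keep the example short. The smaller eigenvalue of the windowed structure tensor is near zero when the neighborhood is flat or contains a single edge orientation, which is when a gradient-based estimate is poorly conditioned.

```python
import math


def structure_tensor_eigenvalues(img, cx, cy, half):
    """Eigenvalues (larger, smaller) of the 2x2 structure tensor summed over a window.

    The window is centered at (cx, cy) and extends `half` pixels in each direction.
    Gradients use central differences, so the window must stay one pixel inside the image.
    """
    sxx = sxy = syy = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    mean = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy * sxy)
    return mean + root, mean - root


def is_reliable(img, cx, cy, half, threshold):
    """Accept an estimate only if the smaller eigenvalue reaches a (hypothetical) threshold."""
    _, smaller = structure_tensor_eigenvalues(img, cx, cy, half)
    return smaller >= threshold


if __name__ == "__main__":
    ramp = [[10 * x for x in range(9)] for _ in range(9)]        # varies in x only
    corner = [[x * y for x in range(9)] for y in range(9)]       # varies in x and y
    print(structure_tensor_eigenvalues(ramp, 4, 4, 1))
    print(structure_tensor_eigenvalues(corner, 4, 4, 1))
```

A neighborhood that varies in only one direction yields a vanishing smaller eigenvalue and would be rejected, while a neighborhood with gradients in two directions yields two positive eigenvalues.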
Another improvement to gradient-based estimation is to use the second derivatives (“gradient-constancy assumption”) instead of brightness (“brightness-constancy assumption”) to estimate the actual gradients under examination, in order to increase the robustness of the calculation to changes in scene and object lighting, as taught in Nagel, H.-H. and Enkelmann, W., “An investigation of smoothness constraints for the estimation of displacement vector fields from image sequences,” IEEE Trans. Pattern Anal. Mach. Intell., September 1986, 8, pp. 565-593 (hereinafter “Nagel and Enkelmann”). A drawback to the approach used by Nagel and Enkelmann is that the gradient-constancy constraint is violated under complex motion models such as scaling and rotation. Additionally, the estimation of discrete-space spatio-temporal derivatives under the scheme of Nagel and Enkelmann has proved problematic to implement without error.
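For reference, the two assumptions can be written in their commonly used linearized forms. The notation here is an assumption introduced for illustration and is not taken from the cited references: I(x, y, t) denotes image intensity, (u, v) the per-pixel displacement between consecutive frames, and subscripts denote partial derivatives.

```latex
% Brightness constancy and its first-order linearization
I(x + u,\, y + v,\, t + 1) = I(x, y, t)
\quad\Longrightarrow\quad
I_x u + I_y v + I_t = 0

% Gradient constancy and its first-order linearization,
% which involves second derivatives of the intensity
\nabla I(x + u,\, y + v,\, t + 1) = \nabla I(x, y, t)
\quad\Longrightarrow\quad
\begin{cases}
I_{xx} u + I_{xy} v + I_{xt} = 0 \\
I_{xy} u + I_{yy} v + I_{yt} = 0
\end{cases}
```

An additive, spatially uniform change in brightness leaves the spatial gradient unchanged, which is why the second form is less sensitive to lighting changes, whereas scaling and rotation alter the gradient itself.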
The foregoing prior art optical flow methods suffer from mathematical instability in the presence of noise and occlusions, and are further impaired by a consistently applied but very coarse approximation of the actual spatio-temporal gradient calculation upon which the rest of the estimation process depends (such as the central-differences method, which completely ignores the sampled value at the actual sample location under analysis). These coarse approximations lead to unnecessary errors in the initial estimation process, forcing subsequent stages to clean up or reject estimated values based upon further imperfect heuristics, thresholds, or constraints, all at a significant expense of further complexity.
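The remark about central differences can be made concrete with a short sketch. The function name and the sample values are illustrative assumptions. The estimate at index i depends only on the two neighboring samples, so changing the sample at index i itself has no effect on the result.

```python
def central_difference(samples, i):
    """Central-difference derivative estimate at index i with unit sample spacing."""
    return (samples[i + 1] - samples[i - 1]) / 2.0


if __name__ == "__main__":
    smooth = [1, 2, 3]
    spiky = [1, 100, 3]  # same neighbors, very different center sample
    print(central_difference(smooth, 1), central_difference(spiky, 1))
```

Both calls return the same value even though the two signals differ greatly at the location under analysis.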
The foregoing prior art optical flow methods further suffer from one or more of the problems of: (1) high computational complexity, (2) noise susceptibility due to numerical instability, (3) failure to account for occlusions of pixels from one frame to the other, (4) limited range of motion, (5) inability to accurately calculate a flow field in the presence of changes of brightness due to lighting changes in the scene or of objects in the scene, and/or (6) accuracy problems due to incorrect or inadequate assumptions in the model of the discrete-sampled gradient field.
Accordingly, there is a need for an accurate, precise, relatively low-computational-complexity digital optical flow estimation method and system that is better suited to operate on noisy video images that include changing scene and object lighting, complex motion, and object occlusions.