Determining an optical flow or motion vector field between two images, particularly for sequences of video frames and/or fields, is frequently encountered in many high-value video processing tasks such as coding, frame rate conversion, and noise reduction. Conventional methods for calculating optical flow encounter several stumbling blocks, many of which are addressed by solutions described in U.S. Pat. No. 8,355,534 (hereinafter, “the '534 patent”), incorporated herein by reference in its entirety. As taught in the '534 patent, object occlusion presents a challenge for any motion estimation system, such as an optical flow estimation system.
FIG. 1 shows an example of an image pair 100a, 100b, with background 105 and foreground 110, in which a foreground object 115 is in motion, giving rise to an occlusion region 120 and a disocclusion region 125. When the foreground object 115 is in motion in a video sequence, background pixels of the image 100b in the forward-motion direction are hidden (known herein as occlusion, or the occlusion region 120), while background pixels of the image 100b behind the motion are revealed (known herein as disocclusion, or the disocclusion region 125). In the occluded areas of an image, there is no definite motion attributable to the background; concomitantly, in the disoccluded areas there is no definite motion attributable to the foreground object. These two types of areas within a pair of images (collectively known herein as occlusion regions) are very problematic for motion estimation in general, and for many optical flow systems in particular, because erroneous motion vector values in these regions tend to propagate into non-occlusion regions, adversely affecting the overall accuracy of the optical flow estimate. Beyond improving optical flow and motion estimation, determination of occlusion regions benefits other high-value video analysis tasks such as disparity and depth estimation, image segmentation, object identification, and 3D conversion and projection.
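The occlusion/disocclusion geometry described above can be made concrete with a small numerical sketch (illustrative only, and not taken from the '534 patent): forward-warping every source pixel by a known flow field and counting how many source pixels land on each target pixel yields a count of zero in disoccluded (newly revealed) areas and a count greater than one where a moving surface covers the background.

```python
import numpy as np

def coverage_map(flow, shape):
    """Forward-warp pixel coordinates by the flow and count how many
    source pixels land on each target pixel (nearest-neighbor rounding).
    Count == 0 marks a disoccluded target pixel (newly revealed);
    count > 1 marks a target pixel where surfaces collide (occlusion)."""
    h, w = shape
    count = np.zeros((h, w), dtype=np.int32)
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    yt = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    np.add.at(count, (yt, xt), 1)  # unbuffered accumulation per target pixel
    return count

# Toy example: a 2-pixel-wide foreground strip (columns 3-4) moving
# right by 2 pixels over a static background on an 8x8 grid.
flow = np.zeros((8, 8, 2), dtype=np.float32)
flow[:, 3:5, 0] = 2.0
cov = coverage_map(flow, (8, 8))
disocclusion = cov == 0   # background revealed behind the strip
occlusion = cov > 1       # background hidden ahead of the strip
```

In the toy example, the background behind the moving strip (columns 3-4) is disoccluded, while the background ahead of it (columns 5-6) is occluded, mirroring regions 125 and 120 of FIG. 1.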
The detection of occlusion has received much attention in the context of motion estimation, depth estimation, and image/video segmentation. Occlusion can be estimated or computed explicitly or implicitly. Occlusion boundaries themselves provide strong cues for 3D scene reconstruction. Methods such as those described in A. Saxena, M. Sun, and A. Y. Ng, “Make3D: Learning 3D Scene Structure from a Single Image,” PAMI, 31: 824-840, 2009, and in D. Hoiem, A. A. Efros, and M. Hebert, “Recovering Occlusion Boundaries from an Image,” International Journal of Computer Vision, pages 1-19, 2010, propose to find occlusion boundaries from a single frame by over-segmentation and supervised learning. With no motion information, however, occlusion boundary detection is an inherently ambiguous problem. Other methods attempt to decompose input video into flexible sprite layers to infer occluded pixels/regions (see, e.g., N. Jojic and B. J. Frey, “Learning Flexible Sprites in Video Layers,” in CVPR, 2001). Layered methods provide realistic modeling of occlusion boundaries, but they require continuous regions, a relative ordering of surfaces, and predetermined motion. The method described in Sun, D., Sudderth, E. B., Black, M. J., “Layered image motion with explicit occlusions, temporal consistency, and depth ordering,” in: Advances in Neural Information Processing Systems, pp. 2226-2234 (2010), explicitly models occlusion and obtains relatively accurate results, but at a very large computational cost. Finding occlusion regions is also a common problem in multi-view 3D projection and display methods. Even the most recently researched methods in this area remain prone to gross errors when the underlying background or foreground pixel data in these regions is homogeneous or lacks texture.
In Alvarez, et al., “Symmetrical dense optical flow estimation with occlusions detection,” International Journal of Computer Vision 75(3), 371-385 (2007) (hereinafter, “Alvarez”), only passing attention is given to the role of the diffusion tensor and the subsequent eigenvalue analysis: the tensor is used solely to analyze the forward and backward symmetry of the optical flow solution, not to directly improve the accuracy of either the optical flow computation or the occlusion computation.
Ince, S., Konrad, J., “Occlusion-aware optical flow estimation,” IEEE Trans. Image Processing 17(8), 1443-1451 (2008) (hereinafter, “Ince”), discloses a method and system for joint determination of optical flow and occlusion, but the two estimations are tightly coupled, so the method cannot be paired with a non-optical-flow motion estimation system such as block matching. Further, Ince makes no use of either a diffusion tensor or a structure tensor of the images to improve robustness.
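For reference, the structure tensor mentioned above is the windowed average of gradient outer products, and its two eigenvalues measure local texture: both near zero indicates a homogeneous, textureless region where motion and occlusion cues are weakest. A minimal NumPy sketch of this standard construction (an illustration only, not code from Alvarez, Ince, or the '534 patent) is:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def structure_tensor_eigs(img, r=2):
    """Per-pixel eigenvalues (lam1 >= lam2) of the 2x2 structure tensor,
    averaged over a (2r+1)x(2r+1) window."""
    gy, gx = np.gradient(img.astype(np.float64))

    def box(a):  # windowed mean with edge padding
        p = np.pad(a, r, mode="edge")
        return sliding_window_view(p, (2 * r + 1, 2 * r + 1)).mean(axis=(2, 3))

    jxx, jxy, jyy = box(gx * gx), box(gx * gy), box(gy * gy)
    half_tr = (jxx + jyy) / 2.0                       # half the tensor trace
    det = jxx * jyy - jxy ** 2                        # tensor determinant
    disc = np.sqrt(np.maximum(half_tr ** 2 - det, 0.0))
    return half_tr + disc, half_tr - disc

# A horizontal ramp has a constant unit gradient in x, so the tensor is
# rank one: lam1 ~ 1 (one dominant orientation), lam2 ~ 0 (no texture
# in the orthogonal direction).
ramp = np.tile(np.arange(16, dtype=np.float64), (16, 1))
lam1, lam2 = structure_tensor_eigs(ramp)
```

A large lam1 with small lam2 signals an edge (motion measurable only across the edge); both large signals a corner-like, fully textured neighborhood.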
Motion cues are very important for identifying occlusion regions and boundaries. As described above, the objective of any motion estimation is to compute a flow field that represents the motion of points between two consecutive frames, and the most accurate motion estimation techniques should be able to handle occlusions. Some motion-based occlusion detection work, as described in Alvarez and Ince, jointly estimates backward and forward motion and marks inconsistent pixels as occluded. In such circumstances, occlusion is detected implicitly, and the occlusion detection is coupled with the motion estimation method itself. These methods encounter problems within highly textured image areas and do not succeed with large displacements or large occlusion regions.
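The forward/backward consistency idea underlying such implicit detection can be sketched as follows (a simplified nearest-neighbor version for illustration, not the exact formulation of Alvarez or Ince): a pixel is flagged as occluded when its forward flow is not cancelled by the backward flow sampled at the forward-warped location.

```python
import numpy as np

def fb_inconsistency(fwd, bwd, thresh=0.5):
    """Flag pixels whose forward flow is not undone by the backward flow
    at the forward-warped location.  fwd and bwd are (H, W, 2) arrays of
    (dx, dy) vectors; returns a boolean (H, W) occlusion mask."""
    h, w = fwd.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    xt = np.clip(np.round(xs + fwd[..., 0]).astype(int), 0, w - 1)
    yt = np.clip(np.round(ys + fwd[..., 1]).astype(int), 0, h - 1)
    residual = fwd + bwd[yt, xt]        # ~0 for mutually visible pixels
    return np.linalg.norm(residual, axis=-1) > thresh

# Consistent pair: uniform +1 pixel forward flow, -1 pixel backward flow.
fwd = np.zeros((4, 4, 2)); fwd[..., 0] = 1.0
bwd = np.zeros((4, 4, 2)); bwd[..., 0] = -1.0
occ_none = fb_inconsistency(fwd, bwd)                 # nothing flagged
occ_all = fb_inconsistency(fwd, np.zeros((4, 4, 2)))  # all flagged
```

Note how the detection is inseparable from the motion estimates themselves: any error in either flow field shows up directly in the residual, which is the coupling (and the noise source) criticized above.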
Xiao, et al., “Bilateral Filtering-Based Optical Flow Estimation with Occlusion Detection,” in Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006, Part I, LNCS, vol. 3951, pp. 211-224, Springer, Heidelberg (2006), discloses another joint method for computing optical flow and occlusion, but its two computations are closely coupled in a joint regularization framework. Further, this method requires multiple iterations for convergence of the disclosed regularization function and is therefore not suitable for real-time computation at contemporary video resolutions such as 1080p and 4K.
Even the best conventional motion estimation methods with coupled occlusion detection systems suffer from two primary disadvantages. First, these methods are too computationally complex for real-time processing. Second, the occlusion region maps they produce are inherently noisy. Pixels marked as occlusions are frequently false positives or false negatives, rendering their use in subsequent video processing and analysis tasks challenging or impossible.
Accordingly, there is a need for an accurate, precise, low-computational-complexity occlusion estimation system and method that, in conjunction with a motion estimation system, increases the robustness and accuracy of that system in the presence of large motions and the resulting large occlusion regions.