In image processing and computer vision applications, figure-ground segregation is an important processing step that determines to a large extent the quality of many subsequent processing stages. These include, for example, object detection, object classification, the accurate estimation of object properties, and the determination of object boundaries and object overlap. If the figure-ground segregation is not sufficiently accurate, object classification may fail because parts of an image are interpreted as belonging to an object even though they do not.
In dynamic scenes where objects move, transform and change their appearance, the problem is not only to determine accurately which parts of the image belong to an object, but also how these parts change over time. The challenge is to keep the sub-regions of an image that are identified as “figure” as accurately as possible on the object, despite considerable changes in the object's position, geometrical appearance, color, illumination and reflection, to name only a few.
One approach is to give a figure-ground segregation system hints on how the object's appearance is actually changing, measured via other visual channels. Many current systems for figure-ground segregation either a) do not take advantage of such hints from other channels or b) use only very simple and limited hints (such as an assumed translation of the object).
In computer vision, figure-ground segregation is the process of taking an image and separating parts or sub-regions of an image which correspond to an object of interest from those parts that do not. In short, each image pixel position is assigned a two-valued label with e.g. values 1 or 0, with 1 indicating the object and 0 the background. The result is a binary mask with the same size as the original image, which can be used for other computer vision processes applied to the image to cut out the effects produced by the background, improving all processes related to extracting specific information about the object.
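The two-valued labeling described above can be illustrated with a minimal sketch. It assumes a grayscale image stored as a list of rows, and it derives the mask from a plain intensity threshold purely for illustration; the function names `threshold_mask` and `apply_mask` and the threshold value are hypothetical and are not part of any particular segregation method.

```python
def threshold_mask(image, threshold):
    """Label each pixel 1 (figure) or 0 (ground).

    Assumption for illustration only: the figure is simply brighter
    than the ground, so a fixed threshold separates the two.
    """
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in image]


def apply_mask(image, mask, fill=0):
    """Suppress the background: keep pixels labeled 1, replace the rest by `fill`."""
    same_size = len(image) == len(mask) and all(
        len(row) == len(mask_row) for row, mask_row in zip(image, mask)
    )
    if not same_size:
        raise ValueError("the mask must have the same size as the image")
    return [
        [pixel if label == 1 else fill for pixel, label in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


# Toy 3x3 grayscale image with a bright object on a dark background.
image = [
    [10, 12, 200],
    [11, 210, 220],
    [9, 10, 205],
]
mask = threshold_mask(image, 128)
masked = apply_mask(image, mask)
```

The mask has the same size as the image, so any later processing stage can restrict itself to positions labeled 1.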
Figure-ground segregation is a very well studied area of computer vision. It usually consists of identifying local properties or image features that are specific to the object region and that make this region distinguishable from the rest of the image (the ground). As an example, such features can be a particular color or texture pattern, but also indirectly computed features such as gradient magnitude (i.e., the degree of change of a feature) or dynamic cues (e.g., the coherence of motion), or even a given model of an object's appearance.
Current algorithms for figure-ground segregation then take the characteristic image features that identify an object and compare them, at all image positions, with the features measured in the currently available image. The comparison results in a score that determines the degree of “figure” at each position. The task is then to find the regions for which the score is maximal. One way to do this is to set up a mathematical functional (an integral that depends on a function describing the region that constitutes the “figure” part) that integrates the “figure” scores over all positions of a figure region, and which can then be studied by varying the figure region. The figure-ground segregation that best characterizes the current image is then obtained by maximizing the overall “figure” score functional. Maximization techniques for such functionals lead to evolution equations which describe the local change of the “figure” part function needed to perform a gradient ascent in the direction of a maximal functional score. The functional can be extended with additional constraints on the region borders, e.g. taking into consideration their curvature, length and smoothness. Prominent representatives of figure-ground segregation algorithms of this kind are so-called two-region level-set methods for image segmentation. See Osher, S., Sethian, J. A.: Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79 (1988) 12-49 and Rousson, M., Deriche, R.: A variational framework for active and adaptive segmentation of vector valued images. IEEE Workshop on Motion and Video Computing, Orlando, Fla. (December 2002), which are incorporated by reference herein in their entirety.
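The idea of maximizing a summed “figure” score by varying the figure region can be sketched in a strongly simplified form. The sketch below is not a level-set method and omits the border constraints entirely; it assumes a grayscale image, models figure and ground by their mean intensities only, and uses a hypothetical per-pixel score (squared distance to the ground mean minus squared distance to the figure mean). Without a border term, the summed score is maximized by including exactly those pixels whose score is positive, so the region update reduces to a per-pixel test.

```python
def two_region_segment(image, init_mask, max_iters=20):
    """Alternate between estimating region statistics and updating the region.

    Simplifying assumptions (illustration only): piecewise-constant
    intensities, no constraint on the region border, discrete pixel labels
    instead of a continuous level-set function.
    """
    mask = [row[:] for row in init_mask]
    for _ in range(max_iters):
        figure = [p for row, m_row in zip(image, mask) for p, m in zip(row, m_row) if m == 1]
        ground = [p for row, m_row in zip(image, mask) for p, m in zip(row, m_row) if m == 0]
        if not figure or not ground:
            break  # one region vanished; the statistics are undefined
        mean_figure = sum(figure) / len(figure)
        mean_ground = sum(ground) / len(ground)
        # score > 0: the pixel fits the figure model better than the ground model
        new_mask = [
            [1 if (p - mean_ground) ** 2 - (p - mean_figure) ** 2 > 0 else 0 for p in row]
            for row in image
        ]
        if new_mask == mask:
            break  # the region no longer changes
        mask = new_mask
    return mask


image = [
    [10, 12, 200, 11],
    [11, 210, 220, 9],
    [9, 10, 205, 12],
]
# Rough initial guess: the two middle columns are "figure".
init_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
segmented = two_region_segment(image, init_mask)
```

Starting from the rough initial region, the labels settle on the bright pixels after one update. A level-set formulation would replace the hard per-pixel test by a gradual evolution of a continuous function and would add the border terms.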
A second well-studied area of computer vision is optical motion estimation. Here the local deformations between two image frames are estimated, i.e., the goal is to find correspondences between local patches so that one can say how the patches of the first image have potentially moved to constitute the patches of the second image. To estimate the patch correspondences, a general assumption about how the patch appearance changes over time, and in particular from the first to the second image, is required. A standard assumption is that the appearance of the patches does not change, so that the same patches of the first image are searched for in the second image, only at different positions, or that the patches change their appearance slowly and continuously.
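Under the constant-appearance assumption, a patch correspondence can be found by exhaustive block matching. The following is a minimal sketch, assuming grayscale images stored as lists of rows and a sum-of-squared-differences cost; the helper names and the search radius are illustrative assumptions, not a specific published algorithm.

```python
def patch(image, top, left, size):
    """Cut a size x size patch whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def match_patch(first, second, top, left, size, radius):
    """Return (dy, dx, cost) of the best match of a patch of `first` in `second`."""
    reference = patch(first, top, left, size)
    height, width = len(second), len(second[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue  # candidate patch would leave the image
            cost = ssd(reference, patch(second, t, l, size))
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best


# A textured 2x2 patch moves one row down and two columns to the right.
first = [[0] * 5 for _ in range(5)]
second = [[0] * 5 for _ in range(5)]
for (r, c), value in {(0, 0): 9, (0, 1): 8, (1, 0): 7, (1, 1): 6}.items():
    first[1 + r][1 + c] = value
    second[2 + r][3 + c] = value

displacement = match_patch(first, second, top=1, left=1, size=2, radius=2)
```

In this toy case the patch is textured and unique, so exactly one displacement has zero cost. Real image patches are rarely that unambiguous.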
Motion estimation is inherently ambiguous as a result of the nature of its signals (local patches which, for example, lead to the motion aperture problem). This means that the estimations result in multiple possible motions of a patch, which can be expressed either fully probabilistically or in a reduced form, by indicating a confidence for possible motions. See Weiss, Y., Fleet, D.: Velocity likelihoods in biological and machine vision. In: Probabilistic Models of the Brain: Perception and Neural Function, MIT Press (2002) 77-96 and Willert, V., Eggert, J., Adamy, J., Koerner, E.: Non-gaussian velocity distributions integrated over space, time, and scales. In: IEEE Trans. Syst., Man, Cybern. B. Volume 36. (June 2006) 482-493 which are incorporated by reference herein in their entirety.
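The ambiguity can be made concrete with a small sketch that turns matching costs into a normalized distribution over candidate displacements. The Gaussian weighting of the sum of squared differences and the parameter `sigma` are assumptions chosen for illustration; they are not taken from the cited works. The example image contains only a vertical edge, so any displacement along the edge explains the data equally well.

```python
import math


def velocity_likelihood(first, second, top, left, size, radius, sigma=1.0):
    """Map each candidate displacement (dy, dx) to a normalized likelihood.

    Assumption (illustration only): likelihood ~ exp(-SSD / (2 * sigma^2)).
    """
    reference = [row[left:left + size] for row in first[top:top + size]]
    height, width = len(second), len(second[0])
    weights = {}
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue  # candidate patch would leave the image
            candidate = [row[l:l + size] for row in second[t:t + size]]
            cost = sum(
                (x - y) ** 2
                for row_a, row_b in zip(reference, candidate)
                for x, y in zip(row_a, row_b)
            )
            weights[(dy, dx)] = math.exp(-cost / (2.0 * sigma ** 2))
    total = sum(weights.values())
    return {velocity: w / total for velocity, w in weights.items()}


# Vertical edge that moves one pixel to the right between the two frames.
first = [[0, 0, 1, 1, 1] for _ in range(5)]
second = [[0, 0, 0, 1, 1] for _ in range(5)]

likelihood = velocity_likelihood(first, second, top=1, left=1, size=3, radius=1)
```

The horizontal component of the motion is well determined (`dx == 1`), but all three vertical components share the same maximal likelihood. This is the aperture problem in miniature: a single best vector would hide the fact that the motion along the edge is unknown.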
Optical flow algorithms constitute an approximation of the motion between two images, in the sense that they provide a vector field defined across the image plane, so that each pixel of, for example, the first image has a corresponding flow vector attached that indicates where a local image patch around this pixel has moved to in the second image. Multiple motion hypotheses are therefore neglected in these algorithms; only a single motion vector, together with a single confidence, is passed on for each position. Examples of standard optical flow algorithms are described in Horn, B. K. P., Schunck, B. G.: Determining optical flow. Artif. Intell. 17(1-3) (1981) 185-203, Beauchemin, S. S., Barron, J. L.: The computation of optical flow. ACM Comp. Surv. 27(3) (1995) 433-467, and Singh, A.: An estimation-theoretic framework for image-flow computation. In: 3rd IEEE ICCV. (1990) 168-177, which are all incorporated by reference herein in their entirety.
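The reduction from multiple motion hypotheses to a single flow vector per position can be sketched as follows. The sketch assumes that a distribution over candidate displacements is already available for each position, given here as a dictionary mapping `(dy, dx)` to a probability; the function names and the choice of the maximum probability as the confidence are illustrative assumptions, not the procedure of any of the cited algorithms.

```python
def reduce_to_flow(distribution):
    """Keep only the most likely displacement and its probability as confidence."""
    velocity = max(distribution, key=distribution.get)
    return velocity, distribution[velocity]


def flow_field(distributions):
    """Apply the reduction at every position of a grid of distributions."""
    return [[reduce_to_flow(d) for d in row] for row in distributions]


# Hypothetical 1x2 grid of per-position displacement distributions.
distributions = [[
    {(0, 0): 0.1, (0, 1): 0.8, (1, 1): 0.1},   # clear winner
    {(0, 0): 0.2, (0, 1): 0.35, (1, 1): 0.45},  # ambiguous position
]]
field = flow_field(distributions)
```

At the second position the competing hypothesis `(0, 1)` is almost as likely as the selected one, yet after the reduction only the low confidence value hints at this.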
A third area of computer vision that is related to this invention is object tracking. Object tracking refers to locating an object in a sequence of consecutive images and constitutes an elementary task in high-level video analysis. A comprehensive survey of object tracking algorithms is given in Yilmaz, A., Javed, O., Shah, M.: Object tracking: A survey. ACM Comput. Surv. 38(4) (2006) 13, which is incorporated by reference herein in its entirety. Depending on the vision task, object tracking algorithms are based on several object representations (e.g. single point; rectangular, elliptical and part-based multiple patches; object contour and silhouette), object detection strategies (e.g. point detectors, background subtraction and image segmentation) and prediction methods for the object location (e.g. probabilistic and deterministic, parametric and non-parametric models). Non-rigid object deformation (e.g. a walking person), complex and rapid object movements (e.g. playing children), changes of the entire object appearance (e.g. front side vs. back side) and object occlusions are some of the numerous challenges in the field of object tracking.
It is therefore an object of the invention to provide an improved method and device for figure-ground segregation.