This invention relates generally to computer vision and more particularly to minimizing energy functions in labeling pixels in early vision.
In computer vision, early vision is usually considered to involve the description of geometric structure in an image or sequence of images. The behavioral constraints on real-time visual systems typically require that the early vision stage be fast, reliable, general and automatic. Many early vision problems require estimating some spatially varying quantity, such as intensity or disparity, from noisy measurements.
Spatially varying quantities tend to be piecewise smooth, i.e. they vary smoothly at most points, but change dramatically at object boundaries. Every pixel in a set P must be assigned a label in some set L. For motion or stereo, the labels are disparities. For image restoration, the labels represent intensities. The goal is to find a labeling f that assigns each pixel p∈P a label f_p∈L, where f is both piecewise smooth and consistent with observed data.
Computer vision problems can be formulated in terms of minimization of energy functions. Energy functions, however, are generally difficult to minimize. The major difficulty with energy minimization for computer vision lies in the enormous computational costs. Energy functions for computer vision typically have many local minima. Also, the space of possible labelings has dimension |P|, which is many thousands.
There have been many attempts to design fast algorithms for energy minimization. Due to the inefficiency of computing a global minimum, some solutions are directed instead at computing a local minimum. In general, a local minimum can be arbitrarily far from the optimum. It thus may not convey any of the global image properties that are encoded in the energy function. It is, however, difficult to determine the exact cause of an algorithm's failures. When an algorithm gives unsatisfactory results, it may be due either to a poor choice of the energy function or to the fact that the answer is far from the global minimum. Local minimization techniques are also naturally sensitive to an initial estimate.
In general, a labeling f is a local minimum of the energy E if
E(f) ≤ E(f′) for any f′ "near to" f.    (1)
In the case of discrete labeling, the labelings near to f are those that lie within a single move of f. Many local optimization techniques use standard moves, where only one pixel can change its label at a time. For standard moves, the equation above can be read as follows: if you are at a local minimum with respect to standard moves, then you cannot decrease the energy by changing a single pixel's label. In fact, this is a very weak condition. As a result, optimization schemes using standard moves frequently generate low quality solutions.
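The weakness of this condition can be seen on a toy problem. The following is only a sketch under stated assumptions: the row of four pixels, the squared-difference data cost, the Potts-style smoothness penalty `lam`, and the function names are all hypothetical choices made for illustration and are not taken from the text.

```python
def energy(labels, observed, lam=1.0):
    """Assumed energy on a 1-D row of pixels: squared data cost plus a
    Potts penalty `lam` for each pair of neighbors with different labels."""
    data = sum((f - d) ** 2 for f, d in zip(labels, observed))
    smooth = sum(lam for a, b in zip(labels, labels[1:]) if a != b)
    return data + smooth


def is_standard_move_minimum(labels, observed, label_set, lam=1.0):
    """Return True if no single-pixel relabeling lowers the energy."""
    base = energy(labels, observed, lam)
    for p in range(len(labels)):
        for new_label in label_set:
            if new_label == labels[p]:
                continue
            trial = labels[:p] + [new_label] + labels[p + 1:]
            if energy(trial, observed, lam) < base:
                return False
    return True


observed = [1, 1, 1, 1]
stuck = [0, 0, 0, 0]
# Changing any one pixel to 1 saves 1 in data cost but adds at least 3 in
# smoothness cost, so `stuck` is a local minimum under standard moves ...
print(is_standard_move_minimum(stuck, observed, [0, 1], lam=3.0))
# ... even though relabeling every pixel at once would lower the energy.
print(energy(stuck, observed, 3.0), energy([1, 1, 1, 1], observed, 3.0))
```

In this example the labeling that satisfies condition (1) for standard moves disagrees with the observed data at every pixel.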
An example of a local method using standard moves is Iterated Conditional Modes (ICM). For each site (pixel or voxel), the label which gives the largest decrease of the energy function is chosen, until the iteration converges to a local minimum.
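A minimal sketch of such an iteration follows. It assumes a one-dimensional row of pixels, a squared-difference data cost, and a Potts smoothness penalty `lam`; these choices and the names `energy` and `icm` are illustrative assumptions, not details given in the text.

```python
def energy(labels, observed, lam=1.0):
    """Assumed energy: squared data cost plus Potts smoothness on a 1-D row."""
    data = sum((f - d) ** 2 for f, d in zip(labels, observed))
    smooth = sum(lam for a, b in zip(labels, labels[1:]) if a != b)
    return data + smooth


def icm(observed, label_set, init, lam=1.0, max_sweeps=100):
    """Sweep over the sites, giving each the label that lowers the energy
    most, until a full sweep changes nothing (or max_sweeps is reached)."""
    labels = list(init)
    for _ in range(max_sweeps):
        changed = False
        for p in range(len(labels)):
            def with_label(new_label, p=p):
                return labels[:p] + [new_label] + labels[p + 1:]

            best = min(label_set, key=lambda l: energy(with_label(l), observed, lam))
            if energy(with_label(best), observed, lam) < energy(labels, observed, lam):
                labels[p] = best
                changed = True
        if not changed:
            break
    return labels


observed = [1, 1, 1, 1]
print(icm(observed, [0, 1], init=[0, 0, 0, 0], lam=0.5))  # weak smoothness
print(icm(observed, [0, 1], init=[0, 0, 0, 0], lam=3.0))  # strong smoothness
print(icm(observed, [0, 1], init=[1, 1, 1, 1], lam=3.0))  # different start
```

With the stronger smoothness penalty the result depends entirely on the initial estimate, which illustrates the sensitivity of local methods noted earlier.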
Another example of an algorithm using standard moves is simulated annealing. Simulated annealing randomly proposes some change in the state of the system. If the change results in a decrease of energy (which is equivalent to a decrease in cost in the more general sense of optimization), the change will always be taken. If it results in an increase in energy, it will be chosen using a probability scheme. At high temperatures (i.e. early in the simulated annealing process), changes of state that result in large increases in energy will be accepted with a higher probability than they would be at low temperatures (late in the simulated annealing process). Simulated annealing is widely used because it can optimize an arbitrary energy function. Minimizing an arbitrary energy function requires exponential time, and as a consequence simulated annealing is very slow.
Simulated annealing is inefficient partly because at each step, annealing changes the value of a single pixel. Theoretically, simulated annealing should eventually find the global minimum if it is run long enough. As a practical matter, it is necessary to decrease the algorithm's temperature parameter faster than required by the theoretically optimal schedule. Once annealing's temperature parameter is sufficiently low, the algorithm will converge to a local minimum with respect to standard moves.
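The acceptance behavior described above can be sketched as follows. The text does not name a particular probability scheme or cooling schedule, so the Metropolis rule exp(−ΔE/T), the geometric cooling factor, the toy energy, and all function and parameter names here are assumptions made for illustration.

```python
import math
import random


def energy(labels, observed, lam=1.0):
    """Assumed energy: squared data cost plus Potts smoothness on a 1-D row."""
    data = sum((f - d) ** 2 for f, d in zip(labels, observed))
    smooth = sum(lam for a, b in zip(labels, labels[1:]) if a != b)
    return data + smooth


def accept_probability(delta_e, temperature):
    """Metropolis rule (assumed): moves that do not raise the energy are
    always taken; increases are taken with probability exp(-delta_e / T)."""
    if delta_e <= 0:
        return 1.0
    return math.exp(-delta_e / temperature)


def anneal(observed, label_set, init, lam=1.0, t_start=5.0, cooling=0.95,
           steps=2000, seed=0):
    """Propose one single-pixel change per step under geometric cooling."""
    rng = random.Random(seed)
    labels = list(init)
    current = energy(labels, observed, lam)
    temperature = t_start
    for _ in range(steps):
        p = rng.randrange(len(labels))
        trial = list(labels)
        trial[p] = rng.choice(label_set)  # standard move: one pixel changes
        delta = energy(trial, observed, lam) - current
        if rng.random() < accept_probability(delta, temperature):
            labels, current = trial, current + delta
        temperature = max(temperature * cooling, 1e-6)  # fast, non-optimal schedule
    return labels, current


# The same energy increase is far more likely to be accepted early (hot)
# than late (cold).
print(accept_probability(2.0, 10.0), accept_probability(2.0, 0.1))
print(anneal([1, 1, 1, 1], [0, 1], [0, 0, 0, 0], lam=3.0))
```

Once the temperature in this sketch becomes small, uphill proposals are almost never accepted and the loop behaves like a randomized search over standard moves.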
An alternative is to seek a local minimum using, for example, variational techniques. Variational methods can be applied if the energy minimization problem is phrased in continuous terms. Variational techniques use Euler equations, which are guaranteed to hold at a local minimum. In continuous cases, the labelings f′ near to f in the equation above are normally defined by ∥f−f′∥ ≤ ε, where ε is a positive constant and ∥·∥ is a norm, e.g. L2, over some appropriate functional space. To apply these algorithms to actual imagery requires discretization.
Another alternative is to use discrete relaxation labeling methods. In relaxation labeling, combinatorial optimization is converted into continuous optimization with linear constraints. Then some form of gradient descent, which gives the solution satisfying the constraints, is used.
There are also methods that have optimality guarantees in certain cases. Continuation methods, such as graduated non-convexity, are an example. These methods involve approximating an intractable (non-convex) energy function by a sequence of energy functions beginning with a tractable (convex) approximation. There are circumstances where these methods are known to compute the optimal solution. Continuation methods can be applied to a large number of energy functions, but except for special cases, nothing is known about the quality of their output.
Mean field annealing is another popular minimization approach. It is based on estimating the partition function from which the minimum of the energy can be deduced. Computing the partition function, however, is computationally intractable, and saddle point approximations are used.
There are a few interesting energy functions where the global minimum can be rapidly computed via dynamic programming. Dynamic programming, however, is restricted essentially to energy functions in one-dimensional settings. In general, the two-dimensional energy functions that arise in early vision cannot be solved efficiently via dynamic programming.
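For the one-dimensional case, the dynamic programming recursion can be sketched as below. The chain of pixels, the squared-difference data cost, the Potts smoothness penalty `lam`, and the function name `chain_global_minimum` are assumptions chosen for illustration only.

```python
def chain_global_minimum(observed, label_set, lam=1.0):
    """Exact minimum of an assumed 1-D energy (squared data cost plus
    Potts smoothness) by dynamic programming over the chain."""
    def data(label, value):
        return (label - value) ** 2

    def smooth(a, b):
        return 0.0 if a == b else lam

    # cost[l] = lowest energy of the pixels seen so far, ending in label l
    cost = {l: data(l, observed[0]) for l in label_set}
    back = []  # back[i][l] = best previous label when pixel i+1 takes label l
    for value in observed[1:]:
        new_cost, pointers = {}, {}
        for l in label_set:
            prev = min(label_set, key=lambda k: cost[k] + smooth(k, l))
            new_cost[l] = cost[prev] + smooth(prev, l) + data(l, value)
            pointers[l] = prev
        cost = new_cost
        back.append(pointers)

    # Trace the optimal labeling backwards from the best final label.
    last = min(label_set, key=lambda l: cost[l])
    best_energy = cost[last]
    labels = [last]
    for pointers in reversed(back):
        labels.append(pointers[labels[-1]])
    labels.reverse()
    return labels, best_energy


print(chain_global_minimum([0, 0, 5, 5], [0, 5], lam=1.0))
```

The recursion relies on each pixel having a single predecessor. On a two-dimensional grid each pixel has neighbors in more than one direction, so this ordering is not available.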
It is an object of the present invention to provide a method and apparatus to improve the early vision stage of computer vision.
It is another object of the present invention to provide a method and apparatus to improve the speed while maintaining accuracy of minimizing energy functions.
The problems of early vision and minimizing energy functions are solved by the present invention of a method and apparatus for fast approximate energy minimization via graph cuts.
Many tasks in computer vision involve assigning a label, such as disparity for depth of field data, to every pixel. Energy minimization may be used to accomplish this labeling. The present invention provides an efficient way of minimizing energy functions in assigning labels to pixels.
The major restriction is that the energy function's smoothness term must involve only pairs of pixels. Two methods of using graph cuts to compute a local minimum are described. They may be used even when very large moves are allowed.
The first move is an α-β swap. For a pair of labels α, β, this move swaps the labels between an arbitrary set of pixels labeled α and another arbitrary set of pixels labeled β.
The first method generates a labeling such that there is no swap move that decreases the energy.
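To illustrate the swap move space only, the sketch below enumerates every labeling within one swap move by brute force on a tiny one-dimensional example. This exhaustive search is a stand-in and is not the graph cut computation of the invention; the toy energy (squared data cost plus Potts smoothness `lam`) and all function names are assumptions.

```python
from itertools import combinations, product


def energy(labels, observed, lam=1.0):
    """Assumed energy: squared data cost plus Potts smoothness on a 1-D row."""
    data = sum((f - d) ** 2 for f, d in zip(labels, observed))
    smooth = sum(lam for a, b in zip(labels, labels[1:]) if a != b)
    return data + smooth


def best_swap_move(labels, observed, alpha, beta, lam=1.0):
    """Brute force over all labelings within one alpha-beta swap: pixels
    labeled alpha or beta may take either label, all others stay fixed."""
    sites = [p for p, l in enumerate(labels) if l in (alpha, beta)]
    best, best_e = list(labels), energy(labels, observed, lam)
    for choice in product((alpha, beta), repeat=len(sites)):
        trial = list(labels)
        for p, l in zip(sites, choice):
            trial[p] = l
        e = energy(trial, observed, lam)
        if e < best_e:
            best, best_e = trial, e
    return best, best_e


def swap_method(observed, label_set, init, lam=1.0):
    """Cycle over label pairs until no swap move decreases the energy."""
    labels = list(init)
    current = energy(labels, observed, lam)
    improved = True
    while improved:
        improved = False
        for alpha, beta in combinations(label_set, 2):
            trial, e = best_swap_move(labels, observed, alpha, beta, lam)
            if e < current:
                labels, current, improved = trial, e, True
    return labels, current


# A single swap move relabels all four pixels at once.
print(swap_method([1, 1, 1, 1], [0, 1], [0, 0, 0, 0], lam=3.0))
print(swap_method([0, 0, 5, 5, 9, 9], [0, 5, 9], [0] * 6, lam=1.0))
```

The brute-force search is exponential in the number of pixels labeled α or β, so it is usable only on examples of this size.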
The second move is the α-expansion. For a label α, this move assigns the label α to an arbitrary set of pixels.
The second method, which requires the smoothness term to be a metric, generates a labeling such that there is no expansion move that decreases the energy.
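The expansion move space and the metric requirement can be illustrated on a tiny example. The sketch below searches expansion moves by brute force rather than by graph cuts, so it is not the computation of the invention. The metric check mirrors the requirement stated above, although the brute-force search itself does not depend on it. The Potts penalty, the squared data cost, and all function names are assumptions.

```python
from itertools import product


def potts(a, b, lam=1.0):
    """Potts smoothness penalty (assumed choice): 0 if equal, lam otherwise."""
    return 0.0 if a == b else lam


def is_metric(smooth, label_set):
    """Check V(a,b)=0 iff a=b, symmetry, and the triangle inequality."""
    for a, b in product(label_set, repeat=2):
        if (smooth(a, b) == 0) != (a == b) or smooth(a, b) != smooth(b, a):
            return False
    return all(smooth(a, c) <= smooth(a, b) + smooth(b, c)
               for a, b, c in product(label_set, repeat=3))


def energy(labels, observed, smooth):
    """Assumed energy: squared data cost plus pairwise smoothness on a 1-D row."""
    data = sum((f - d) ** 2 for f, d in zip(labels, observed))
    return data + sum(smooth(a, b) for a, b in zip(labels, labels[1:]))


def best_expansion_move(labels, observed, alpha, smooth):
    """Brute force over all labelings within one alpha-expansion: any
    subset of pixels may switch to alpha, the rest keep their labels."""
    best, best_e = list(labels), energy(labels, observed, smooth)
    for mask in product((False, True), repeat=len(labels)):
        trial = [alpha if take else l for take, l in zip(mask, labels)]
        e = energy(trial, observed, smooth)
        if e < best_e:
            best, best_e = trial, e
    return best, best_e


def expansion_method(observed, label_set, init, smooth):
    """Cycle over labels until no expansion move decreases the energy."""
    if not is_metric(smooth, label_set):
        raise ValueError("smoothness term is not a metric on this label set")
    labels = list(init)
    current = energy(labels, observed, smooth)
    improved = True
    while improved:
        improved = False
        for alpha in label_set:
            trial, e = best_expansion_move(labels, observed, alpha, smooth)
            if e < current:
                labels, current, improved = trial, e, True
    return labels, current


print(is_metric(potts, [0, 5, 9]))                          # Potts passes
print(is_metric(lambda a, b: (a - b) ** 2, [0, 1, 2]))      # squared difference fails
print(expansion_method([0, 0, 5, 5, 9, 9], [0, 5, 9], [0] * 6, potts))
```

The squared difference fails the triangle inequality (4 > 1 + 1 for labels 0, 1, 2), so under the stated restriction it would not qualify for the second method.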