In the field of computer vision, discrete optimization using maximum flow algorithms has become very popular. This popularity stems from the fact that many problems, such as image segmentation, stereo matching or shape matching, are formulated using probabilistic models like Markov random fields (MRF) or conditional random fields (CRF). Computing the maximum a posteriori (MAP) solution for these models can be regarded as the discrete minimization of an energy function. Many algorithms in the literature efficiently compute an approximate solution of the given optimization problem. Under some assumptions, e.g. that the energy function is submodular, these algorithms compute the exact minimum of the given energy function.
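To make the notions of energy function and submodularity concrete, the following is a minimal sketch for binary labels. The data layout (`unary`, `pairwise`) and the function names are assumptions chosen for illustration and do not come from the text. The submodularity test is the standard condition for pairwise terms over binary variables: theta(0,0) + theta(1,1) <= theta(0,1) + theta(1,0).

```python
def energy(labels, unary, pairwise):
    """Energy of a binary labeling.

    labels:   list of 0/1 labels, one per variable
    unary:    unary[i][x] is the cost of giving variable i the label x
    pairwise: dict mapping (i, j) to a 2x2 table theta, where
              theta[a][b] is the cost of labels (a, b) on variables (i, j)
    """
    total = sum(unary[i][x] for i, x in enumerate(labels))
    for (i, j), theta in pairwise.items():
        total += theta[labels[i]][labels[j]]
    return total


def is_submodular(pairwise):
    """True if every pairwise term satisfies the binary submodularity condition."""
    return all(
        theta[0][0] + theta[1][1] <= theta[0][1] + theta[1][0]
        for theta in pairwise.values()
    )
```

When `is_submodular` holds, the binary energy can be minimized exactly by a minimum cut; otherwise the algorithms mentioned above generally yield only an approximate solution.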
Research on solving discrete optimization problems using maximum flow/minimum cut algorithms for applications in computer vision can be divided into the following approaches:
Augmenting Paths:
For computer vision problems, the most widely used algorithm is the Boykov and Kolmogorov augmenting paths algorithm (BK-algorithm). This algorithm efficiently solves moderately sized 2D and 3D problems with low connectivity.
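The augmenting-path principle can be illustrated with a minimal sketch. Note that this is the plain breadth-first variant (Edmonds-Karp), not the BK-algorithm, which additionally maintains and reuses search trees. The function name and the dict-of-dicts graph layout are assumptions made for this example.

```python
from collections import deque


def augmenting_path_max_flow(capacity, source, sink):
    """Return (max flow value, set of nodes on the source side of a min cut).

    capacity: dict of dicts, capacity[u][v] is the capacity of edge u -> v.
    """
    # Build the residual graph, adding zero-capacity reverse edges.
    residual = {source: {}, sink: {}}
    for u, neighbours in capacity.items():
        for v, c in neighbours.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    flow = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

    # Nodes still reachable from the source form the source side of a min cut.
    return flow, set(parent)
```

In a segmentation setting, the returned source-side set would correspond to the pixels assigned to one of the two labels.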
Push-Relabel:
Most parallelized maximum flow/minimum cut algorithms are based on the push-relabel scheme. These methods outperform the traditional BK-algorithm for huge and highly connected grid graphs. Special hardware is used to approximate the optimal solution.
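The two local operations that give the scheme its name can be sketched as follows. This is a minimal sequential FIFO variant written for illustration only; it is not one of the parallel or hardware-specific implementations referred to above, and the function name and graph layout are assumptions. Because pushes and relabels touch only a node and its neighbours, the scheme lends itself to parallelization.

```python
from collections import deque


def push_relabel_max_flow(capacity, source, sink):
    """Return the maximum flow value from source to sink.

    capacity: dict of dicts, capacity[u][v] is the capacity of edge u -> v
    (no self-loops).
    """
    nodes = set(capacity) | {source, sink}
    for neighbours in capacity.values():
        nodes.update(neighbours)

    # Residual graph with zero-capacity reverse edges.
    residual = {u: {} for u in nodes}
    for u, neighbours in capacity.items():
        for v, c in neighbours.items():
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    height = {u: 0 for u in nodes}
    excess = {u: 0 for u in nodes}
    height[source] = len(nodes)

    # Saturate all edges leaving the source.
    active = deque()
    for v, c in list(residual[source].items()):
        if c > 0:
            residual[source][v] = 0
            residual[v][source] += c
            excess[v] += c
            if v != sink:
                active.append(v)

    while active:
        u = active.popleft()
        while excess[u] > 0:
            pushed = False
            for v in residual[u]:
                c = residual[u][v]
                # Push: move excess along an edge to a node one level lower.
                if c > 0 and height[u] == height[v] + 1:
                    delta = min(excess[u], c)
                    residual[u][v] -= delta
                    residual[v][u] += delta
                    excess[u] -= delta
                    if v not in (source, sink) and excess[v] == 0:
                        active.append(v)
                    excess[v] += delta
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:
                # Relabel: lift u just above its lowest residual neighbour.
                height[u] = 1 + min(
                    height[v] for v, c in residual[u].items() if c > 0
                )
    return excess[sink]
```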
Grouping of Variables/Graph Sparsification:
Besides developing more efficient algorithms for the maximum flow/minimum cut problem, researchers are also trying to reduce the size of the labeling problem or of the graph itself. One simple and widely used technique merges variables in the energy function into a smaller number of groups, e.g. superpixels. For example, in B. Scheuermann et al.: “Slimcuts: Graphcuts for high resolution images using graph reduction”, 8th International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition (EMMCVPR) (2011), pp. 219-232, an algorithm for graph sparsification is presented that does not change the optimal solution. The idea is to create a so-called Slim Graph by merging only those nodes whose merging does not change the maximum flow, meaning that the corresponding variables are guaranteed to have the same label in the minimum energy state.
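The simple grouping technique can be sketched as follows, assuming a binary energy whose pairwise terms have the form w_ij * [x_i != x_j]. This illustrates generic variable grouping (e.g. into superpixels), not the Slimcuts reduction; the function name and data layout are assumptions. Forcing all variables of a group to share one label sums their unary costs, removes edges inside a group, and sums the weights of edges between the same pair of groups.

```python
def group_energy(unary, weights, group_of):
    """Collapse an energy onto groups of variables.

    unary:    unary[i] = [cost of label 0, cost of label 1] for variable i
    weights:  dict mapping (i, j) to the penalty paid when x_i != x_j
    group_of: group_of[i] is the group index of variable i
    """
    grouped_unary = {}
    for i, costs in enumerate(unary):
        acc = grouped_unary.setdefault(group_of[i], [0, 0])
        acc[0] += costs[0]
        acc[1] += costs[1]

    grouped_weights = {}
    for (i, j), w in weights.items():
        gi, gj = group_of[i], group_of[j]
        if gi == gj:
            continue  # both ends share a label, so the term is always zero
        key = (min(gi, gj), max(gi, gj))
        grouped_weights[key] = grouped_weights.get(key, 0) + w
    return grouped_unary, grouped_weights
```

The grouped problem has one variable per group, so it is much smaller, but unless the grouping is chosen as carefully as in the cited method its minimum need not coincide with the minimum of the original energy.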
Multi-Scale:
The idea of multi-scale methods for image labeling is to first solve the problem at low resolution using standard techniques. This can be interpreted as grouping the image pixels into regular, non-overlapping groups. The result of the low-resolution labeling is then refined at high resolution in a subsequent optimization step in which most variables of the problem are fixed.
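The refinement step can be sketched as follows, assuming that the coarse labeling is a 2D list of labels and that only pixels within a narrow band around a label boundary remain free. The band criterion, the function names and the parameters `factor` and `band` are assumptions made for this example; the text does not specify how the fixed variables are chosen.

```python
def upsample_labels(coarse, factor):
    """Nearest-neighbour upsampling of a 2D label grid by an integer factor."""
    rows, cols = len(coarse) * factor, len(coarse[0]) * factor
    return [
        [coarse[r // factor][c // factor] for c in range(cols)]
        for r in range(rows)
    ]


def free_variable_mask(fine, band):
    """Mark pixels that lie within `band` pixels of a differently labeled pixel.

    Pixels marked True are re-optimized at high resolution; all other pixels
    keep the label inherited from the low-resolution solution.
    """
    rows, cols = len(fine), len(fine[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr in range(-band, band + 1):
                for dc in range(-band, band + 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    if inside and fine[rr][cc] != fine[r][c]:
                        mask[r][c] = True
    return mask
```

Only the pixels marked as free enter the high-resolution optimization, which is why that step is much cheaper than solving the full problem.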
Unfortunately, in parallel with the improvement of discrete energy minimization algorithms, the size of single images and image sequences has increased significantly. Standard benchmark images have approximately 120,000 pixels, whereas current commercial cameras capture images with many more pixels, e.g. up to 20 million. Since most energy functions, e.g. those for image segmentation or stereo matching, contain one discrete variable per pixel, minimization using maximum flow algorithms can be computationally extremely expensive. It has been shown that the given algorithms are not applicable if the data of the problem does not fit into physical memory. Although more efficient energy minimization methods have been developed, the computational cost and memory requirements of these methods still grow at least linearly with the number of variables and terms of the energy function.