Segmentation is a key processing step in many applications, ranging from medical imaging to machine vision and video compression technology. Although different approaches to segmentation have been proposed, those based on graphs have attracted many researchers because of their computational efficiency.
Many segmentation algorithms are known to practitioners in the field today. Examples include the watershed algorithm and SLIC, a superpixel algorithm based on nearest neighbor aggregation. Typically, these algorithms share a common disadvantage in that they require a scale parameter to be set by a human supervisor. Thus, practical applications have, in general, involved supervised segmentation. This may limit the range of applications, since in many instances segmentation is to be generated dynamically and there may be no time or opportunity for human supervision.
In embodiments, a graph-based segmentation algorithm based on the work of P. F. Felzenszwalb and D. P. Huttenlocher is used. They discussed basic principles of segmentation in general and applied these principles to develop an efficient segmentation algorithm based on graph cutting in their paper “Efficient Graph-Based Image Segmentation,” Int. Jour. Comp. Vis., 59(2), September 2004, herein incorporated by reference in its entirety. Felzenszwalb and Huttenlocher stated that any segmentation algorithm should “capture perceptually important groupings or regions, which often reflect global aspects of the image.”
Based on the principle of a graph-based approach to segmentation, Felzenszwalb and Huttenlocher first build an undirected graph G=(V, E), where each vertex vi∈V is a pixel of the image to be segmented and each edge (vi, vj)∈E connects a pair of neighboring pixels; a non-negative weight w(vi, vj) is associated with each edge, with a magnitude proportional to the difference between vi and vj. A segmentation is obtained by finding a partition of V such that each component is connected, the internal difference between the elements of each component is minimal, and the difference between elements of different components is maximal. This is achieved by defining a predicate, given in Equation (1), that determines whether a boundary exists between two adjacent components C1 and C2, that is:
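By way of illustration only, the graph construction described above may be sketched as follows. The sketch assumes a grayscale image stored as a list of rows, a 4-connected neighborhood, and the absolute intensity difference as the edge weight; none of these choices is mandated by the description, and the function name build_grid_graph is hypothetical.

```python
from typing import List, Tuple

# (weight, index of pixel a, index of pixel b); pixels are numbered row by row.
Edge = Tuple[float, int, int]


def build_grid_graph(image: List[List[float]]) -> List[Edge]:
    """Return the weighted edges of a 4-connected pixel grid.

    Assumption: the weight w(vi, vj) is the absolute intensity difference
    between the two neighboring pixels.
    """
    height, width = len(image), len(image[0])
    edges: List[Edge] = []
    for y in range(height):
        for x in range(width):
            here = y * width + x
            if x + 1 < width:  # edge to the right-hand neighbor
                weight = abs(image[y][x] - image[y][x + 1])
                edges.append((weight, here, here + 1))
            if y + 1 < height:  # edge to the neighbor below
                weight = abs(image[y][x] - image[y + 1][x])
                edges.append((weight, here, here + width))
    return edges
```

For a 2×3 image, this sketch produces four horizontal edges and three vertical edges, each carrying a non-negative weight.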
$$D(C_1, C_2) = \begin{cases} \text{true} & \text{if } \mathrm{Dif}(C_1, C_2) > \mathrm{MInt}(C_1, C_2) \\ \text{false} & \text{otherwise} \end{cases} \qquad (1)$$

where Dif(C1, C2) is the difference between the two components, defined as the minimum weight of the set of edges that connects C1 and C2; MInt(C1, C2) is the minimum internal difference, defined in Equation (2) as:

$$\mathrm{MInt}(C_1, C_2) = \min\left[\mathrm{Int}(C_1) + \tau(C_1),\ \mathrm{Int}(C_2) + \tau(C_2)\right] \qquad (2)$$

where Int(C) is the largest weight in the minimum spanning tree of the component C and therefore describes the internal difference between the elements of C; and where τ(C)=k/|C| is a threshold function used to establish whether there is evidence for a boundary between two components. Because τ(C) is large when |C| is small, the threshold function forces two small segments to fuse unless there is strong evidence of a difference between them.
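As a minimal sketch of Equations (1) and (2), the predicate may be evaluated from per-component summaries. The function and parameter names below are hypothetical, and the values Dif, Int(C) and |C| are assumed to have been computed elsewhere.

```python
def tau(size: int, k: float) -> float:
    """Threshold function tau(C) = k / |C|."""
    return k / size


def min_internal_difference(int1: float, size1: int,
                            int2: float, size2: int, k: float) -> float:
    """MInt(C1, C2) as given in Equation (2)."""
    return min(int1 + tau(size1, k), int2 + tau(size2, k))


def boundary_exists(dif: float, int1: float, size1: int,
                    int2: float, size2: int, k: float) -> bool:
    """Predicate D(C1, C2) as given in Equation (1)."""
    return dif > min_internal_difference(int1, size1, int2, size2, k)
```

With Dif=5, Int(C1)=2, |C1|=4, Int(C2)=3 and |C2|=10, a value of k=8 gives MInt=min(4, 3.8)=3.8, so a boundary is found; with k=100, MInt=min(27, 13)=13 and the same two components are merged.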
In practice, the segment parameter k sets the scale of observation. Although Felzenszwalb and Huttenlocher demonstrate that the algorithm generates a segmentation map that is neither too fine nor too coarse, the definition of fineness and coarseness ultimately depends on k, which has to be carefully set by the user to obtain a perceptually reasonable segmentation.
The definition of the proper value of k for the graph-based algorithm, as well as the choice of the threshold value used for edge extraction in other edge-based segmentation algorithms, remains up to now an open issue when “perceptually important groupings or regions” have to be extracted from the image. One example of such an edge-based algorithm is described by Iannizzotto and Vita in “Fast and Accurate Edge-Based Segmentation with No Contour Smoothing in 2-D Real Images,” Giancarlo Iannizzotto and Lorenzo Vita, IEEE Transactions on Image Processing, Vol. 9, No. 7, pp. 1232-1237 (July 2000), the entirety of which is hereby incorporated by reference herein for all purposes. In the algorithm described by Iannizzotto and Vita, edges are detected by looking at gray-scale gradient maxima with gradient magnitudes above a threshold value. For this algorithm, k is this threshold value and needs to be set appropriately for proper segmentation. In embodiments, segmentation based on edge extraction may be used. In those embodiments, edge thresholds are established based on a strength parameter k. In the field of segmentation algorithms, in general, a parameter is used to set the scale of observation. In cases in which segmentation is performed in a supervised mode, a human user selects the k value for a particular image. It is clear, however, that the segmentation quality provided by a given algorithm is generally related to the quality perceived by a human observer, especially for applications (like video compression) where a human being is the final beneficiary of the output of the algorithm.
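The role of k as an edge threshold may be illustrated with a simplified sketch. The code below is not the algorithm of Iannizzotto and Vita: it assumes a grayscale image stored as a list of rows, estimates the gradient with finite differences clamped at the image border, and omits the search for gradient maxima. The function name edge_mask is hypothetical.

```python
import math
from typing import List


def edge_mask(image: List[List[float]], k: float) -> List[List[bool]]:
    """Mark pixels whose gradient magnitude exceeds the threshold k.

    Simplification: a plain magnitude threshold, with no selection of
    gradient maxima.
    """
    height, width = len(image), len(image[0])
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Differences between the neighbors on either side,
            # clamped to the image border.
            gx = image[y][min(x + 1, width - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, height - 1)][x] - image[max(y - 1, 0)][x]
            mask[y][x] = math.hypot(gx, gy) > k
    return mask
```

In this sketch, a small k marks nearly every intensity change as an edge, while a large k suppresses all but the strongest transitions, which mirrors the dependence on k discussed above.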
For example, a 640×480 color image is provided in FIG. 1A. A graph cut algorithm was used to generate the segmentation results associated with the image of FIG. 1A as discussed herein. Segmentation maps of the image of FIG. 1A, generated with σ=0.5 and a minimum size of 5, are provided in FIGS. 1B-1D for various values of k. In FIG. 1B, k is 3; in FIG. 1C, k is 100; and in FIG. 1D, k is 10,000. As illustrated in FIG. 1B, values of k that are too small may lead to over-segmentation. As illustrated in FIG. 1D, large values of k may introduce under-segmentation.
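The dependence of the segment count on k can be reproduced on a toy image with the following sketch. It is a simplified rendering of the graph-based merge procedure, assuming a grayscale image, a 4-connected grid and absolute intensity differences as weights; the Gaussian smoothing (σ) and minimum-size post-processing mentioned above are omitted, and the function name segment_count is hypothetical.

```python
from typing import List


def segment_count(image: List[List[float]], k: float) -> int:
    """Count the components left after a simplified graph-based merge pass."""
    height, width = len(image), len(image[0])
    edges = []
    for y in range(height):
        for x in range(width):
            here = y * width + x
            if x + 1 < width:
                edges.append((abs(image[y][x] - image[y][x + 1]), here, here + 1))
            if y + 1 < height:
                edges.append((abs(image[y][x] - image[y + 1][x]), here, here + width))
    edges.sort()  # visit edges in order of non-decreasing weight

    n = height * width
    parent = list(range(n))  # union-find forest over the pixels
    size = [1] * n           # |C| for each component root
    internal = [0.0] * n     # Int(C) for each component root

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    count = n
    for weight, a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        # MInt(C1, C2) with tau(C) = k / |C|
        m_int = min(internal[ra] + k / size[ra], internal[rb] + k / size[rb])
        if weight <= m_int:  # predicate D is false: no boundary, so merge
            parent[rb] = ra
            size[ra] += size[rb]
            # Edges arrive sorted, so this weight is the largest in the
            # merged component's spanning tree.
            internal[ra] = weight
            count -= 1
    return count


toy = [[0, 1, 10, 11],
       [0, 1, 10, 11]]
counts = {k: segment_count(toy, k) for k in (1, 4, 100)}
```

On this toy image the sketch yields four components for k=1, two for k=4 and a single component for k=100, showing the same trend from over-segmentation to under-segmentation as k grows.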