Image segmentation is a classical and important problem in computer vision and image processing, with applications ranging from medical image analysis to personal photo editing. Fully automated segmentation is feasible yet error-prone, mainly because it is difficult to bridge the gap between low-level image features (e.g. colors, edges, textures, local histograms) and high-level semantics. In recent years, semi-automated or user-aided segmentation has therefore attracted increasing interest: it demands only modest user input and achieves higher accuracy.
There are two general approaches to interactive image segmentation: inductive and transductive. The two approaches differ mainly in how they utilize user guidance. In most inductive approaches, images are assumed to be drawn from certain statistical models whose parameters can be obtained via maximum likelihood or MAP (maximum a posteriori) estimation from seed values provided by a user.
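As an illustration of the inductive approach, the sketch below fits one Gaussian color model per class to user-provided seed pixels by maximum likelihood, then assigns each pixel the MAP label under a uniform-by-default class prior. This is a minimal didactic example, not any specific published method; the function names and the single-Gaussian model (rather than, say, a mixture) are our own simplifying assumptions.

```python
import numpy as np

def fit_gaussian(seeds):
    """Maximum-likelihood Gaussian fit to seed colors (N x 3 array)."""
    mean = seeds.mean(axis=0)
    # Small ridge term keeps the covariance invertible for few seeds.
    cov = np.cov(seeds, rowvar=False) + 1e-6 * np.eye(3)
    return mean, cov

def log_likelihood(pixels, mean, cov):
    """Log-density of N(mean, cov) evaluated at each pixel (M x 3)."""
    d = pixels - mean
    maha = np.einsum('ij,jk,ik->i', d, np.linalg.inv(cov), d)
    return -0.5 * (maha + np.log(np.linalg.det(cov)) + 3 * np.log(2 * np.pi))

def map_label(pixels, fg_seeds, bg_seeds, prior_fg=0.5):
    """MAP foreground/background labeling from user seeds (1 = foreground)."""
    fg_mean, fg_cov = fit_gaussian(fg_seeds)
    bg_mean, bg_cov = fit_gaussian(bg_seeds)
    score_fg = log_likelihood(pixels, fg_mean, fg_cov) + np.log(prior_fg)
    score_bg = log_likelihood(pixels, bg_mean, bg_cov) + np.log(1 - prior_fg)
    return (score_fg > score_bg).astype(int)
```

Note that this classifies each pixel independently from its color alone; the spatial coherence that graph-based methods exploit is entirely absent here.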
Transductive graph-based methods avoid explicit feature modeling by using non-parametric label propagation. Images are typically modeled as sparse graphs with a 2D lattice topology: individual pixels or small overlapping patches become graph nodes, and adjacent pixels are connected by edges. Nodes corresponding to user-provided “seed” pixels or patches can be regarded as carrying high-confidence labels, and this label information is iteratively propagated to remote unlabeled nodes along weighted graph edges.
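The propagation step can be sketched as follows: seed nodes are clamped to their user-given labels, and every other node repeatedly takes the weight-averaged score of its neighbors until the scores converge. This is a generic illustrative scheme (in the spirit of harmonic-function label propagation), not the exact update rule of any particular paper; the dense weight matrix and the fixed iteration count are simplifications for clarity.

```python
import numpy as np

def propagate_labels(W, seeds, n_iter=200):
    """Iterative label propagation on a weighted graph.

    W     : (n, n) symmetric edge-weight matrix (0 = no edge)
    seeds : dict {node_index: label in {0, 1}}
    Returns soft labels in [0, 1]; threshold at 0.5 for a hard mask.
    """
    n = W.shape[0]
    f = np.full(n, 0.5)                      # neutral initial scores
    for i, y in seeds.items():
        f[i] = y
    deg = np.maximum(W.sum(axis=1), 1e-12)
    for _ in range(n_iter):
        f = W @ f / deg                      # weighted neighbor average
        for i, y in seeds.items():           # clamp seed nodes each round
            f[i] = y
    return f
```

On a chain graph with the two endpoints seeded 1 and 0, the scores converge to a linear ramp between the seeds, which makes the distance-dependent behavior of propagation easy to see.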
Transductive, graph-based segmentation has deficiencies in both graph construction and label propagation. Current graph construction schemes rely on purely local comparisons, such as the L2 distance in RGB color space, and therefore discard all global information. Moreover, because propagation strength decays with distance in the graph, label estimates for remote nodes far from the user-specified seeds tend to be erroneous.
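The locality of such constructions is visible in a typical edge-weighting rule, sketched below: each 4-connected neighbor pair receives a Gaussian affinity of its RGB L2 distance, w(i, j) = exp(-||c_i - c_j||^2 / (2σ²)). The function name and the choice of σ are our own illustrative assumptions. Because only adjacent pixels are ever compared, two distant pixels of identical color never interact directly, which is exactly the dropped global information criticized above.

```python
import numpy as np

def lattice_edge_weights(img, sigma=10.0):
    """Gaussian affinities between 4-connected neighbors of an RGB image.

    img : (H, W, 3) float array.
    Returns (wh, wv): weights to the right neighbor, shape (H, W-1),
    and to the bottom neighbor, shape (H-1, W).
    """
    dh = np.sum((img[:, 1:] - img[:, :-1]) ** 2, axis=-1)  # right neighbor
    dv = np.sum((img[1:, :] - img[:-1, :]) ** 2, axis=-1)  # bottom neighbor
    wh = np.exp(-dh / (2 * sigma ** 2))
    wv = np.exp(-dv / (2 * sigma ** 2))
    return wh, wv
```

Weights near 1 indicate smooth regions where labels should flow freely; weights near 0 cut propagation across strong color edges.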