Superpixel algorithms are a useful and increasingly popular preprocessing step for a wide range of computer vision applications, such as video segmentation, tracking, multi-view object segmentation, scene flow, 3D layout estimation of indoor scenes, interactive scene modeling, image parsing, and semantic segmentation. Grouping similar pixels into so-called superpixels greatly reduces the number of image primitives. This increases the computational efficiency of subsequent processing steps, permits more complex algorithms that would be computationally infeasible at the pixel level, and provides spatial support for region-based features.
As indicated in [1], the superpixels produced by such algorithms are local, coherent, and preserve most of the structure necessary for segmentation at the scale of interest. Superpixels should also be roughly homogeneous in size and shape. Most superpixel approaches target still images and therefore provide little or no temporal consistency when applied to video sequences. Some approaches, however, target video sequences [2][3] and begin to address the issue of temporal consistency.
Superpixel generation does not in itself necessarily lead to spatially coherent superpixels. A post-processing step is therefore required to ensure the spatial connectivity of the pixels comprised in the clusters, and thus of the superpixels. However, it was stated in [4] that the post-processing method proposed in [5] assigns isolated superpixel fragments to arbitrary neighboring superpixels without considering any similarity measure between the fragments and the superpixels to which they are assigned. Contour evolution approaches as proposed in [4] can overcome this drawback, though often at the cost of a high number of iterations. Moreover, they often focus on still images and thus leave the issue of temporal consistency unsolved.
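The connectivity post-processing discussed above can be illustrated with a minimal sketch. It is not the method of [5] or of [4]; the function name enforce_connectivity and the representation of the label map as a non-empty list of lists of integers are assumptions made purely for illustration. The sketch keeps the largest 4-connected component of each label and attaches every remaining fragment to whichever already-settled neighboring region is encountered first, without any similarity measure, which is the kind of arbitrary assignment criticized in [4].

```python
from collections import deque


def enforce_connectivity(labels):
    """Return a copy of `labels` in which every label is one 4-connected region.

    `labels` is a non-empty list of equally long lists of ints (hypothetical
    input format). Fragments are merged into an arbitrary settled neighbor.
    """
    height, width = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    comp_id = [[-1] * width for _ in range(height)]
    components = []  # components[cid] = list of (y, x) pixels

    def neighbors(y, x):
        # 4-neighborhood, clipped to the image bounds.
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < height and 0 <= nx < width:
                yield ny, nx

    # Pass 1: flood fill to find all 4-connected components of equal label.
    for y in range(height):
        for x in range(width):
            if comp_id[y][x] != -1:
                continue
            cid = len(components)
            comp_id[y][x] = cid
            pixels, queue = [], deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in neighbors(cy, cx):
                    if comp_id[ny][nx] == -1 and labels[ny][nx] == labels[y][x]:
                        comp_id[ny][nx] = cid
                        queue.append((ny, nx))
            components.append(pixels)

    # Pass 2: for each label, remember its largest component (first one on ties).
    largest = {}
    for cid, pixels in enumerate(components):
        y, x = pixels[0]
        label = labels[y][x]
        if label not in largest or len(pixels) > len(components[largest[label]]):
            largest[label] = cid

    # Pass 3: attach each remaining fragment to the first settled neighbor found.
    keep = set(largest.values())
    settled = [[comp_id[y][x] in keep for x in range(width)] for y in range(height)]
    pending = deque(cid for cid in range(len(components)) if cid not in keep)
    while pending:
        cid = pending.popleft()
        target = next(
            (out[ny][nx]
             for y, x in components[cid]
             for ny, nx in neighbors(y, x)
             if settled[ny][nx]),
            None,
        )
        if target is None:
            # Only unsettled fragments around it so far; retry later.
            pending.append(cid)
            continue
        for y, x in components[cid]:
            out[y][x] = target
            settled[y][x] = True
    return out
```

Because the target label is chosen purely by scan order, a fragment may end up in a superpixel that it does not resemble. A similarity-aware variant would compare, for example, mean color between the fragment and each candidate neighbor before deciding.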