Media item segmentation techniques, such as image segmentation techniques, either try to segment a media item in a semantically meaningful way, or go to the other extreme by creating small clusters of similar pixels or voxels of roughly equal size, called superpixels or supervoxels respectively. The former type of technique rarely succeeds in bridging the semantic gap, while the latter has traditionally been agnostic to even basic properties such as object scale and texture. The following description focuses on image segmentation for simplicity, but can be applied to other media types, such as video or multidimensional scan data.
Image segmentation continues to be a challenge that attracts both domain-specific and generic solutions. To avoid the struggle with semantics when using traditional segmentation algorithms, researchers have lately diverted their attention to a much simpler and more achievable task, namely that of simplifying an image into small clusters of connected and contiguous media elements or pixels. Such clusters are called superpixels, and they have quickly become a potent preprocessing tool for simplifying an image from potentially millions of pixels to about two orders of magnitude fewer clusters of similar pixels. After their introduction, superpixels quickly found their way into a wide range of computer vision applications such as body model estimation, multi-class segmentation, depth estimation, object localization, optical flow and tracking. For these applications, superpixels are commonly expected to have the following properties:
Tight region boundary adherence (the superpixels do not extend beyond object boundaries).
Containing a small cluster of similar pixels.
Uniformity; roughly equally sized clusters.
Compactness; thereby limiting the degree of adjacency (compactness can be understood as having smoother or less noisy boundaries as opposed to wiggly superpixel boundaries).
Computational efficiency.
When the size of a superpixel is chosen for an application, a strong assumption is made regarding the minimum scale to be preserved. Structures smaller than the superpixel size are sacrificed for the sake of simplifying the image. This may diminish the quality of the output for certain applications that require fine details. At the same time, textureless regions may contain more superpixels than necessary, thus defeating the goal of simplifying the image.
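The trade-off described above can be illustrated with a minimal sketch. It is not a superpixel algorithm: it assumes, purely for illustration, that fixed-size superpixels can be approximated by a uniform grid of square blocks, each replaced by its mean intensity. The names block_average and count_blocks are hypothetical.

```python
import math


def block_average(image, size):
    """Replace each size x size block of a grayscale image by its mean.

    Assumption: a uniform square grid is used as a crude stand-in for
    fixed-size superpixels; real superpixels follow image boundaries.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, size):
        for left in range(0, width, size):
            rows = range(top, min(top + size, height))
            cols = range(left, min(left + size, width))
            values = [image[r][c] for r in rows for c in cols]
            mean = sum(values) / len(values)
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


def count_blocks(height, width, size):
    """Number of grid blocks needed to cover a height x width image."""
    return math.ceil(height / size) * math.ceil(width / size)


# An 8 x 8 textureless image containing a single bright pixel, i.e. a
# structure smaller than the chosen block size of 4.
image = [[0.0] * 8 for _ in range(8)]
image[3][3] = 1.0

simplified = block_average(image, 4)
peak = max(max(row) for row in simplified)
print(peak)                   # 0.0625: the bright pixel is averaged away
print(count_blocks(8, 8, 4))  # 4 blocks, 3 of them covering flat regions
```

Under this assumption, the one-pixel structure is reduced from an intensity of 1.0 to 1/16, while three of the four blocks are spent on regions that contain no structure at all. The same arithmetic shows the scale of the simplification mentioned earlier: count_blocks(1000, 1000, 10) gives 10,000 blocks for a one-million-pixel image, which is two orders of magnitude fewer elements.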
It is an object of the present invention to overcome the problems identified above related to media item segmentation techniques.