Today there is a trend toward creating and delivering richer media experiences to consumers. Going beyond the capabilities of either sample-based (video) or model-based (CGI) methods requires novel representations for digital media. One such media representation is the SCENE media representation (http://3d-scene.eu). Tools therefore need to be developed for generating such media representations, allowing captured 3D video to be seamlessly combined with CGI.
The SCENE media representation will allow the manipulation and delivery of SCENE media to either 2D or 3D platforms, in either linear or interactive form, by enhancing the whole chain of multidimensional media production. Special focus is on spatio-temporally consistent scene representations. The project also evaluates the possibilities for standardizing a SCENE Representation Architecture (SRA).
A fundamental tool used for establishing the SCENE media representation is the application of over-segmentation to video. See, for example, R. Achanta et al.: “SLIC Superpixels Compared to State-of-the-Art Superpixel Methods”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34 (2012), pp. 2274-2282. The generated segments, also known as superpixels or patches, help to generate metadata representing a higher abstraction layer that goes beyond pure object detection. Subsequent processing steps applied to the generated superpixels allow the description of objects in the video scene and are thus closely linked to the model-based CGI representation.
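The core idea of SLIC-style over-segmentation can be sketched as assigning each pixel to the nearest of a set of grid-placed seeds in a joint (color, position) feature space. The sketch below is a single assignment pass only; real SLIC iterates assignment and center updates, and all parameter names here are illustrative rather than taken from the SCENE project.

```python
import numpy as np

def slic_like_segments(image, grid_step=4, spatial_weight=0.5):
    """One-pass SLIC-style over-segmentation sketch.

    Assigns every pixel of an H x W x 3 image to the nearest seed
    in a joint (color, position) feature space. The spatial_weight
    parameter trades color similarity against spatial compactness.
    """
    h, w = image.shape[:2]
    # Place one seed at the center of each grid cell.
    ys = np.arange(grid_step // 2, h, grid_step)
    xs = np.arange(grid_step // 2, w, grid_step)
    seeds = [(y, x) for y in ys for x in xs]

    labels = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            best, best_d = 0, np.inf
            for k, (sy, sx) in enumerate(seeds):
                # Combined distance: color difference plus
                # weighted spatial distance to the seed.
                d_col = np.linalg.norm(image[y, x].astype(float)
                                       - image[sy, sx].astype(float))
                d_pos = np.hypot(y - sy, x - sx)
                d = d_col + spatial_weight * d_pos
                if d < best_d:
                    best, best_d = k, d
            labels[y, x] = best
    return labels
```

On a uniform image the color term vanishes, so the result is a regular grid of patches; on real content the patch boundaries bend toward color edges, which is the property that makes superpixels useful as a pre-segmentation.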
A novel application enabled by the availability of superpixels is the generation of superpixel clusters, a higher abstraction layer providing a patch-based description of objects in the scene. The superpixel cluster generation requires an analysis of different superpixel connectivity attributes. These attributes can be, for example, color similarity, depth/disparity similarity, and the temporal consistency of superpixels. Cluster generation is usually done semi-automatically: an operator selects a single initial superpixel in the scene, and the cluster is then generated automatically.
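The semi-automatic growth step can be sketched as a breadth-first expansion over the superpixel adjacency graph, starting from the operator-selected seed. The `neighbours` mapping and the `similar` predicate are illustrative assumptions; in practice the predicate would bundle the connectivity attributes named above (color, depth/disparity, temporal consistency).

```python
from collections import deque

def grow_cluster(seed, neighbours, similar):
    """Seed-based superpixel cluster growth (sketch).

    seed       -- id of the operator-selected initial superpixel
    neighbours -- dict mapping a superpixel id to its spatial
                  neighbours (the adjacency graph)
    similar    -- predicate similar(a, b) deciding whether
                  superpixel b may join via superpixel a
    """
    cluster = {seed}
    frontier = deque([seed])
    while frontier:
        current = frontier.popleft()
        for nb in neighbours[current]:
            # Join a neighbour once, if the connectivity
            # attributes accept it; then expand from it.
            if nb not in cluster and similar(current, nb):
                cluster.add(nb)
                frontier.append(nb)
    return cluster
```

The operator's only input is the seed; everything else is automatic, which matches the semi-automatic workflow described above.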
A well-known clustering method for image segmentation is based on color analysis. The color similarity of different picture areas is quantified by a color distance, which is used to decide whether a candidate area is included in or excluded from a cluster. A typical color distance measure compares the color histograms generated for each superpixel. However, for the color-based clustering method the cluster growth and, therefore, the final superpixel cluster extent is highly dependent on the initially selected superpixel. The color data of the initially selected superpixel has exclusive control over the clustering process, as all distance measures are related to it. The resulting cluster shapes therefore show large variances depending on which superpixel is selected first.
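A minimal sketch of such a seed-relative color test is given below: each superpixel is summarized by a normalized per-channel histogram, candidates are compared against the seed's histogram only, and a fixed threshold decides the join. The binning, the L1 distance, and the threshold value are assumptions for illustration, not the SCENE project's actual measure.

```python
import numpy as np

def colour_histogram(pixels, bins=8):
    """Normalised per-channel colour histogram of a superpixel,
    given its pixels as an N x 3 array with values in [0, 255]."""
    hists = [np.histogram(pixels[:, c], bins=bins,
                          range=(0, 256))[0] for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def histogram_distance(h1, h2):
    """L1 distance between two normalised histograms:
    0 for identical colour statistics, 2 at most."""
    return np.abs(h1 - h2).sum()

def joins_cluster(seed_hist, cand_hist, threshold=0.5):
    """Seed-relative join decision: every candidate is compared
    to the initially selected superpixel only, which is why the
    seed's colour data exclusively controls the cluster growth."""
    return histogram_distance(seed_hist, cand_hist) < threshold
```

Because `joins_cluster` never looks at already-joined superpixels, a seed whose colors poorly represent the object constrains the whole cluster, which is exactly the weakness discussed next.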
Furthermore, color-based clustering suffers from the low significance of the color information provided by a single initially selected superpixel. The color information of this first superpixel often only roughly represents the required data. The propagation of the superpixel cluster is accordingly limited and often excludes relevant superpixels from becoming members of the cluster. Relaxing the threshold that controls the cluster joining is not advisable, however, as it does not overcome this weakness: a relaxed threshold often causes unwanted superpixels to be joined to the cluster as well.