Today there is a trend to create and deliver richer media experiences to consumers. To go beyond the capabilities of either sample-based (video) or model-based (CGI) methods, novel representations for digital media are required. One such media representation is the SCENE media representation (http://3d-scene.eu). Tools therefore need to be developed for generating such media representations, which allow captured 3D video to be seamlessly combined with CGI.
The SCENE media representation will allow the manipulation and delivery of SCENE media to either 2D or 3D platforms, in either linear or interactive form, by enhancing the whole chain of multidimensional media production. Special focus is on spatio-temporally consistent scene representations. The project also evaluates the possibilities for standardizing a SCENE Representation Architecture (SRA).
A fundamental tool used for establishing the SCENE media representation is the over-segmentation of video. See, for example, R. Achanta et al.: “SLIC Superpixels Compared to State-of-the-Art Superpixel Methods”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34 (2012), pp. 2274-2282. The generated segments, also known as superpixels or patches, help to generate metadata representing a higher abstraction layer, which goes beyond pure object detection. Subsequent processing steps applied to the generated superpixels allow the description of objects in the video scene and are thus closely linked to the model-based CGI representation.
A new aspect of the required over-segmentation is spatio-temporal consistency. Known approaches to spatio-temporally consistent over-segmentation are based on graph-cut methods, which have the disadvantage of being computationally costly and time-consuming. See, for example, Z. Tian et al.: “3D Spatio-temporal Graph Cuts for Video Objects Segmentation”, Proceedings of the International Conference on Image Processing (ICIP) (2011), pp. 2393-2396. Newer research on over-segmentation algorithms indicates that the SLIC (Simple Linear Iterative Clustering) algorithm described by R. Achanta et al. is a well-suited candidate to start with. It combines a reliable segmentation result with suitability for real-time application. As the SLIC method was originally developed for single-image processing, further adaptation work is required to cope with image sequences in movies, where a spatio-temporally consistent superpixel representation is essential.
A known solution is the use of inter-frame motion information to provide spatio-temporally consistent superpixels with the SLIC method. Instead of permanently positioning the seed points for the SLIC algorithm at the same location over the whole image sequence, the application of motion information allows the seed points to be positioned along the motion trajectory estimated from frame to frame. This is described in European Patent Application EP 13171832.2. Applying this seeding strategy generates superpixels which follow the optical flow and thus allows tracking of moving objects in the scene, each of which may consist of one or more superpixels. The benefit is that the objects in a movie remain assigned to the same unique superpixels throughout the sequence, which makes tracking very simple.
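The seeding strategy described above can be illustrated with a minimal sketch. It is not the method of EP 13171832.2; it only shows the general idea of moving seed points along an estimated motion field instead of keeping them fixed. The function names `grid_seeds` and `propagate_seeds`, the regular-grid initialisation, and the flow layout (one `(dy, dx)` displacement per pixel) are assumptions made for illustration.

```python
def grid_seeds(height, width, step):
    """Place seed points on a regular grid, as is common for a single image.

    Returns a list of (y, x) positions. The grid spacing `step` is a
    hypothetical parameter chosen for this sketch.
    """
    return [(float(y), float(x))
            for y in range(step // 2, height, step)
            for x in range(step // 2, width, step)]


def propagate_seeds(seeds, flow):
    """Move each seed along the motion vector sampled at its position.

    `flow[y][x]` is assumed to hold the (dy, dx) displacement of pixel
    (y, x) from the current frame to the next one.
    """
    height, width = len(flow), len(flow[0])
    moved = []
    for y, x in seeds:
        # Sample the flow at the nearest pixel inside the image.
        iy = min(max(int(round(y)), 0), height - 1)
        ix = min(max(int(round(x)), 0), width - 1)
        dy, dx = flow[iy][ix]
        # Shift the seed and keep it inside the image bounds.
        new_y = min(max(y + dy, 0.0), height - 1.0)
        new_x = min(max(x + dx, 0.0), width - 1.0)
        moved.append((new_y, new_x))
    return moved
```

In such a sketch, the seed with index i in one frame corresponds to the seed with index i in the next frame, so a superpixel grown from it would keep its identity along the motion trajectory. A real system would additionally have to estimate the flow and handle occlusions and seeds leaving the image, which this sketch does not attempt.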