Methods of this type for processing video data sets are used, inter alia, for creating multiple view videos from monocular videos. Video sequences of this type can be used, for example, in conjunction with 3-D displays or autostereoscopic displays in order to convey to the observer an impression of depth in the image being observed. The method for processing video data sets effectively transforms video sequences for two-dimensional imaging into video sequences for three-dimensional imaging. A variety of methods have been proposed for this purpose. The existing methods can be roughly divided into two groups. The first group comprises methods for generating a complete 3-D model of the scene captured in the image (Hartley et al., “Multiple view geometry”, Cambridge University Press, UK, 2003; Pollefeys: “Tutorial on 3D modeling from images”, European Conf. on Computer Vision (ECCV), 2000; Tomasi et al., Journal of Computer Vision 9(2), pp. 137-154, 1992; Knorr et al., “A modular scheme for 2D/3D conversion of TV broadcast”, 3rd Int. Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT), Chapel Hill, USA, 2006). The second group comprises methods in which a stereoscopic representation is generated, either by calculating planar transformations (see Rotem et al., Proc. of the SPIE: Stereoscopic Displays and Virtual Reality Systems XII, vol. 5664, pp. 198-206, March 2005; WO 02/091754) or with the aid of a depth analysis for each image in the video sequence, using DIBR technology (DIBR: ‘Depth-Image-Based Rendering’; K. Moustakas et al., IEEE Trans. on Circuits and Systems for Video Technology, vol. 15, No. 8, pp. 1065-1073, August 2005; K. T. Kim et al., “Synthesis of a high-resolution 3D stereoscopic image pair from a high-resolution monoscopic image and a low-resolution depth map”, Proc. of the SPIE: Stereoscopic Displays and Applications IX, San José, USA, 1998; C. Fehn, “Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3D-TV”, Proc. of the SPIE: Stereoscopic Displays and Virtual Reality Systems XI, San José, USA, 2004; L. Zhang et al., “Stereoscopic image generation based on depth images”, IEEE Int. Conf. on Image Processing (ICIP), Singapore, 2004; WO 2005/013623).
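The depth-based approach can be illustrated with a minimal sketch. It assumes a virtual camera shifted horizontally and parallel to the original one by a baseline b, and a pinhole focal length f given in pixels, so that a pixel at depth Z is displaced by the disparity d = f·b/Z. The function name `render_virtual_row`, its parameters and the toy data are illustrative assumptions and are not taken from the cited DIBR publications.

```python
def render_virtual_row(colors, depths, focal_px, baseline):
    """Forward-warp one image row into a horizontally shifted virtual view.

    colors: pixel values of the row.
    depths: per-pixel depth, in the same length unit as baseline.
    Returns the virtual row; positions that receive no pixel (holes) are None.
    """
    width = len(colors)
    virtual = [None] * width
    zbuf = [float("inf")] * width  # nearest depth written so far per target pixel
    for x in range(width):
        # Disparity under a parallel camera shift: d = f * b / Z.
        disparity = focal_px * baseline / depths[x]
        xv = x - int(round(disparity))
        # Keep the nearest surface when several pixels land on one target.
        if 0 <= xv < width and depths[x] < zbuf[xv]:
            virtual[xv] = colors[x]
            zbuf[xv] = depths[x]
    return virtual


# Toy row: pixel "d" is closer to the camera (depth 2) than its neighbours (depth 4).
row = render_virtual_row(list("abcdef"), [4, 4, 4, 2, 4, 4],
                         focal_px=4.0, baseline=1.0)
print(row)  # ['b', 'd', None, 'e', 'f', None]
```

In the toy row the near pixel occludes a background pixel, and two positions remain unfilled. Such holes and occlusions are the reason why this kind of rendering depends on a depth value for every pixel.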
The methods which attempt to create a complete 3-D model of a recorded scene include SfM analysis or SfM technology (SfM: ‘Structure from Motion’; Pollefeys: “Tutorial on 3D modeling from images”, European Conf. on Computer Vision (ECCV), 2000). With the aid of SfM analysis, the spatial coordinates of the recording device used for recording the images of the video sequence, for example a camera, are determined in a freely selectable coordinate system. At the same time, the analysis yields, in the chosen coordinate system, the spatial coordinates of reference image points in the 2-D images of the existing video sequence. However, SfM technology cannot supply the dense and exact 3-D modelling that is required for generating high-quality stereoscopic images. The DIBR method, for its part, requires dense depth estimation, which is time-consuming and error-prone.
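One building block of such an analysis, recovering the spatial coordinates of a single reference point once two camera positions are known, can be sketched as follows. The sketch assumes that each camera contributes a viewing ray (camera centre plus direction towards the reference point) and uses simple midpoint triangulation. This is a swapped-in textbook technique chosen for brevity, not the procedure of the cited works, and the function names are hypothetical.

```python
def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Return the 3-D point closest to the rays c1 + s*d1 and c2 + t*d2.

    c1, c2: camera centres; d1, d2: viewing directions (need not be unit length).
    """
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    # Ray parameters of the two mutually closest points.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    # Midpoint of the shortest segment connecting both rays.
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]


# Two cameras one unit apart, both observing the same reference point.
point = triangulate_midpoint([0.0, 0.0, 0.0], [0.5, 0.0, 2.0],
                             [1.0, 0.0, 0.0], [-0.5, 0.0, 2.0])
print(point)
```

Only points that can be matched reliably between images are recovered in this way, which is consistent with the remark that the resulting 3-D model is sparse rather than dense.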
The document “The ORIGAMI Project: Advanced tools for creating and mixing real and virtual content in film and TV production” by O. Grau, R. Koch, F. Lavagetto, A. Sarti, S. Tubaro and J. Woetzel, in: IEE Proceedings - Vision, Image and Signal Processing, August 2005, vol. 152, No. 4, pp. 454-469, ISSN: 1350-245X, describes a method for processing a video data set wherein virtual images derived from the original images are added to the original images of the video data set. In this method, an SfM analysis is carried out, at least for the “environment portion” of the original images, by which means the initial position of a recording device used to record the original images is determined.
The document “Video Synthesis at Tennis Player Viewpoint from Multiple View Videos” by K. Kimora and H. Saito, in: IEEE Proceedings - Virtual Reality 2005, March 2005, pp. 281-282, ISSN: 1087-8270, ISBN: 0-7803-8929-8, describes a method for generating virtual views of a tennis game. In this document, virtual images are derived from original images: an original image is assigned to a virtual image on the basis of corresponding points in the original image, a homography is determined for a virtual initial image, and the virtual final image is generated by applying this homography to the original image. In the known method, an SfM analysis of the initial images is carried out, and a respective associated original position of a recording device used to record the original images is determined.
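The role of the homography can be illustrated with a minimal sketch, assuming exactly four point correspondences between an original image and a virtual image and fixing the last matrix entry to 1. The functions `solve_linear`, `estimate_homography` and `apply_homography` are hypothetical helpers written for this illustration; they map individual points only and do not resample a whole image, and they are not the implementation of the cited document.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_homography(src, dst):
    """3x3 homography (h33 fixed to 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, from u = (h1 x + h2 y + h3) / w
        # and v = (h4 x + h5 y + h6) / w with w = h7 x + h8 y + 1.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, point):
    """Map one image point through the homography h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Corners of a unit square in the original image and their positions in the virtual image.
src = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dst = [(10.0, 20.0), (12.0, 20.0), (12.0, 22.0), (10.0, 22.0)]
H = estimate_homography(src, dst)
print(apply_homography(H, (0.5, 0.5)))
```

A homography describes the mapping between two views exactly only for points on a common plane or for a purely rotating camera, so this kind of transfer is a planar approximation of the scene.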