1. Field of the Invention
The present invention relates in general to techniques for capturing visual contents such as images and videos, and more particularly to the capturing of visual contents in a way adapted to render a three-dimensional (3D) effect from multiple viewpoints.
2. Discussion of the Related Art
The generation and consumption of 3D visual contents is a promising field of research that is expected to find applications in several fields, for example offering a more true-to-reality experience in inter-personal communications (3D videocommunications/videoconferencing) and enabling new multimedia contents distribution services (e.g., 3D animation).
In the past decade, different approaches and techniques have been proposed, some of which have also been standardized.
However, up to now no solution is available for implementing a complete, end-to-end system at a reasonable cost for the visual contents producer, the contents distributor and the end user.
Typically, a videocommunication system, or, more generally, a system for the distribution of 3D visual contents is made up of an acquisition subsystem, an encoding and distribution subsystem and a display subsystem.
Known techniques for capturing 3D videos from multiple viewpoints exploit an array of videocameras located at different, spaced-apart positions and with different orientations; a 3D model or depth model of the captured scene can be derived from the different video captures.
Several solutions have been proposed for generating “depth maps” (i.e., maps of the distance of the different points of a captured scene as seen from an observation point) starting from two bidimensional (2D) video captures, captured by two 2D videocameras positioned according to the human stereoscopic view (i.e., emulating the right and left eyes), or starting from generic arrangements of multiple 2D videocameras.
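By way of illustration only, the relation between a stereoscopic pair of 2D captures and a depth map can be sketched as follows. The sketch assumes two rectified, identical pinhole videocameras separated by a known baseline, so that depth equals focal length times baseline divided by disparity; this is one common textbook relation and is not asserted to be the method of any of the solutions referred to above. The function names, the parameter values and the input disparities are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth (in metres) of a scene point from its horizontal disparity.

    Assumes a rectified stereo pair of identical pinhole cameras:
    depth = focal_length * baseline / disparity.
    A non-positive disparity is treated as a point at infinity.
    """
    if disparity_px <= 0:
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


def depth_map_from_disparities(disparities, focal_length_px, baseline_m):
    """Convert a 2D grid (list of rows) of disparities into a depth map."""
    return [
        [depth_from_disparity(d, focal_length_px, baseline_m) for d in row]
        for row in disparities
    ]


# Hypothetical values: 500-pixel focal length, 6 cm baseline
# (roughly the distance between human eyes).
example_disparities = [[10.0, 20.0], [0.0, 5.0]]
example_depths = depth_map_from_disparities(example_disparities, 500.0, 0.06)
```

In this sketch, larger disparities correspond to points closer to the videocameras, and a zero disparity corresponds to a point too distant to be resolved.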
More recently, videocameras have been made available that are capable of acquiring, in real time and in addition to a bidimensional (2D) view of the scene, information about the scene depth (understood as the distance of the various points of the scene from the videocamera). These “depth cams” exploit techniques based on measuring the time of flight of laser beams or InfraRed (IR) pulses. An example of a videocamera capable of measuring object distances is described in WO 97/01113.
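The time-of-flight principle mentioned above can be illustrated by the following minimal sketch, which assumes that the round-trip travel time of an emitted pulse is directly available for each point; the distance is then half the round-trip time multiplied by the speed of light. The function name and the sample timing are hypothetical, and the sketch does not describe the internal operation of any particular depth cam.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def distance_from_round_trip(round_trip_s):
    """Distance (in metres) to a reflecting point.

    The pulse travels to the point and back, so the one-way
    distance is half of (speed of light * round-trip time).
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


# Hypothetical measurement: a pulse returning after 20 nanoseconds
# corresponds to a point roughly 3 metres from the videocamera.
example_distance_m = distance_from_round_trip(20e-9)
```

The sketch also indicates why such measurements demand fast electronics: a depth difference of one centimetre corresponds to a round-trip time difference of less than 0.1 nanoseconds.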
US 2007/296721 discloses a contents generating method and apparatus that can support the functions of moving object substitution, depth-based object insertion, background image substitution, and view offering upon a user request, and that can provide a realistic image by applying the lighting information of a real image to a computer graphics object when the real image is composited with the computer graphics object. The apparatus includes: a preprocessing block, a camera calibration block, a scene model generating block, an object extracting/tracing block, a real image/computer graphics object compositing block, an image generating block, and a user interface block.
WO 2008/53417 discloses a system for producing a depth map of a video sequence comprising a client and a server connected by a network. A secondary video sequence available at the client is derived from a primary video sequence available at the server, the primary video sequence having a primary depth map. The server comprises a transmission unit for transmitting the primary depth map to the client. The client comprises an alignment unit for aligning the primary depth map with the secondary video sequence so as to produce alignment information, and a derivation unit for deriving the secondary depth map from the primary depth map using the alignment information.