In a traditional 2-dimensional (2D) presentation, one view of the content, such as video, is generated and displayed on a 2D display. Current development is moving in the direction of 3-dimensional (3D) video presentation, which requires changes in the underlying technology of content generation and processing.
One existing method for 3D content presentation is based on stereoscopic technology, wherein two views of a scene are generated, one view for the right eye and the other view for the left eye. Both views are provided to and displayed on a stereoscopic display, also called a stereo or 3D display. A number of different stereo displays exist, for example shutter glasses, polarized glasses, head-mounted displays and 3D stereo projectors. Because each eye observes a slightly different view of the scene, the human brain integrates these views into a 3D representation.
Multi-view is a further method for 3D content presentation, which is based on capturing the same scene from different viewpoints. The provision of 3D multi-view video comprises different aspects of multi-view processing, from capturing and calibration to encoding, analysis and visualization. This technique enables users to interactively select a viewpoint of the scene. In the case of a 2D display, one view is selected and displayed. In the case of a 3D display, two views, one for the right eye and one for the left eye, are selected and displayed. Multi-view video is one example of multi-view content, in the following also called a multi-view session. Further examples are a multi-view image or multi-view TV.
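The view selection described above can be illustrated with a minimal sketch. The function name `select_views`, the display labels `"2d"` and `"3d"`, and the choice of two adjacent cameras as the left/right pair are assumptions made for illustration only; the text does not specify how the two views for a 3D display are chosen.

```python
def select_views(views, viewpoint, display):
    """Pick the view(s) to show for a user-selected viewpoint.

    views     -- sequence of captured views, ordered by camera position
    viewpoint -- index of the viewpoint chosen by the user
    display   -- "2d" or "3d" (hypothetical labels)
    """
    if not 0 <= viewpoint < len(views):
        raise IndexError("viewpoint out of range")

    if display == "2d":
        # A 2D display shows exactly one view.
        return (views[viewpoint],)

    if display == "3d":
        if len(views) < 2:
            raise ValueError("a 3D display needs at least two views")
        # Assumption: use two neighbouring cameras as the left/right pair.
        # At the last camera, fall back to the last available pair.
        left = min(viewpoint, len(views) - 2)
        return (views[left], views[left + 1])

    raise ValueError(f"unknown display type: {display!r}")
```

For example, with `views = ["cam0", "cam1", "cam2"]`, `select_views(views, 1, "2d")` returns the single view `("cam1",)`, while `select_views(views, 1, "3d")` returns the pair `("cam1", "cam2")`.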
A typical application for interactive multi-view video is a sports event, for example a football game in a stadium, wherein a football scene such as a penalty or a foul can be viewed from different angles and the user can change the view.
It is not enough simply to place cameras around the scene to be captured. Many applications, such as 3DTV, 3D video and free viewpoint video, require synchronized and calibrated capturing of multi-view video sequences. The geometry of the camera setup is measured by a process known as camera calibration, which synchronizes the cameras in space and time. Multi-view video capture varies from partial to complete coverage of the scene. Thus, depending on the coverage, a corresponding number of synchronized and calibrated cameras placed around the scene is required.
Conventional multi-view video capturing techniques follow a hardware-based approach. One example of the hardware-based approach is described in “Interactive Multiview Video Delivery Based on IP Multicast”, Jian-Guang Lou, Hua Cai, and Jing Li, Advances in Multimedia, Volume 2007 (2007). Herein, a number of a priori calibrated video cameras are placed around a scene. All the cameras are specialized for capturing a multi-view scene and are connected to a number of synchronization units, which are in turn connected to a number of control PCs that control the simultaneous operation of the cameras. As a result, an event is captured from different points of view. Once the event is captured, the video signals are compressed in the control PCs and sent to the server through the network to which the control PCs are connected. Thus, this kind of solution usually has a backbone architecture for synchronizing the cameras and for processing the captured session. Consequently, realizations of the hardware-based approach are technically complex.
Another work (“Synchronous Image Acquisition based on Network Synchronization” by G. Litos, X. Zabulis, G. Triantafyllidis) describes a software-based solution for camera synchronization. Herein, a cluster of client computers, each with one or more attached cameras, is described, wherein the computers are connected by means of a network. The computer clocks are synchronized by means of the Network Time Protocol (NTP). This solution also requires the installation of a complex backbone computer architecture around the captured scene.
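To illustrate how NTP-style clock synchronization works in principle, the following minimal sketch computes the clock offset and round-trip delay from the four timestamps of one request/response exchange, using the standard NTP on-wire calculation. The function name `ntp_offset_and_delay` and its parameter names are chosen for illustration, and the sketch is not taken from the cited work; a real NTP client additionally filters many such samples.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one NTP exchange.

    t0 -- client clock when the request was sent
    t1 -- server clock when the request was received
    t2 -- server clock when the response was sent
    t3 -- client clock when the response was received

    All timestamps must use the same unit (for example milliseconds).
    The offset is how far the server clock is ahead of the client clock,
    assuming the network delay is the same in both directions.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay
```

For example, if the server clock is 5000 ms ahead of the client, each direction takes 100 ms and the server needs 100 ms to answer, then `ntp_offset_and_delay(1000, 6100, 6200, 1300)` returns `(5000.0, 200)`. A client computer would apply the estimated offset to its local clock before time-stamping or triggering frames from its attached cameras.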
US 2006/0001744 A1 proposes connecting a cluster of portable devices, such as mobile phones, PDAs or digital cameras, by means of an ad-hoc network, such as Bluetooth™. In this architectural arrangement, once the ad-hoc network is formed, one of the devices is declared to be the leader device. The leader device sends a capture message to the other devices to initiate the capturing of an image. Upon receipt of the message, the capture buttons of the other devices are triggered so that the images are captured at essentially the same time. Afterwards, the devices send the captured pictures to the leader device for consolidation. Thus, this document deals with the need to synchronize the time at which an image is captured.
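The leader-based message flow described above can be sketched as a small in-process simulation. The class names `Device` and `Leader`, the method names and the message fields are hypothetical and are not taken from US 2006/0001744 A1; the devices are called one after another here, whereas real devices would receive the capture message over the ad-hoc network.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """A portable device in the ad-hoc network (hypothetical model)."""
    name: str

    def on_capture_message(self, capture_id):
        # Stand-in for triggering the capture button and taking a picture.
        return {
            "device": self.name,
            "capture_id": capture_id,
            "image": f"<image from {self.name}>",
        }


@dataclass
class Leader(Device):
    """The device declared leader once the ad-hoc network is formed."""
    followers: list = field(default_factory=list)

    def capture_all(self, capture_id):
        # The leader captures its own picture, then sends the capture
        # message to every other device and collects the returned
        # pictures for consolidation.
        pictures = [self.on_capture_message(capture_id)]
        for device in self.followers:
            pictures.append(device.on_capture_message(capture_id))
        return pictures
```

A short usage example: `Leader("phone-A", [Device("pda-B"), Device("camera-C")]).capture_all(1)` returns a list of three pictures, one per device, all tagged with the same `capture_id`.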
It can therefore be concluded that the existing solutions are often costly, technically complex and not flexible enough.
Accordingly, there is a need for synchronizing cameras that capture a multi-view session. In particular, there is a need for a flexible and efficient architecture for providing synchronized multi-view video.