Resurgent interest in stereo 3D video has fueled a more general interest in media generated from a combination of sources rather than from a single video source. While the simplest form of stereo 3D capture involves two cameras on a stereo rig or a dual-lens camera, with one view corresponding to a left eye view and the other to a right eye view, the use of additional cameras, such as wing or witness cameras and depth-capture cameras, is increasingly common. These cameras capture a scene from additional viewpoints, providing further spatial information about the scene. One use of such cameras is to supply information that might otherwise be missing as a result of occlusion or edge effects when generating stereo 3D output.
In addition to spatial information, different kinds of perceptual information about the same scene can be captured to provide an editor with more media resources for the post-production phase. Such additional information may include an expanded intensity (color) dynamic range, different depth of field ranges, and depth data.
Expanded scene information may be obtained from a single device that captures various kinds of information. Another type of single device may capture expanded ranges of particular types of perceptual information from which specific ranges of information can be extracted later. Alternatively, expanded scene information may be captured via multiple devices, each providing a certain type and/or range of scene information. In all cases, such scene information sources are referred to herein as multi-view sources.
In order to work effectively with multi-view sources, data management and editing functions that are able to handle these sources are needed in post-production and asset management systems. Common data management and editing tasks include copying, cutting, compositing, splicing, deleting, consolidating (identifying media that is used in a sequence), and archiving media that originates from the sources.
The information obtained from multiple camera viewpoints may be used for image-based rendering, in which the various viewpoints are used to generate a 3D geometric model of a scene from which novel views can be synthesized. This can be especially useful when new camera viewing angles of a background scene are needed for compositing virtual or green-screen subjects.
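By way of illustration, the core of image-based rendering can be sketched in a toy two-dimensional setting: a scene point is triangulated from its projections in two known cameras, and the recovered geometry is then reprojected into a virtual camera at a new position. The function names and the simplified pinhole model (focal length 1, lateral camera motion only) are illustrative assumptions, not any particular rendering pipeline:

```python
def triangulate(u1, c1, u2, c2):
    """Recover the (X, Z) position of a scene point from its projections
    u1, u2 in two pinhole cameras at lateral positions c1, c2
    (focal length 1, 2-D world for simplicity)."""
    Z = (c2 - c1) / (u1 - u2)   # disparity between views encodes depth
    return c1 + u1 * Z, Z

def render_novel_view(X, Z, c_new):
    """Project the reconstructed point into a virtual camera at c_new."""
    return (X - c_new) / Z

# A point at X=2, Z=4 seen by cameras at c=0 and c=1:
X, Z = triangulate(0.5, 0.0, 0.25, 1.0)   # -> (2.0, 4.0)
u_new = render_novel_view(X, Z, 2.0)      # -> 0.0
```

In a full system this per-point reconstruction is repeated densely to build a geometric model, against which virtual camera angles for compositing can be rendered.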
Another use of multiple viewpoints of a scene is to stitch the individual images together to produce high-resolution, wide field of view imagery that can later be resampled at a lower resolution or used to extract a partial view. A further application of cameras providing imagery of a given scene from multiple viewpoints is to offer viewers a choice of views. This is especially desirable in sports broadcasting, where it provides a choice of long shots, close-ups, and different angles of a game. The OB1 system from the Hego Group of Stockholm, Sweden, is an example of such a live broadcast system.
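The stitch-then-extract workflow can be sketched minimally as follows, assuming two pre-registered, equal-height camera tiles; a real stitcher would blend overlapping regions rather than simply abutting the tiles, and the function names here are illustrative:

```python
def stitch_horizontal(left, right):
    """Concatenate two pre-registered, equal-height tiles row by row
    into one wide canvas."""
    return [l_row + r_row for l_row, r_row in zip(left, right)]

def extract_view(image, top, left, height, width):
    """Crop a partial view (a viewing window) out of the wide canvas."""
    return [row[left:left + width] for row in image[top:top + height]]

left_tile  = [[1, 2], [5, 6]]
right_tile = [[3, 4], [7, 8]]

wide = stitch_horizontal(left_tile, right_tile)  # 2 x 4 wide canvas
view = extract_view(wide, top=0, left=1, height=2, width=2)
# wide -> [[1, 2, 3, 4], [5, 6, 7, 8]]
# view -> [[2, 3], [6, 7]]
```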
Plenoptic cameras may be used as a multi-view source that captures four-dimensional light field information about a scene. Examples of plenoptic cameras include the Adobe® Light Field Camera under development by Adobe, Inc., of San Jose, Calif., and the Lytro Light Field Camera announced by Lytro, Inc., of Mountain View, Calif. With such information, the desired focus range can be chosen after the data is captured, which allows the reconstruction of arbitrary views using different depth of field ranges.
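One common way to illustrate after-the-fact refocusing is the "shift-and-add" method over sub-aperture views: each view is shifted in proportion to its aperture position and the results are averaged, with the shift factor chosen after capture. This one-dimensional sketch is a generic textbook technique, not a description of how the cameras named above operate internally:

```python
def refocus(views, alpha):
    """Shift-and-add refocusing over 1-D sub-aperture views.
    views maps aperture position u to a list of samples; alpha selects
    the synthetic focal plane and may be chosen after capture."""
    width = len(next(iter(views.values())))
    out = [0.0] * width
    for u, samples in views.items():
        shift = round(u * alpha)          # disparity proportional to u
        for x in range(width):
            src = x + shift
            if 0 <= src < width:          # ignore samples shifted off-frame
                out[x] += samples[src]
    return [v / len(views) for v in out]

# Three sub-aperture views of one point with unit disparity per unit u:
views = {-1: [0, 1, 0, 0, 0], 0: [0, 0, 1, 0, 0], 1: [0, 0, 0, 1, 0]}
refocus(views, alpha=1)[2]   # -> 1.0   (views align: point in focus)
refocus(views, alpha=0)[2]   # ~ 0.33  (views disagree: point blurred)
```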
In medical imaging, it is common in CAT, MRI, and ultrasound imaging to sample a volumetric three-dimensional region of the human body and then view arbitrary two-dimensional image samples at arbitrary positions and orientations. In addition, various ultrasound views from arbitrary positions and orientations have been resampled into a regular voxel-based three-dimensional volumetric image. A volumetric three-dimensional image may be viewed as an example of a light field. In such systems, the use of arbitrary views to create a three-dimensional volumetric image requires rich relative position and orientation data, which contrasts with the regular planar sampling used in CAT and MRI imaging.
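Extracting an arbitrarily positioned and oriented plane from a voxel volume can be sketched with nearest-neighbour resampling; the data layout and function signature here are illustrative assumptions, and clinical systems would use higher-order interpolation:

```python
def sample_slice(volume, origin, u_dir, v_dir, rows, cols):
    """Nearest-neighbour resampling of an arbitrary plane from a voxel
    volume. volume is indexed [z][y][x]; origin, u_dir, v_dir are
    3-vectors giving the plane's position and orientation in voxel
    coordinates."""
    def voxel(p):
        z, y, x = (int(round(c)) for c in p)
        if (0 <= z < len(volume) and 0 <= y < len(volume[0])
                and 0 <= x < len(volume[0][0])):
            return volume[z][y][x]
        return 0  # outside the scanned region

    plane = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # walk the plane: origin + r steps along u_dir + c along v_dir
            p = [origin[i] + r * u_dir[i] + c * v_dir[i] for i in range(3)]
            row.append(voxel(p))
        plane.append(row)
    return plane

# A 2x2x2 volume whose voxel value encodes its own (z, y, x) index:
volume = [[[100 * z + 10 * y + x for x in range(2)]
           for y in range(2)] for z in range(2)]
plane = sample_slice(volume, (1, 0, 0), (0, 1, 0), (0, 0, 1), 2, 2)
# -> [[100, 101], [110, 111]]  (the axial plane z = 1)
```

Tilting u_dir and v_dir off the coordinate axes yields the oblique views described above, which is exactly where the rich position and orientation metadata is required.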
In a different imaging domain, multi-view sources can also capture scenes with multiple ranges of color exposure values. Such imagery can be combined to create high dynamic range (HDR) imagery. Such sources may be implemented as a multi-camera rig that holds several cameras, each capturing a different intensity range, or by sequential capture of range subsets with a single camera. Other camera systems are able to capture a single, wide range of color exposures using a single device. Such cameras include the Epic® camera from Red Digital Cinema Camera Company of Lake Forest, Calif., and the Alexa camera from Arnold & Richter Cine Technik (ARRI) of Munich, Germany. Another source of HDR images is software such as Adobe Photoshop®, which can be used to merge multiple input image files together to create a single output HDR image file.
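The merging step can be sketched with a standard weighted-average scheme over bracketed exposures, assuming a linear sensor response and a hat-shaped weight that distrusts near-black and clipped values; this is a generic simplified technique, not the algorithm used by any of the products named above:

```python
def merge_hdr(exposures):
    """Merge a list of (exposure_time, pixel_list) pairs of 8-bit values
    into one linear radiance estimate per pixel."""
    n = len(exposures[0][1])
    merged = []
    for i in range(n):
        num = den = 0.0
        for t, pixels in exposures:
            v = pixels[i]
            w = min(v, 255 - v)   # hat weight: 0 at black and at clipping
            num += w * v / t      # linear response assumed: radiance ~ v/t
            den += w
        merged.append(num / den if den else 0.0)
    return merged

# One pixel of true radiance 100, bracketed at 1x, 2x, and 4x exposure;
# the 4x frame clips at 255 and is therefore weighted out entirely:
merge_hdr([(1.0, [100]), (2.0, [200]), (4.0, [255])])   # -> [100.0]
```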
In multi-camera ("multi-cam") setups, film making or video production utilizes multiple camera systems capturing the same scene. These cameras are synchronized using the same time reference to ensure that the metadata recorded as part of the captured media from each camera corresponds to a common timecode. Various tools exist to help editors work with these multi-camera sources within the context of non-linear media editing systems, such as Media Composer® from Avid Technology, Inc. of Burlington, Mass., described in part in U.S. Pat. Nos. 5,267,351 and 5,355,450, which are incorporated by reference herein, and Final Cut Pro® from Apple Computer, Inc. of Cupertino, Calif. For example, the Multi-Cam, Auto-Sync, and Group Clips tools, which are features of Media Composer, facilitate the management and editing of multiple channel sources. The grouped clip tool uses metadata to identify the clips that arise from a common scene. The sync-clip editing functions use the temporal metadata associated with the channels to align the clips with respect to each other in the time reference of the editor's timeline view. The Multi-Cam tool takes input channels chosen by the user and provides a single time-multiplexed output for a requested time span. The grouping of the temporally aligned clips also allows the editor to perform edits such as trimming and cuts on a clip group. Data management functions such as optimized deletion, consolidation, and archiving can also be performed by identifying the used spans of the various channels in the clip group and preserving those segments, while leaving the unused portions available for deletion.
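Two of these grouped-clip operations, temporal alignment on a shared timecode and consolidation of only the media spans a sequence uses, can be sketched as follows. The clip tuples and span lists are hypothetical shapes for illustration, not Media Composer's actual data model:

```python
def align(clips):
    """Offset each (name, start_timecode_frames, length) clip against the
    earliest start, placing all channels on a common timeline."""
    base = min(start for _, start, _ in clips)
    return [(name, start - base, length) for name, start, length in clips]

def consolidate(used_spans):
    """Merge the (start, end) frame spans of a clip that a sequence
    references; media outside the merged spans is safe to delete during
    consolidation or to omit from an archive."""
    keep = []
    for s, e in sorted(used_spans):
        if keep and s <= keep[-1][1]:                  # overlaps previous
            keep[-1] = (keep[-1][0], max(keep[-1][1], e))
        else:
            keep.append((s, e))
    return keep

clips = [("camA", 1000, 500), ("camB", 1040, 500), ("witness", 1010, 400)]
align(clips)
# -> [('camA', 0, 500), ('camB', 40, 500), ('witness', 10, 400)]
consolidate([(0, 100), (80, 150), (300, 320)])
# -> [(0, 150), (300, 320)]
```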
As a result of the increasing number of capture devices used to sample a scene, and the amount of perceptual information being captured by these devices, the task of managing and editing a video segment that makes use of some or all of the sources of media becomes increasingly challenging. Tools are therefore needed that handle the relationships among the sources and that assist editors in managing, combining, and editing media using metadata and multi-channel sources.