The present invention relates to image generation, and in particular to methods and apparatus for generating stereoscopic image pairs based on multi-perspective imaging from a light field.
Three-dimensional (“3D”) television, movies, games, and displays for other purposes have been gaining popularity both within the entertainment industry and among consumers. An ever-increasing amount of content is being created, distribution channels including live broadcast are being developed, and 3D monitors and TV sets are being sold in all major electronics stores.
One approach to 3D imagery is to generate a stereoscopic image pair, where one image of the pair is provided to the viewer's left eye and the other image of the pair is provided to the viewer's right eye, with some mechanism to eliminate or lessen cross-over between the two. Where the images in the image pair are related (e.g., captured from the same scene) but different, those differences are interpreted by the viewer's brain as depth information, thus creating 3D effects. The image pair might be of a physical scene, where light from objects in the scene is captured by a camera or other optical sensor, and the stereoscopic effect is generated by capturing the scene imagery using two cameras offset by some baseline amount. Alternatively, the image pair might be of a virtual scene, such as a scene generated entirely using computer processing and/or geometric models.
Binocular parallax (i.e., binocular disparity) is one cue that stereoscopic image generation systems use for generating stereoscopic scene perception. In stereography, one common method for controlling the amount of binocular parallax is based on setting the baseline, or the inter-axial distance, of two cameras prior to image acquisition. However, the range of admissible baselines is quite limited, since most scenes exhibit more disparity than humans can tolerate when viewing the content on a stereoscopic display. Reducing the baseline of cameras decreases the amount of binocular disparity, but it also causes scene elements to appear overly flat.
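By way of illustration only, the relationship between baseline and binocular disparity can be sketched for an idealized rig of two parallel pinhole cameras, for which the disparity of a scene point is the focal length times the baseline divided by the point's depth. The parallel-camera model, the function names, and the numeric values below are assumptions made for this sketch and are not taken from any particular system.

```python
def disparity_px(focal_px: float, baseline: float, depth: float) -> float:
    """Disparity in pixels of a point at `depth` for two parallel pinhole cameras.

    Assumes the idealized model d = f * b / z, with `baseline` and `depth`
    expressed in the same length unit and `focal_px` in pixels.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline / depth


def disparity_range_px(focal_px: float, baseline: float,
                       near: float, far: float) -> float:
    """Difference in disparity between the nearest and farthest scene points."""
    return (disparity_px(focal_px, baseline, near)
            - disparity_px(focal_px, baseline, far))


if __name__ == "__main__":
    # Hypothetical values: 1000 px focal length, scene depth from 1 m to 10 m.
    full = disparity_range_px(1000.0, 0.065, near=1.0, far=10.0)
    half = disparity_range_px(1000.0, 0.0325, near=1.0, far=10.0)
    print(f"disparity range at full baseline: {full:.2f} px")
    print(f"disparity range at half baseline: {half:.2f} px")
```

Under this model, halving the baseline halves the disparity of every scene point uniformly, which is consistent with the observation above that a reduced baseline lowers overall disparity while also flattening scene elements.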
Another, more sophisticated, approach to disparity control requires remapping image disparities (or remapping the depth of scene elements) and re-synthesizing new images. However, this approach typically requires accurate disparity computation and hole filling (filling in gaps that appear in the image because scene elements are moved in the re-synthesized views). For computer-generated images, depth remapping of scene elements implies severe changes of lighting, shading, and the scene composition in general.
The computer graphics and computer vision community has studied the geometry and applications of multi-perspective imaging. For example, [Wood et al. 1997] describe a computer-assisted method to compute multi-perspective panoramas from a collection of perspective images. Multi-perspective imaging has also been employed in movie production in order to provide a richer and more complete visualization of stereoscopic content, for example for drawing backgrounds for two-dimensional (“2D”) cel animation [Thomas and Johnston 1995]. In recent years, many types of multi-perspective cameras and corresponding images have been introduced. Examples include push-broom cameras [Hartley and Gupta 1997] and the related Multiple-Center-of-Projection Images [Rademacher and Bishop 1998], cross-slit cameras [Pajdla 2002; Zomet et al. 2003], and general linear cameras [Yu and McMillan 2004]. However, creating richer and perceptually pleasing multi-perspective stereoscopic content without any specific camera model remains a difficult problem.
It has been shown that remapping binocular disparities may be used to refine and optimize stereoscopic content for display on different output devices or according to user preferences. A number of technical approaches for disparity remapping for stereoscopic content have been proposed. [Jones et al. 2001] analyze the scene depth range and adjust the stereoscopic camera baseline to a given disparity budget. Feldman et al. [2003] present a system that uses nonlinear depth scaling for transmitting a 3D scene to be rendered from multiple views. [Holliman 2004] describes a system that compresses the scene depth for stereoscopic displays by identifying a region of interest and compressing it differently from the rest of the scene. Koppal et al. [2011] discuss optimal stereo and describe basic post-processing tools, with their main focus on shot planning during capture. Ward et al. [2011] propose a system for 2D-to-3D conversion that relies on image warping and requires manual interaction. [Kim et al. 2008] discuss how to perform nonlinear depth remapping for multi-view autostereoscopic displays. [Zwicker et al. 2006] present a remapping and pre-filtering framework for automultiscopic displays that adapts an input light field to the display capabilities.
All of these works, however, are restricted in the type of disparity remapping operators they support. In particular, they do not provide a solution for detailed control of disparity in real-world images. Although a nonlinear and local disparity remapping method to control and retarget the depth of stereoscopic content has been presented by [Lang et al. 2010], that method is limited in the amount of remapping that can be applied without producing noticeable distortions of the image content. In particular, it causes salient scene structures (e.g., straight lines) to bend and fails to allow for per-pixel disparity control.
Accordingly, it is desirable to develop methods and systems that overcome the aforementioned deficiencies and provide per-pixel disparity control by selecting actual light rays from an input light field, rather than by relying on image deformations or inpainting.
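As a purely illustrative sketch of what selecting light rays per pixel could mean, the following assumes a light field stored as a stack of views taken along a baseline, indexed as light_field[s][y][x], together with a hypothetical per-pixel map view_index[y][x] that names the view from which each output pixel is taken. The data layout, the function name select_rays, and the example values are assumptions made for this sketch and do not describe any particular embodiment.

```python
from typing import List

Image = List[List[int]]


def select_rays(light_field: List[Image], view_index: List[List[int]]) -> Image:
    """Assemble an output image by taking each pixel from a per-pixel chosen view.

    light_field[s][y][x] is the pixel value at (x, y) in view s (assumed layout).
    view_index[y][x] is the view from which output pixel (x, y) is copied.
    """
    height = len(view_index)
    width = len(view_index[0])
    return [
        [light_field[view_index[y][x]][y][x] for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    # Hypothetical 3-view, 2x2 light field; values encode (view, row, column).
    views = [[[s * 10 + y * 2 + x for x in range(2)] for y in range(2)]
             for s in range(3)]
    chosen = [[0, 2], [1, 1]]  # a different source view per output pixel
    print(select_rays(views, chosen))
```

Because every output pixel is copied from a ray that was actually captured, such a selection involves no image warping and no hole filling. How the per-pixel view map is chosen is outside the scope of this sketch.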