Stereo and multi-view imaging has a long and rich history stretching back to the early days of photography. Stereo cameras employ multiple lenses to capture two images of a scene from two different points of view, which are typically horizontally displaced. The images are then displayed to a human viewer, whose visual system merges information from the pair of different images to achieve the perception of depth, giving the viewer an impression of 3D.
Stereo cameras come in a variety of configurations. For example, a lens and sensor unit can be attached to a port on a traditional single-view digital camera to enable the camera to capture two images from slightly different points of view, as described in U.S. Pat. No. 7,102,686. In this configuration, the lenses and sensors of each unit are similar, which enables the interchangeability of parts. Other cameras containing two or more lenses have also been described, such as in U.S. Patent Application Publication 2008/0218611, where a camera has two lenses and sensors and produces an improved image (with respect to sharpness, for example).
In another line of teaching, there are situations where a stereo image (or video) is desired, but only a single-view image (or video) has been captured. This problem is known as 2D-to-3D conversion, and has been addressed in the art. For example, M. Guttmann, L. Wolf, and D. Cohen-Or, "Semi-automatic stereo extraction from video footage," Proceedings of the 2009 IEEE International Conference on Computer Vision, teach a semi-automatic approach (using user input in the form of scribbles) for converting each image of the video to stereo. In other work, such as D. Hoiem et al., "Automatic Photo Pop-up," Proceedings of the 2005 IEEE International Conference on Computer Vision, the 3D geometry of an image is estimated and used to produce images that represent what the scene might look like from another viewpoint.
In another line of teaching, certain frames called keyframes are extracted from a video and used to represent the video. In U.S. Pat. No. 7,643,657, interesting keyframes are selected based on finding shot boundaries and considering other features such as spatial activity and skin detection. However, keyframe extraction does not provide a method for representing a video with a stereo image.
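By way of illustration only, the general idea of selecting keyframes from shot boundaries can be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of U.S. Pat. No. 7,643,657: frames are assumed to be flat, non-empty lists of 8-bit grayscale intensities, a shot boundary is assumed wherever the intensity histograms of adjacent frames differ by more than a threshold, and spatial activity is approximated by intensity variance. The function names, the bin count, and the threshold are hypothetical, and skin detection is omitted.

```python
from typing import List, Sequence

Frame = Sequence[int]  # assumed: flat list of 8-bit grayscale intensities


def histogram(frame: Frame, bins: int = 8, max_value: int = 256) -> List[float]:
    """Normalized intensity histogram of one frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return [c / len(frame) for c in counts]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance between two normalized histograms (range 0 to 1)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def find_shot_boundaries(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Indices i at which frame i is assumed to start a new shot."""
    hists = [histogram(f) for f in frames]
    return [
        i
        for i in range(1, len(hists))
        if histogram_distance(hists[i - 1], hists[i]) > threshold
    ]


def spatial_activity(frame: Frame) -> float:
    """Intensity variance, used here as a crude stand-in for spatial activity."""
    mean = sum(frame) / len(frame)
    return sum((p - mean) ** 2 for p in frame) / len(frame)


def select_keyframes(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Pick, for each detected shot, the index of its most active frame."""
    if not frames:
        return []
    edges = [0] + find_shot_boundaries(frames, threshold) + [len(frames)]
    return [
        max(range(start, end), key=lambda i: spatial_activity(frames[i]))
        for start, end in zip(edges, edges[1:])
    ]
```

In this sketch each detected shot contributes exactly one keyframe. The output is a set of single-view frames, consistent with the observation above that keyframe extraction alone does not yield a stereo image.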