Although the human visual system naturally perceives the three-dimensional world, most imaging and display systems limit that ability by presenting only a two-dimensional ("2D") mapping of the three-dimensional ("3D") world. Many current and emerging applications can benefit significantly from the high degree of realism provided by 3D scenes with depth. In fact, the human ability to perceive realism in a scene is directly related to the ability to perceive depth accurately in that scene. A real 3D scene can be reconstructed from many 2D views of the scene imaged from different perspectives. Such a representation would allow not only the perception of depth in a scene but look-around capability as well. However, due to practical display constraints, only a less complex solution employing two views of a scene is easily realizable; such a solution can nevertheless impart the sensation of depth. Thus, the two views are required to be imaged under specific constraints, one view for each eye of the human visual system, so that the brain can generate the depth information necessary to perceive realism. The two views, when put together in a video format, can represent stereoscopic video. Each view is similar to normal video except that the two views are related under the constraints imposed by stereoscopic vision. Under the specified constraints, the two views imaging a scene differ by what is known as the disparity between the views, which is typically only a few pixels in the vertical direction but can be on the order of about 40 pixels or more in the horizontal direction, assuming each view is imaged at normal TV resolution.
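By way of illustration only, the horizontal disparity between the two views can be estimated for a block of pixels by searching for the horizontal shift that best aligns the block in one view with the other view. The following is a minimal sketch that assumes a sum-of-absolute-differences matching criterion, which the foregoing description does not prescribe. The function names `sad` and `estimate_disparity`, the parameters `block` and `max_dx`, and the sign convention (a positive value means the content appears further to the right in the right view) are hypothetical and are chosen for this example only.

```python
def sad(left, right, row, col, dx, block):
    """Sum of absolute differences between a block of the left view and the
    block displaced horizontally by dx pixels in the right view."""
    total = 0
    for r in range(row, row + block):
        for c in range(col, col + block):
            total += abs(left[r][c] - right[r][c + dx])
    return total


def estimate_disparity(left, right, row, col, block=4, max_dx=8):
    """Return the horizontal shift dx in [-max_dx, max_dx] with the lowest SAD.

    left and right are lists of rows of luminance values of equal size.
    The search is horizontal only (an assumption of this sketch), since
    vertical disparity is typically just a few pixels.
    """
    width = len(left[0])
    best_dx, best_cost = 0, None
    for dx in range(-max_dx, max_dx + 1):
        # Skip shifts that would move the block outside the right view.
        if col + dx < 0 or col + dx + block > width:
            continue
        cost = sad(left, right, row, col, dx, block)
        if best_cost is None or cost < best_cost:
            best_dx, best_cost = dx, cost
    return best_dx
```

For views at normal TV resolution, where the horizontal disparity can be on the order of 40 pixels or more, `max_dx` would have to be set correspondingly larger than the small default used here.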
Stereoscopic video has potential applications in education, training, 3D movies/entertainment, medical surgery, videoconferencing, virtual travel and shopping, multimedia presentations, video games, immersive virtual reality experiences, and others. Although many potential applications of 3D/stereoscopic video exist, several limitations must be overcome before the potential of 3D/stereoscopic video can be truly harnessed and its use becomes widespread. One example of such a limitation is that a practical means of displaying stereo requires viewers to use specialized viewing glasses. Although some displays, for example, autostereoscopic systems, do not require specialized viewing glasses, they impose other restrictions, for example, limited viewing zones and view discreteness. Moreover, such systems may typically require between 10 and 20 views for realism. Stereoscopic video, on the other hand, although it requires the use of specialized glasses, can impart the perception of depth in a scene and requires only two views. One view is referred to as the left-view and the other as the right-view; they are intended for presentation to the left eye and the right eye, respectively, of the human visual system, in either a time-sequential format (with active synchronized shuttered glasses) or a time-simultaneous format (with passive polarizing glasses).
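By way of illustration only, the time-sequential presentation mentioned above can be sketched as an interleaving of the frames of the two views, so that a display shows the views alternately while the shuttered glasses open the corresponding eye. The function name `time_sequential`, the eye labels, and the left-first ordering are assumptions made for this example and are not taken from the foregoing description.

```python
def time_sequential(left_frames, right_frames):
    """Interleave two views as L0, R0, L1, R1, ... for alternate display.

    Each output entry pairs an eye label with a frame so that a display
    driver could synchronize the shutter for that eye (hypothetical usage).
    """
    if len(left_frames) != len(right_frames):
        raise ValueError("both views must contain the same number of frames")
    sequence = []
    for left, right in zip(left_frames, right_frames):
        sequence.append(("left", left))    # left-eye shutter open
        sequence.append(("right", right))  # right-eye shutter open
    return sequence
```

In the time-simultaneous case no such interleaving is needed, since both views are presented at once and are separated by the passive polarizing glasses.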
In addition to the aforementioned display issue, another issue of concern is efficient digital compression of 3D/stereoscopic video so that the multiple views can be easily manipulated, stored, or transmitted as needed. To that end, interworking with existing or emerging standards-based coding schemes, as well as with existing displays for normal video, is highly desirable.