Although the human visual system naturally captures the three-dimensional world, most imaging and display systems limit our abilities by presenting only a two-dimensional (2D) mapping of the three-dimensional (3D) world. Many current as well as emerging applications can benefit significantly from the high degree of realism provided by 3D scenes with depth. In fact, the human ability to perceive realism in a scene is directly related to the ability to perceive depth accurately in that scene. A real 3D scene can be reconstructed from many 2D views of the scene imaged from different perspectives. Such a representation would allow not only the perception of depth in a scene but look-around capability as well. However, due to practical display constraints, only a less complex solution employing two views of a scene is easily realizable; even so, it can still impart the sensation of depth. These two views are required to be imaged under specific constraints; specifically, one view is intended for each respective eye of the human visual system so that the human brain can generate the depth information necessary to perceive realism. The two views together represent stereoscopic video, where each view is similar to normal video except that the two views are related under the constraints imposed by stereoscopic vision. Under the specified constraints, the two views imaging a scene differ by what is known as the disparity between the views, which is typically only a few pixels in the vertical direction but can be on the order of about 40 pixels or higher in the horizontal direction, assuming each view is imaged at normal TV resolution.
Stereoscopic video has potential applications in education, training, 3D movies/entertainment, medical surgery, videoconferencing, virtual travel and shopping, multimedia presentations, video games, immersive virtual reality experiences, and others. Although the potential applications of 3D/stereoscopic video are many, there are several challenges to be overcome before its potential can be truly harnessed and its use becomes widespread. Currently the two primary challenges are a convenient stereoscopic/3D display and a highly efficient compatible coding scheme. Regarding the issue of displays, most practical means of displaying stereoscopic/3D video require viewers to wear specialized viewing glasses. These viewing glasses may be active shuttered glasses, which contain electronics, or passive polarizing glasses, which are somewhat less cumbersome. Although some displays not requiring specialized viewing glasses (autostereoscopic systems) are available, they impose other restrictions, e.g., viewing zones and discreteness of views, and may typically require between 10 and 20 views for realism. Stereoscopic video, on the other hand, although it requires the use of specialized glasses, can impart perception of depth in a scene and uses only two views, a left-view intended for the left eye and a right-view intended for the right eye of the human visual system, presented either time-sequentially (with active synchronized shuttered glasses) or time-simultaneously (with passive polarizing glasses). Besides the display issue, the other main issue is that of efficient digital compression of 3D/stereoscopic video so that the multiple views can be easily manipulated, stored or transmitted as needed. Towards that end, interworking with existing or emerging standards-based coding schemes, as well as with existing displays for normal video, is a necessity in many applications.
Of particular relevance is the second phase of the ISO Moving Picture Experts Group (MPEG-2) video coding standard, which offers a good solution to a large variety of applications requiring digital video, including broadcast TV via satellite, cable TV, HDTV, digital VCRs, multipoint video and others.
As is well known, techniques based on or extending from basic MPEG-2 video coding increase coding efficiency. Basic video coding in MPEG-2 involves motion-compensated DCT coding of frame- or field-pictures and is dealt with in detail in A. Puri, "Video Coding Using the MPEG-2 Compression Standard," Proceedings of SPIE Visual Communications and Image Processing, Boston, Mass., November 1993, pp. 1701-1713, and in R. L. Schmidt, A. Puri and B. G. Haskell, "Performance Evaluation of Nonscalable MPEG-2 Video Coding," Proceedings of SPIE Visual Communications and Image Processing, Chicago, Ill., September 1994, pp. 296-310, the contents and disclosure of both of which are expressly incorporated by reference herein.
In the past, several attempts have been made to reduce the bandwidth of digital stereoscopic video. Among the more promising methods presented recently are those that are based on, or are extensions of, MPEG-2 video coding. These methods typically either employ compensation of disparity between the two views on a block-by-block basis or use both motion and disparity compensation, also on a block-by-block basis. However, it has been determined that for some stereoscopic video scenes disparity compensation does not work very well, which is attributable to significant global differences in brightness and in color between the two views of a stereoscopic scene.
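By way of illustration only, block-by-block disparity estimation of the kind referred to above can be sketched as follows. The sketch assumes, hypothetically, that a block of the right-view is predicted from the left-view by a purely horizontal search minimizing the sum of absolute differences (SAD); the function names, the block size and the search range are illustrative assumptions and are not taken from any of the cited methods.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(image, top, left, size):
    """Extract a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in image[top:top + size]]


def block_disparity(left_view, right_view, top, left, size=8, max_disp=40):
    """Find the horizontal shift d that best predicts a right-view block.

    The right-view block at (top, left) is compared with left-view blocks
    at (top, left + d) for every d in [-max_disp, max_disp] that stays
    inside the picture. Returns (best_d, best_cost).
    """
    width = len(left_view[0])
    target = get_block(right_view, top, left, size)
    best_d, best_cost = 0, None
    for d in range(-max_disp, max_disp + 1):
        x = left + d
        if x < 0 or x + size > width:
            continue  # candidate block would fall outside the picture
        cost = sad(target, get_block(left_view, top, x, size))
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost
```

Because the matching cost compares sample values directly, a global brightness or color difference between the two views raises the cost of every candidate block, including the geometrically correct one, which is consistent with the poor disparity compensation noted above for such scenes.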
It would therefore be highly desirable to correct for differences in brightness and in color between the two views of a stereoscopic scene and perform the corrections globally for the sake of compatibility with MPEG-2 based coding of stereoscopic video.
It would also be highly desirable to correct for global mismatch in gain and offset between the left- and the right-views of stereoscopic video due to differences in imaging cameras, differences in the imaging sensors of the cameras, differences in brightness and color balance adjustments of the sensors, etc. Moreover, separate mismatch correction for global gain and offset for each of the three component signals, namely the luminance signal, Y, and the color signals, Cr and Cb, would be highly desirable.
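By way of illustration only, a separate global gain and offset correction for each of the Y, Cb and Cr components might be sketched as follows. The sketch assumes, hypothetically, that gain and offset are estimated by matching the mean and standard deviation of each right-view component to those of the corresponding left-view component; this is one simple estimator chosen for illustration and is not asserted to be the estimator of any particular method. The function names and the 0 to 255 sample range are likewise assumptions.

```python
from statistics import fmean, pstdev


def gain_offset(reference, target):
    """Estimate (gain, offset) such that gain * target + offset has the
    same mean and standard deviation as reference."""
    ref_std, tgt_std = pstdev(reference), pstdev(target)
    # A flat target component has no spread to scale; fall back to unity gain.
    gain = ref_std / tgt_std if tgt_std > 0 else 1.0
    offset = fmean(reference) - gain * fmean(target)
    return gain, offset


def correct(samples, gain, offset, lo=0, hi=255):
    """Apply gain and offset, then round and clip to the sample range."""
    return [min(hi, max(lo, round(gain * s + offset))) for s in samples]


def correct_view(left, right):
    """Correct each right-view component against the left-view.

    left and right map 'Y', 'Cb' and 'Cr' to flat lists of samples.
    Returns the corrected right-view and the per-component (gain, offset).
    """
    corrected, params = {}, {}
    for name in ("Y", "Cb", "Cr"):
        g, o = gain_offset(left[name], right[name])
        params[name] = (g, o)
        corrected[name] = correct(right[name], g, o)
    return corrected, params
```

Estimating the parameters from global statistics rather than from co-located samples avoids any need for pixel correspondence between the views, which disparity would otherwise complicate.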