The present invention relates to coding of stereoscopic digital video signals to improve image quality. In particular, a method and apparatus for optimizing the disparity estimation between the right and left view pixel luminance values is presented.
Recently, stereoscopic video transmission formats have been proposed, such as the Moving Picture Experts Group (MPEG) MPEG-2 Multi-view Profile (MVP) system, described in document ISO/IEC JTC1/SC29/WG11 N1088, entitled "Proposed Draft Amendment No. 3 to 13818-2 (Multi-view Profile)," November 1995, incorporated herein by reference. Stereoscopic video provides slightly offset views of the same image to produce a combined image with greater depth of field, thereby creating a three-dimensional (3-D) effect. In such a system, dual cameras may be positioned about two inches apart to record an event on two separate video signals. The spacing of the cameras approximates the distance between left and right human eyes. Moreover, with some stereoscopic video camcorders, the two lenses are built into one camcorder head and therefore move in synchronism, for example, when panning across an image. The two video signals can be transmitted and recombined at a receiver to produce an image with a depth of field that corresponds to normal human vision.
The MPEG MVP system includes two video layers which are transmitted in a multiplexed signal. First, a base layer represents a left view of a three-dimensional object. Second, an enhancement (e.g., auxiliary) layer represents a right view of the object. Since the right and left views are of the same object and are offset only slightly relative to each other, there will usually be a large degree of correlation between the video images of the base and enhancement layers. This correlation can be used to compress the enhancement layer data relative to the base layer, thereby reducing the amount of data that needs to be transmitted in the enhancement layer to maintain a given image quality.
The MPEG MVP system includes three types of video pictures: the intra-coded picture (I-picture), the predictive-coded picture (P-picture), and the bi-directionally predictive-coded picture (B-picture). An I-picture completely describes a single video picture without reference to any other picture. In the base layer, P-pictures are predicted based on previous I- or P-pictures. B-pictures are predicted from the closest earlier I- or P-picture and the closest later I- or P-picture. The base layer can be coded according to the MPEG-2 standard, details of which can be found in document ISO/IEC JTC1/SC29/WG11 N0702, entitled "Information Technology--Generic Coding of Moving Pictures and Associated Audio, Recommendation H.262," Mar. 25, 1994, incorporated herein by reference.
In the enhancement layer, a P-picture can be predicted from the most recently decoded picture in the enhancement layer, regardless of picture type, or from the most recent base layer picture, regardless of type, in display order. Moreover, with a B-picture in the enhancement layer, the forward reference picture is the most recently decoded picture in the enhancement layer, and the backward reference picture is the most recent picture in the base layer, in display order. Pictures in the enhancement layer can be predicted from pictures in the base layer in a cross-layer prediction process known as disparity prediction. Prediction from one frame to another within a layer is known as temporal prediction.
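The reference rules described above for the enhancement layer can be summarized in a short Python sketch. This is an illustration only and is not taken from the MVP document; the function name, the dictionary keys, and the picture labels are hypothetical, and the caller is assumed to have already identified the most recently decoded enhancement layer picture and the most recent base layer picture in display order.

```python
def enhancement_references(picture_type, last_decoded_enh, latest_base):
    """Return the reference pictures allowed for an enhancement layer picture.

    last_decoded_enh: most recently decoded enhancement layer picture (any type).
    latest_base: most recent base layer picture in display order (any type).
    Both arguments are opaque labels here; the names are illustrative only.
    """
    if picture_type == "I":
        # An I-picture is coded without reference to any other picture.
        return {}
    if picture_type == "P":
        # A P-picture may use either reference: temporal prediction within
        # the layer, or disparity prediction from the base layer.
        return {"temporal": last_decoded_enh, "disparity": latest_base}
    if picture_type == "B":
        # A B-picture uses both: forward from the enhancement layer,
        # backward from the base layer.
        return {"forward": last_decoded_enh, "backward": latest_base}
    raise ValueError("unknown picture type: " + repr(picture_type))
```

For example, calling the function with "B", "E3" and "B4" yields a forward reference of "E3" (temporal prediction) and a backward reference of "B4" (disparity prediction).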
However, with disparity prediction of enhancement layer frames, an error is often introduced due to an imbalance between the luminance values of pixels in the base and enhancement layers. This imbalance can be caused by variations in performance between the base and enhancement layer cameras, and makes the process of disparity estimation and prediction more difficult. The imbalance may also be caused by scene dissolves or by significant changes in brightness and/or contrast in a scene, such as strong flashes of light. As a result of this cross-channel luminance imbalance, image quality may be noticeably degraded.
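The effect of such an imbalance on disparity estimation can be seen in a minimal Python sketch. It assumes a one-dimensional search along a single pixel row using the sum of absolute differences (SAD) as the matching cost; the function name, the search range, and the sample luminance values are hypothetical and are chosen only for illustration.

```python
def best_disparity(base_row, enh_block, start, max_shift):
    """Find the horizontal shift whose base layer segment best matches enh_block.

    base_row: luminance values of one row of the base layer picture.
    enh_block: luminance values of a block from the enhancement layer row.
    start: position of the block in the enhancement layer row.
    Returns (shift, sad) for the lowest SAD, or None if no shift fits.
    """
    n = len(enh_block)
    best = None
    for shift in range(-max_shift, max_shift + 1):
        pos = start + shift
        if pos < 0 or pos + n > len(base_row):
            continue  # candidate segment falls outside the row
        candidate = base_row[pos:pos + n]
        sad = sum(abs(b - e) for b, e in zip(candidate, enh_block))
        if best is None or sad < best[1]:
            best = (shift, sad)
    return best


base_row = [10, 10, 50, 60, 70, 10, 10, 10]
balanced = [50, 60, 70]                  # same luminance as the base layer
imbalanced = [v + 8 for v in balanced]   # same content, uniformly brighter

print(best_disparity(base_row, balanced, start=4, max_shift=3))    # (-2, 0)
print(best_disparity(base_row, imbalanced, start=4, max_shift=3))  # (-2, 24)
```

In this toy case the correct shift is still found, but the residual that must be coded rises from 0 to 24 solely because of the brightness offset. With a larger imbalance or less distinctive content, the search could also select the wrong shift.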
Some schemes have been developed which reduce the effects of the cross-channel luminance imbalance. For example, R. Franich et al., in the document ISO/IEC JTC1/SC29/WG11 MPEG 96, entitled "Balance Compensation for Stereoscopic Image Sequences," March 1996, Firenze, discuss a linear transformation for adjusting the right view image sequence so that it has the same luminance mean and variance as the left view channel. A. Puri et al., in the document ISO/IEC JTC1/SC29/WG11 MPEG 95/0487, entitled "Gain Corrected Stereoscopic Coding Using SBASIC for MPEG-4 Multiple Concurrent Streams," November 1995, Dallas, discuss correcting the right view with a gain and offset value. However, such schemes do not minimize the least-squares error of the luminance imbalance.
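The distinction between matching the mean and variance and minimizing the squared error can be illustrated with a minimal Python sketch. It is not the method of either cited document, nor the method of the present invention. It assumes the right and left view luminance samples have already been paired one-to-one (for example, co-located or disparity-matched pixels), and the function names and sample values are hypothetical.

```python
from statistics import fmean, pvariance


def match_mean_variance(right, left):
    """Gain and offset giving the corrected right view the left view's
    luminance mean and variance."""
    var_r = pvariance(right)
    # A flat right view has no variance to scale; fall back to unity gain.
    gain = (pvariance(left) / var_r) ** 0.5 if var_r > 0 else 1.0
    offset = fmean(left) - gain * fmean(right)
    return gain, offset


def least_squares_fit(right, left):
    """Gain and offset minimizing sum((gain * r + offset - l) ** 2) over
    the paired samples (ordinary linear regression of left on right)."""
    mean_r, mean_l = fmean(right), fmean(left)
    cov = sum((r - mean_r) * (l - mean_l) for r, l in zip(right, left))
    var = sum((r - mean_r) ** 2 for r in right)
    gain = cov / var if var > 0 else 1.0
    offset = mean_l - gain * mean_r
    return gain, offset


def squared_error(right, left, gain, offset):
    """Sum of squared differences after correcting the right view."""
    return sum((gain * r + offset - l) ** 2 for r, l in zip(right, left))


right = [10, 20, 30, 40]
left = [12, 25, 29, 46]
for fit in (match_mean_variance, least_squares_fit):
    gain, offset = fit(right, left)
    print(fit.__name__, squared_error(right, left, gain, offset))
```

Both fits produce a gain and an offset, but only the least-squares fit is guaranteed to give the smallest squared error for the paired samples. The mean and variance match yields the same result only when the two views are perfectly linearly related with a positive gain.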
Accordingly, it would be advantageous to provide a disparity estimation scheme for a stereoscopic video system such as the MPEG MVP system which minimizes the effects of cross-channel luminance imbalances due to camera variations and scenes with significant changes in brightness or contrast. Moreover, the scheme should be implementable either globally, at the picture level, or locally, at the macroblock level. Furthermore, the scheme should be compatible with the efficient predictive coding of video sequences used in MPEG-2 and similar coding protocols. The present invention provides the above and other advantages.