As a method of converting a two-dimensional video into a three-dimensional video, methods disclosed in JP-A-9-107562 and JP-A-10-51812 have been known.
The outline of the method of converting a two-dimensional video into a three-dimensional video, which is disclosed in JP-A-9-107562, will be first described on the basis of FIG. 1.
Assume that a two-dimensional video (2D video) captures a bird flying from left to right in front of a mountain, as shown in images 1 to 5.
A motion vector between images, for example, a motion vector in the case of transition from the image 1 to the image 2 or a motion vector for transition from the image 2 to the image 3, is extracted for each of a plurality of motion vector detection areas set in a screen. A subject (bird) area and a background (mountain) area are then determined from the extracted motion vector. A reference image is determined to be one of a right eye image and a left eye image, and an image which is delayed by several fields corresponding to the magnitude of the motion vector is determined to be the other eye image such that a subject is located ahead of a background.
When it is assumed that the current image which is the reference image is the image 4, and an image (a delayed image) which is delayed by a predetermined number of fields depending on the magnitude of a motion vector obtained from the image 3 and the image 4 is the image 2, the reference image (the image 4) and the delayed image (the image 2) are respectively presented as a left eye image and a right eye image in the direction of the motion vector.
The operations are repeatedly performed, thereby displaying a video having a stereoscopic effect, that is, a three-dimensional video. This method shall be referred to as the MTD method.
The concept of the method of converting a two-dimensional video into a three-dimensional video, which is disclosed in JP-A-10-51812, will be described.
First, a two-dimensional image is divided into a plurality of areas, and image features such as a chrominance component, a high-frequency component, and a contrast are extracted for each of the areas obtained by the division. The areas obtained by the division are then grouped by areas to which the same object belongs on the basis of the chrominance component. A depth is estimated for the areas obtained by the grouping depending on information related to the average contrast and the average high-frequency component in the areas, to calculate a parallax. A left eye image and a right eye image are horizontally shifted in the opposite directions for the areas obtained by the grouping on the basis of the calculated parallax, to produce a three-dimensional video.
The left eye video and the right eye video, which are thus produced, are stereoscopically displayed on stereoscopic display means. This method shall be referred to as the CID method.
The MTD method and the CID method will be described in more detail.
1. MTD Method
In the MTD method, the video entering one of the right and left eyes is delayed depending on the movement in the screen, to produce a stereoscopic effect. In this case, the field delay most suitable for the video (a target delay dly_target) is determined by the following equation (1), using the average of horizontal vectors in a subject area obj_xvec [pixel/field] and the horizontal vector in a background area bg_xvec [pixel/field], which are obtained by subject/background judgment. A vector takes a positive value for rightward movement.
dly_target = Mdly_sisa/(obj_xvec − bg_xvec) [field]  (1)
Here, Mdly_sisa indicates a parallax [pixel] for determining a stereoscopic effect produced by the MTD method, and its value is previously set through a user interface or the like.
The direction of delay, that is, which of the videos entering the right and left eyes should be delayed, is determined by the following equation (2) using the target delay dly_target:
dly_target > 0 . . . delay of right eye
dly_target < 0 . . . delay of left eye
dly_target = 0 . . . no delay  (2)
Although the delay was described, taking the target delay as an example for convenience, the number of fields by which the video is delayed and the direction of delay are determined by a real delay obtained by smoothing the target delay on a time basis.
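Equations (1) and (2) can be illustrated by the following minimal Python sketch. The function names and the handling of a zero denominator (treated here as "no delay") are assumptions made for illustration; the smoothing on a time basis that yields the real delay is not modeled.

```python
def target_delay(mdly_sisa, obj_xvec, bg_xvec):
    """Equation (1): target delay [field] from the parallax Mdly_sisa [pixel]
    and the horizontal vectors [pixel/field] of subject and background."""
    relative = obj_xvec - bg_xvec
    if relative == 0:
        # Assumption: with no relative motion there is no delay. The text
        # does not state how a zero denominator is handled.
        return 0.0
    return mdly_sisa / relative


def delay_direction(dly_target):
    """Equation (2): which eye's video is delayed."""
    if dly_target > 0:
        return "right"
    if dly_target < 0:
        return "left"
    return "none"


if __name__ == "__main__":
    d = target_delay(12, 5, -1)
    print(d, delay_direction(d))
```

For example, with Mdly_sisa = 12 pixels, a subject moving rightward at 5 pixels/field and a background moving leftward at 1 pixel/field, the target delay is 2 fields and the right eye video is delayed.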
2. Subject Position Control
Subject position control is employed in order to correct ambiguity, concerning the position where an object is presented relative to a screen, created when the MTD method is carried out. That is, in the MTD method, how a video is seen differs depending on which of a subject and a background moves, as shown in FIG. 2. In the subject position control, when the subject moves, the overall screen is moved backward by shifting the position where a right eye video is presented to the right and shifting the position where a left eye video is presented to the left so that the number of pixels from the subject to the screen is equal to the number of pixels from the screen to the background. On the other hand, when the background moves, the overall screen is moved forward by shifting the position where a right eye video is presented to the left and shifting the position where a left eye video is presented to the right so that the number of pixels from the subject to the screen is equal to the number of pixels from the screen to the background.
A horizontal phase t_phr of the right eye and a horizontal phase t_phl of the left eye, which are calculated by the subject position control, can be expressed by the following equation (4) when a phase obj_sisa of the subject and a phase bg_sisa of the background, which are produced by a field delay, are expressed by the following equation (3):
obj_sisa = obj_xvec*delay [pixel]
bg_sisa = bg_xvec*delay [pixel]  (3)
t_phr = (obj_sisa + bg_sisa)/2 [pixel]
t_phl = −t_phr [pixel]  (4)
Since the real delay is obtained by smoothing the target delay dly_target on a time basis, the absolute value of a parallax dly_sisa (=obj_sisa−bg_sisa) [pixel] produced by the MTD method (dly_sisa takes a positive value when the subject is projected, while taking a negative value when it is recessed) does not completely coincide with Mdly_sisa [pixel] previously determined by user setting. When there is no delay (dly_target=0), dly_sisa=0.
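The relationship among equations (3) and (4) and the parallax dly_sisa can be sketched in Python as follows. The function name and the returned tuple are hypothetical; the argument delay stands for the real delay in fields.

```python
def subject_position_phases(obj_xvec, bg_xvec, delay):
    """Return (t_phr, t_phl, dly_sisa) in pixels for a given field delay."""
    # Equation (3): phases produced by the field delay.
    obj_sisa = obj_xvec * delay
    bg_sisa = bg_xvec * delay
    # Equation (4): horizontal phases of the right and left eye videos.
    t_phr = (obj_sisa + bg_sisa) / 2
    t_phl = -t_phr
    # Parallax produced by the MTD method (dly_sisa = obj_sisa - bg_sisa).
    dly_sisa = obj_sisa - bg_sisa
    return t_phr, t_phl, dly_sisa


if __name__ == "__main__":
    print(subject_position_phases(obj_xvec=4, bg_xvec=0, delay=2))
```

With a subject moving at 4 pixels/field, a still background, and a delay of 2 fields, obj_sisa is 8 pixels, so the right eye phase is 4 pixels, the left eye phase is −4 pixels, and dly_sisa is 8 pixels.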
3. CID Method
The CID method is a method of dividing one screen into a plurality of areas, estimating a depth for each of the areas from image information obtained from the area and a composition, and shifting each of pixels in the screen on the basis of the estimated depth, to produce a binocular parallax.
The applicant of the present invention has also developed a CID method which is a further improvement of the CID method already developed.
FIG. 3 shows the procedure for control in the CID method after the improvement (which is not known).
First, one screen is divided into a plurality of areas, and information related to a high frequency, a contrast of luminance, and a chrominance (B-Y, R-Y) component is obtained from each of the areas (step 1). A depth estimate for each of the areas, which is estimated from the information and the composition, is found (step 2). When the found depth estimate is merely converted into a shift amount, a distortion is noticeable in a conversion image, so distortion suppression processing is performed (step 3). The depth estimate after the distortion suppression processing is subjected to distance scale conversion (step 4).
The distortion suppression processing will be described. In the CID method, a 2D image is deformed, to produce left and right images. When the deformation is too large, an unnatural video is obtained. In the CID method, therefore, control is carried out such that the difference in phase between the adjacent areas is not more than a distortion allowable range h_supp_lev [pixel] of a conversion image which is previously determined by a user. That is, the difference in phase between the adjacent areas is found from phases for the areas which are found by assigning the estimated depth to the distance between Mfront and Mrear. The maximum value of the difference is taken as h_dv_max [pixel]. When h_dv_max exceeds the distortion allowable range h_supp_lev [pixel], Mfront and Mrear are reduced in the direction nearer to 0 [pixel] until the following equation (5) is satisfied:
h_dv_max ≤ h_supp_lev  (5)
When h_dv_max is larger than h_supp_lev, therefore, a projection phase front [pixel] and a recession phase rear [pixel] of the conversion image are made smaller than the maximum projection phase Mfront [pixel] and the maximum recession phase Mrear [pixel], which are previously determined by the user, by the linear operation expressed by the following equation (6), as illustrated in the diagram on the right side of FIG. 4:
front = Mfront*h_supp_lev/h_dv_max for h_dv_max > h_supp_lev
rear = Mrear*h_supp_lev/h_dv_max for h_dv_max > h_supp_lev  (6)
Conversely, when h_dv_max is not larger than h_supp_lev, the distortion of the conversion image is within the allowable range. Accordingly, the following equation (7) holds, as illustrated in the diagram on the left side of FIG. 4:
front = Mfront for h_dv_max ≤ h_supp_lev
rear = Mrear for h_dv_max ≤ h_supp_lev  (7)
That is, when h_dv_max is not larger than h_supp_lev, a dynamic range dv_range (= front − rear) in the phase of the conversion video is equal to a dynamic range Mdv_range (= Mfront − Mrear) in the phase previously determined by the user.
In the distortion suppression processing for suppressing the dynamic range in a real machine, h_supp_lev is replaced with a unit of an estimated depth in order to reduce a load on a CPU. For convenience, however, description was made using a unit system of pixels.
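The linear operation of equations (6) and (7) can be sketched in Python as follows, in the unit system of pixels. The function name is hypothetical, and the sample values (including the use of a negative value for the recession phase) are assumptions for illustration only.

```python
def suppress_distortion(mfront, mrear, h_dv_max, h_supp_lev):
    """Return (front, rear) [pixel] after distortion suppression.

    mfront, mrear : maximum projection / recession phases set by the user
    h_dv_max      : maximum phase difference between adjacent areas
    h_supp_lev    : distortion allowable range
    """
    if h_dv_max > h_supp_lev:
        # Equation (6): shrink both phases toward 0 by the same ratio.
        scale = h_supp_lev / h_dv_max
        return mfront * scale, mrear * scale
    # Equation (7): the distortion is already within the allowable range.
    return mfront, mrear


if __name__ == "__main__":
    print(suppress_distortion(20, -10, 8, 4))  # suppressed
    print(suppress_distortion(20, -10, 3, 4))  # unchanged
```

In the first call the allowable range is half of h_dv_max, so both phases are halved; in the second call the phases set by the user are used as they are.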
Description is made of a distance scale conversion method.
In a two-lens stereoscopic display, a parallax W between corresponding points of a right eye image (an R image) and a left eye image (an L image) and a distance Yp from a screen actually viewed to a position where the images are merged together are in a non-linear relationship.
That is, when the R image and the L image which have a parallax W [mm] therebetween on the screen of the display are viewed from a position spaced a distance K [mm] apart from the screen, the distance Yp [mm] from the screen to the position where the images are merged together is expressed by the following equation (8):
Yp = KW/(W − 2E)  (8)
In the foregoing equation (8), variables respectively represent the following values:
K: a distance [mm] from the screen of the display to a viewer
E: a length [mm] which is one-half the distance between the eyes
W: a parallax [mm] between the corresponding points of the left eye image and the right eye image on the screen of the display
Yp: a distance [mm] from the screen to the position where the images are merged together
The foregoing equation (8) is shown graphically in FIG. 5, letting K=1000 mm and 2E=65 mm.
FIG. 5 shows that a spatial distortion cannot be prevented from occurring in images to be merged together only by linearly replacing a depth estimate with a unit of pixels. In a distance scale method, therefore, the depth estimate is converted into the unit of pixels in consideration of the spatial distortion.
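The non-linearity of equation (8) can be checked numerically with the following Python sketch, using the values K=1000 mm and 2E=65 mm given for FIG. 5. The function name and the sample parallax values are assumptions for illustration.

```python
def fusion_distance(w_mm, k_mm=1000.0, two_e_mm=65.0):
    """Equation (8): distance Yp [mm] from the screen to the position
    where the images are merged, for a parallax W [mm] on the screen."""
    if w_mm == two_e_mm:
        # The denominator vanishes when W equals the distance between the eyes.
        raise ValueError("Yp is undefined for W = 2E")
    return k_mm * w_mm / (w_mm - two_e_mm)


if __name__ == "__main__":
    for w in (-130.0, -65.0, 0.0, 32.5):
        print(w, fusion_distance(w))
```

Doubling the parallax from −65 mm to −130 mm moves Yp from 500 mm to about 667 mm rather than to 1000 mm, which is the non-linear relationship referred to above.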
The distance scale conversion method will be briefly described.
The width of one pixel on the display is taken as U [mm]. When it is assumed that there is a parallax W corresponding to α pixels between the corresponding points, the parallax W is expressed by the following equation (9):
W = αU  (9)
By substituting the foregoing equation (9) in the foregoing equation (8), the relationship between the pixels and the position where the images are merged together is found, as expressed by the following equation (10):
Yp = KαU/(αU − 2E)  (10)
Furthermore, the foregoing equation (10) is rearranged, to obtain the following equation (11):
α = 2E*Yp/{(Yp − K)U}  (11)
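That equation (11) is the inverse of equation (10) can be confirmed by the following Python sketch. The function names and the pixel width U=0.5 mm are assumptions for illustration; K and E follow the values used for FIG. 5.

```python
def parallax_pixels(yp_mm, k_mm, e_mm, u_mm):
    """Equation (11): parallax alpha [pixel] that merges the images at Yp."""
    if yp_mm == k_mm:
        raise ValueError("alpha is undefined for Yp = K")
    return 2 * e_mm * yp_mm / ((yp_mm - k_mm) * u_mm)


def fusion_distance_from_pixels(alpha, k_mm, e_mm, u_mm):
    """Equation (10): Yp [mm] for a parallax of alpha pixels."""
    return k_mm * alpha * u_mm / (alpha * u_mm - 2 * e_mm)


if __name__ == "__main__":
    K, E, U = 1000.0, 32.5, 0.5  # U is a hypothetical pixel width [mm]
    alpha = parallax_pixels(500.0, K, E, U)
    print(alpha, fusion_distance_from_pixels(alpha, K, E, U))
```

Under these assumed values, Yp=500 mm corresponds to a parallax of −130 pixels, and substituting that parallax back into equation (10) returns 500 mm.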
In complete distance scale conversion, when the maximum projection amount Ymax′ from the screen and the maximum recession amount Ymin′ from the screen are designated and a depth estimate depth (having a value from 0 to 100) is determined, a corresponding depth Yp can be obtained by simple scale conversion expressed by the following equation (12):
Yp = (Ymax′ − Ymin′) × depth/100  (12)
A parallax α corresponding to Yp is found by the foregoing equation (11). Consequently, the depth estimate can be converted into a unit of pixels in consideration of the spatial distortion.
In the complete distance scale conversion, when a 256-stage parallax conversion table W″ is used, the space between Ymax′ and Ymin′ is first divided into 256 equal divisions, and a corresponding parallax conversion table W″ [pixel] is found for each depth Yp on the basis of the foregoing equation (11).
In this case, W″[255] is a parallax corresponding to Ymax′, and W″[0] is a parallax corresponding to Ymin′. If the depth estimate depth is determined, a corresponding parallax α is found from the following equation (13):
α = W″[lev]  (13)
Here, lev indicates the number of stages on the parallax conversion table, and is expressed by the following equation (14):
lev = 255 × depth/100  (14)
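The construction and use of the 256-stage parallax conversion table can be sketched in Python as follows. This is only an illustration under stated assumptions: the table entries are placed at equal spacing so that index 0 corresponds to Ymin′ and index 255 to Ymax′, lev is rounded to the nearest integer, Ymax′ is smaller than K, and the numerical values (including the pixel width U) are hypothetical.

```python
def build_parallax_table(ymax_mm, ymin_mm, k_mm, e_mm, u_mm, stages=256):
    """Return W''[0..stages-1] [pixel]; index 0 <-> Ymin', last <-> Ymax'."""
    table = []
    for i in range(stages):
        # Equally spaced depths between Ymin' and Ymax' (assumed spacing).
        yp = ymin_mm + (ymax_mm - ymin_mm) * i / (stages - 1)
        # Equation (11): parallax in pixels for this depth.
        table.append(2 * e_mm * yp / ((yp - k_mm) * u_mm))
    return table


def lookup_parallax(table, depth):
    """Equations (13) and (14): parallax for a depth estimate of 0 to 100."""
    lev = int(round(255 * depth / 100))  # rounding is an assumption
    return table[lev]


if __name__ == "__main__":
    w = build_parallax_table(200.0, -300.0, 1000.0, 32.5, 0.5)
    print(lookup_parallax(w, 0), lookup_parallax(w, 100))
```

Once the table has been built, the conversion of each depth estimate requires only one multiplication and one table reference, so equation (11) need not be evaluated for every area.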
Although description was made of the complete distance scale conversion method in the 2D/3D conversion, the method has two problems, described below:
(1) When the maximum projection amount Ymax′ is increased until the depth Yp is saturated, the distortion of the conversion image itself (the distortions of the R image itself and the L image itself) is increased in a portion having a depth in the vicinity of Ymax′.
(2) When an attempt to enlarge a dynamic range in a depth reproduction space is made, there is no alternative but to reduce the maximum recession amount Ymin′. Accordingly, an area projected forward from the screen is extremely reduced.
In order to avoid the above-mentioned problems, the conversion must be carried out using only an area where there is some degree of proportionality between a depth and a parallax. However, this causes the complete distance scale conversion to be approximately the same as pixel scale conversion. Because the complete distance scale conversion requires complicated processing, it can then no longer be said to be useful.
Therefore, the polygonal line distance scale conversion introduced next has been devised. In the polygonal line distance scale conversion, a projection amount ratio C [%] is introduced, to divide the space from Ymax′ to 0 into 255×C/100 equal divisions, and to divide the space from 0 to Ymin′ into 255×(100−C)/100 equal divisions, thereby finding a parallax conversion table, as shown in FIG. 7.
That is, the projection amount ratio C is controlled, thereby making it possible to change a projection amount forward from the screen and suppress the distortion of the conversion image itself in a portion where the projection amount reaches its maximum. In the polygonal line distance scale conversion, an equation corresponding to the foregoing equation (12) is the following equation (15):
Yp = Ymax′ × {depth − (100 − C)}/C for depth ≥ (100 − C)
Yp = {−Ymin′ × depth/(100 − C)} + Ymin′ for depth < (100 − C)  (15)
Furthermore, an equation corresponding to the foregoing equation (14) representing the number of stages on the parallax conversion table W″ is the following equation (16):
lev = (255 − Dlev) × {depth − (100 − C)}/C + Dlev for depth ≥ (100 − C)
lev = Dlev × depth/(100 − C) for depth < (100 − C)  (16)
Here, Dlev is defined by the following equation (17), and represents the number of stages, on the parallax conversion table, corresponding to the screen:
Dlev = (100 − C) × 255/100  (17)
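Equations (15) to (17) can be sketched in Python as follows. The function names and sample values are hypothetical. The sketch assumes 0 < C < 100 and uses depth < (100 − C) as the condition of the lower branch of equation (15), in the same manner as equation (16), so that the two branches meet at the screen.

```python
def _check_ratio(c):
    # Assumption: C must lie strictly between 0 and 100 to avoid division by 0.
    if not 0 < c < 100:
        raise ValueError("C must satisfy 0 < C < 100")


def polygonal_depth(depth, c, ymax_mm, ymin_mm):
    """Equation (15): depth Yp [mm] for a depth estimate of 0 to 100."""
    _check_ratio(c)
    if depth >= 100 - c:
        return ymax_mm * (depth - (100 - c)) / c
    return -ymin_mm * depth / (100 - c) + ymin_mm


def polygonal_level(depth, c):
    """Equations (16) and (17): stage lev on the parallax conversion table."""
    _check_ratio(c)
    dlev = (100 - c) * 255 / 100  # equation (17): stage of the screen
    if depth >= 100 - c:
        return (255 - dlev) * (depth - (100 - c)) / c + dlev
    return dlev * depth / (100 - c)


if __name__ == "__main__":
    for depth in (0, 30, 60, 100):
        print(depth, polygonal_depth(depth, 40, 200.0, -300.0),
              polygonal_level(depth, 40))
```

With C=40, a depth estimate of 60 corresponds to the screen (Yp=0, lev=Dlev=153), a depth estimate of 100 corresponds to Ymax′ (lev=255), and a depth estimate of 0 corresponds to Ymin′ (lev=0).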
The polygonal line distance scale conversion is carried out such that no spatial distortion occurs ahead of and behind the screen. Conversely speaking, a spatial distortion occurs on the screen. This is based on the hypothesis that the spatial distortion is most difficult to perceive in the vicinity of the screen, which is derived from the comment “when a stereoscopic video is viewed, how the video is seen differs ahead of and behind a screen” obtained from many viewers.
As values actually employed, Ymax′, Ymin′, and C are determined such that the inclination (the step width) of the depth parallax conversion table does not greatly differ ahead of and behind the screen.
Meanwhile, the above-mentioned distortion suppression processing using the linear operation is effective for the pixel scale conversion. However, it cannot be said that it is effective for the distance scale conversion. The reason for this is that, because the depth Yp and the parallax W [pixel] are non-linear, the distance scale conversion has such properties that the parallax greatly differs ahead of and behind the screen even if the depth estimate is the same, for example, “1”. This tendency becomes significant in a large-screen display. In the polygonal line distance scale, which is an improvement of the complete distance scale, the projection amount ratio C is introduced partly in order to lessen these properties.
Even in the polygonal line distance scale capable of controlling the projection amount ratio C, however, the maximum value h_dv_max [pixel] of the phase difference between the adjacent areas cannot be completely suppressed within the distortion allowable range h_supp_lev [pixel] (the principle of suppressing a distortion in a pixel scale cannot be faithfully realized). In order to realize the principle of suppressing a distortion, distortion suppression processing must be performed after the distance scale conversion.
4. Simultaneous Use of MTD Method and CID Method
Generally, a human being perceives a feeling of distance in stereoscopic viewing by, for example, the difference between the dead angle portions (occlusion) of the images respectively entering his or her right and left eyes, which is caused by the difference between the positions of the right and left eyes. In this respect, the feeling of distance or the like can be covered by the MTD method. On the other hand, the MTD method cannot satisfactorily convert a video which does not move or a video whose movement is complicated into a three-dimensional video. In the CID method, a parallax between right and left eye images can be freely changed. On the other hand, the CID method cannot show a human being a video whose dead angle portions, serving as a shadow of a subject, differ between his or her right and left eyes depending on the parallax.
Therefore, it is considered that 2D/3D conversion is carried out simultaneously using the MTD method effective for a moving picture and the CID method capable of also converting a still picture. In this case, it is considered that a parallax obtained by the MTD method and a parallax obtained by the CID method are simply added together.
However, the parallax obtained by the MTD method and the parallax obtained by the CID method are individually controlled. Accordingly, the parallax produced by the conversion greatly depends on the presence or absence of the movement of an input video. That is, when the input video is a moving picture, a parallax obtained by the MTD method and a parallax obtained by the CID method are reflected on a conversion video. When it is a still video, however, there is no parallax obtained by the MTD method, and there is only a parallax obtained by the CID method.
Such a phenomenon that a stereoscopic effect of a conversion video greatly differs depending on an input video is inconvenient when a user adjusts a stereoscopic effect.
An object of the present invention is to provide a method of converting a two-dimensional video into a three-dimensional video, in which a stereoscopic effect of a conversion video can be prevented from greatly differing depending on an input video when the two-dimensional video is converted into the three-dimensional video simultaneously using the MTD method and the CID method.
Another object of the present invention is to provide a method of converting a two-dimensional video into a three-dimensional video, in which the distortion of a conversion image can be suppressed when a depth estimate is converted into a parallax using distance scale conversion.