A method for converting a two-dimensional image (2D image) to a three-dimensional image (3D image) is known in which depth information for the 2D image (hereinafter referred to as a depth map) is generated, and a stereoscopic pair corresponding to the 2D image (a left eye image and a right eye image) is synthesized using the generated depth map.
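The synthesis of a stereoscopic pair from a depth map can be illustrated by a minimal sketch of horizontal pixel shifting. The sketch is not taken from any of the cited documents. The function names, the `max_disparity` parameter, the convention that a depth value of 1 denotes the nearest point, and the hole filling by neighbouring pixels are all assumptions made for illustration.

```python
def _fill_holes(row, fallback):
    """Fill unset (None) pixels from the nearest set pixel in the same row."""
    last = None
    for x in range(len(row)):          # forward pass: copy from the left
        if row[x] is None:
            row[x] = last
        else:
            last = row[x]
    nxt = None
    for x in range(len(row) - 1, -1, -1):  # backward pass: leading holes
        if row[x] is None:
            row[x] = nxt if nxt is not None else fallback[x]
        else:
            nxt = row[x]


def synthesize_stereo_pair(image, depth, max_disparity=4):
    """Return (left, right) views of `image`.

    image: list of rows of pixel values.
    depth: same shape, values in [0, 1]; 1 is assumed to be nearest.
    max_disparity: hypothetical maximum left/right offset in pixels.
    """
    height, width = len(image), len(image[0])
    left = [[None] * width for _ in range(height)]
    right = [[None] * width for _ in range(height)]
    for y in range(height):
        # Draw far pixels first so that nearer pixels overwrite them.
        for x in sorted(range(width), key=lambda i: depth[y][i]):
            shift = round(depth[y][x] * max_disparity / 2)
            xl, xr = x + shift, x - shift  # near pixels move apart
            if 0 <= xl < width:
                left[y][xl] = image[y][x]
            if 0 <= xr < width:
                right[y][xr] = image[y][x]
        _fill_holes(left[y], image[y])
        _fill_holes(right[y], image[y])
    return left, right
```

With a depth map of all zeros, both views equal the input image. Non-zero depth values shift the corresponding pixels in opposite directions in the two views, which is what produces the disparity perceived by the viewer.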
For example, PTL 1 discloses a method for producing a 3D image on the basis of a 2D image including disparity information. Specifically, in PTL 1, a 2D image is input, an image of a human face is extracted from the input 2D image to obtain a face image, and the disparity information is given to the obtained face image to produce a 3D image. The produced 3D image is output and displayed on a mobile device screen or the like.
PTL 2 discloses, for example, a method for generating a depth map from basic structure models and a non-3D image. Specifically, in the method disclosed in PTL 2, first, a statistical value of the high frequency component or activity of the luminance signal in a predetermined region in the non-3D image (2D image) of a scene is calculated to estimate a depth structure of the scene. Next, based on the calculated statistical value and the composition ratio in each region, three basic depth map models corresponding to the non-3D image are generated. Finally, the R signal (R component in the RGB color space) in the non-3D image is superimposed on the generated basic depth map models to generate a depth map corresponding to the non-3D image. Thus, in PTL 2, the depth information is estimated from the non-3D image.
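The three steps described for PTL 2 can be sketched roughly as follows. This is an illustrative approximation only, not the procedure of PTL 2. The activity measure (mean absolute horizontal luminance difference), the choice of the upper and lower halves as regions, the weighting formula, and the `r_gain` parameter are all hypothetical.

```python
def activity(luma, rows):
    """Mean absolute horizontal luminance difference over the given rows.

    Used here as a simple stand-in for a high frequency statistic.
    """
    total, count = 0.0, 0
    for y in rows:
        for x in range(1, len(luma[y])):
            total += abs(luma[y][x] - luma[y][x - 1])
            count += 1
    return total / count if count else 0.0


def blend_models(models, weights):
    """Weighted average of equally sized basic depth models."""
    total = sum(weights)
    height, width = len(models[0]), len(models[0][0])
    return [
        [sum(w * m[y][x] for w, m in zip(weights, models)) / total
         for x in range(width)]
        for y in range(height)
    ]


def estimate_depth(luma, red, models, r_gain=0.1):
    """Blend three basic depth models and superimpose the R signal.

    luma, red: 2D lists (luminance and R component of the image).
    models: three 2D lists of the same size (basic depth models).
    r_gain: hypothetical gain applied to the R signal.
    """
    height, width = len(luma), len(luma[0])
    top = activity(luma, range(0, height // 2))
    bottom = activity(luma, range(height // 2, height))
    # Hypothetical composition ratio derived from the two statistics.
    weights = [1.0 + top, 1.0 + bottom, 1.0]
    base = blend_models(models, weights)
    return [
        [base[y][x] + r_gain * red[y][x] for x in range(width)]
        for y in range(height)
    ]
```

For an image with uniform luminance, both activity values are zero, so the three models are blended with equal weight and the R signal is then added on top of the blended result.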
PTL 3 discloses, for example, a method for converting a 2D image to a 3D image using a depth map generated on the basis of a sample image. In the method disclosed in PTL 3, first, a background image of an input 2D image is subjected to matching using a database in which a sample image including depth information is stored as a background image. Next, based on the matched background image, a foreground image is extracted. The foreground image of the 2D image is detected using a color segmentation method or a comparison technique using graph-based segmentation. The foreground image and the background image are extracted in this way to generate a relative depth map of the foreground image and the background image. Thus, in PTL 3, the depth information is generated based on the sample image.
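The flow described for PTL 3 can be illustrated with a simplified sketch. It is not the method of PTL 3. In particular, a plain per-pixel difference threshold is substituted for the color segmentation or graph-based segmentation mentioned above, and the database layout, the sum of absolute differences as matching cost, and the `threshold` and `foreground_depth` parameters are hypothetical.

```python
def match_background(image, samples):
    """Return the database sample whose image is closest to `image`.

    samples: list of dicts with keys "image" and "depth" (hypothetical
    layout), each a 2D list with the same size as `image`.
    Closeness is measured by the sum of absolute differences.
    """
    def sad(a, b):
        return sum(abs(pa - pb)
                   for row_a, row_b in zip(a, b)
                   for pa, pb in zip(row_a, row_b))
    return min(samples, key=lambda s: sad(image, s["image"]))


def relative_depth_map(image, samples, threshold=30, foreground_depth=1.0):
    """Build a relative depth map from the matched sample.

    Pixels that differ from the matched background by more than
    `threshold` are treated as foreground and given `foreground_depth`;
    all other pixels take the depth stored with the sample.
    """
    best = match_background(image, samples)
    depth = []
    for row, bg_row, depth_row in zip(image, best["image"], best["depth"]):
        depth.append([
            foreground_depth if abs(p - b) > threshold else d
            for p, b, d in zip(row, bg_row, depth_row)
        ])
    return depth
```

In this sketch the quality of the result depends entirely on how well a stored sample matches the actual background, since the background depth is copied from the sample rather than estimated from the input image.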
A Shape From Motion (SFM) method is also known as a method for generating a depth map using motion information to obtain motion disparity. The SFM method estimates the motion of an object across a video frame and a model according to the disparity of a moving object, the object being an object in a three-dimensional space (3D space). The SFM method estimates the motion of the object supposing that an object closer to a camera has disparity larger than that of an object far away from the camera,