1. Field of the Invention
The present invention relates generally to image processing technology, and more particularly, to a system for converting 2D video captured by a conventional single-lens reflex (SLR) camera into 3D video that can be played by a 3D display device.
2. Description of the Related Art
One of the core technologies for converting 2D video into 3D video is depth estimation. In Int'l Pub. No. WO 2008/080156, techniques for complexity-adaptive and automatic 2D-to-3D image and video conversion were proposed. In these techniques, each frame of a 2D video is classified and partitioned into flat and non-flat regions. The flat regions (e.g., the background regions) are directly converted by simple depth estimation methods, while the non-flat regions (e.g., the foreground regions) are processed by complex methods. Each frame is thus converted by methods whose complexity adapts to its content.
A paper entitled “A block-based 2D-to-3D conversion System with bilateral filter” (published at pp. 1-2 of Int'l. Conf. on Consumer Electronics 2009 (ICCE '09) and co-authored by Chao-Chung Cheng, Chung-Te Li, Po-Sen Huang, Tsung-Kai Lin, Yi-Min Tsai, Liang-Gee Chen) proposed two depth estimation modules: depth from motion (DOF) and depth from geometrical perspective (DOGP). The DOF module employs motion vector estimation to determine the motion of each block, which serves as the depth estimate of that block. The DOGP module employs several kinds of user-defined background depth models and selects a suitable background model by classification.
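For illustration only, the idea of using per-block motion as a depth cue can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of the cited paper: the function name `block_motion_depth`, the parameters `block` and `search`, the exhaustive sum-of-absolute-differences (SAD) search, and the use of the motion vector magnitude as the depth cue are all assumptions introduced here.

```python
def block_motion_depth(prev, curr, block=4, search=2):
    """Return a per-block depth cue derived from motion between two frames.

    prev, curr: grayscale frames of equal size, given as lists of rows of ints.
    block:      block edge length in pixels (assumed value).
    search:     maximum displacement tested in each direction (assumed value).
    The cue for each block is the magnitude of its best-matching motion vector.
    """
    h, w = len(curr), len(curr[0])
    depth = []
    for by in range(0, h - block + 1, block):
        row = []
        for bx in range(0, w - block + 1, block):
            best_key, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y0, x0 = by + dy, bx + dx
                    # Skip candidates that fall outside the previous frame.
                    if y0 < 0 or x0 < 0 or y0 + block > h or x0 + block > w:
                        continue
                    # Sum of absolute differences between the two blocks.
                    cost = sum(
                        abs(curr[by + j][bx + i] - prev[y0 + j][x0 + i])
                        for j in range(block)
                        for i in range(block)
                    )
                    # On equal cost, prefer the smaller displacement.
                    key = (cost, dy * dy + dx * dx)
                    if best_key is None or key < best_key:
                        best_key, best_mv = key, (dy, dx)
            dy, dx = best_mv
            row.append((dy * dy + dx * dx) ** 0.5)
        depth.append(row)
    return depth
```

In this sketch a larger value marks a block that moved more between the two frames. How such a value is mapped to an actual depth is left open here, and because each value is computed from a single pair of frames, nothing constrains it to vary smoothly over time.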
As indicated above, each of the aforesaid technologies for converting 2D video into 3D video carries out depth estimation individually, based on the image information at a single time point, so the depths at adjacent time instances may be discontinuous, resulting in incorrect depth perception during playback of the 3D video. This effect may be harmful to the viewer's eyes after prolonged viewing. Besides, the aforesaid prior art does not analyze the video-capturing style (e.g., static or moving camera) of the video content to be converted, so it is not applicable to diverse household or commercial videos. Moreover, a general commercial film may contain captions, and the existing 3D video conversion technology does not process captions, so the viewer may experience difficulty or discomfort when reading the captions in 3D stereo.