The disclosed embodiments of the present invention relate to depth map generation, and more particularly, to a method and an apparatus of determining a perspective model by utilizing region-based analysis and/or temporal smoothing.
Since the success of the three-dimensional (3D) movie such as Avatar, the 3D playback has enjoyed growing popularity. Almost all the television (TV) manufacturers put 3D functionality into their high-end TV products. One of the important required 3D techniques is 2D-to-3D conversion, which converts the traditional two-dimensional (2D) videos into 3D ones. It is important because most contents are still in the traditional 2D format. For a 2D monocular video input, objects and their geometry perspective information are estimated and modeled, and then a depth map is generated. With the produced depth map, depth image based rendering (DIBR) may be used to convert the original 2D monocular video into stereoscopic videos for the left and right eyes, respectively. In such a conventional processing flow, the most important issue is how to generate the depth map.
In order to correctly generate the depth map of the input 2D video, various cues are applied to estimate the depth information. Many conventional depth map generation methods are proposed to retrieve the depth map using different combinations of those depth cues. The perspective information is considered to generate an initial perspective/global depth map. Most of the conventional methods need an initial perspective/global depth map which represents the perspective view of a scene. However, the conventional initial depth map often provides only bottom-top perspective, which does not always represent the perspective of the environment, so vanishing line or feature point is required to model a more complex perspective of the environment. One conventional algorithm may carry out the vanishing line/feature point detection by Hough transform. However, it requires a time demanding and computational intensive full frame pixel operation. Regarding other conventional algorithms, they often produce instable vanishing line/feature point which jumps between frames, resulting in judder perceived on the created depth map.