K. Anjyo, and K. Arai, “Tour Into the Picture: Using a Spidery Mesh Interface to Make Animation from a Single Image”, SIGGRAPH '97 Proceedings, pp. 225-232 (1997)) describes a technique wherein a user can remove foreground objects from a landscape photographic image, specify a vanishing point in the perspective and, using the specified vanishing point, estimate the general configuration of a scene for carrying out viewpoint movement.
Many techniques have been proposed for estimating the three-dimensional structure of a scene from a two-dimensional image. Most of these techniques follow the same general approach: estimating the image's perspective from calculated vanishing points, extracting the textures of relevant objects in the scene, and pasting those textures onto a 3D model while taking the perspective into account.
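As an illustration of the vanishing point calculation these techniques rely on, the following minimal sketch intersects two image line segments that are assumed to be parallel in the scene (for example, the two edges of a road). It is not taken from any of the cited references; the function names `line_through` and `vanishing_point`, the tolerance `eps`, and the sample coordinates are hypothetical and chosen only for illustration.

```python
def line_through(p, q):
    """Return the homogeneous line (a, b, c) through image points p and q,
    so that a*x + b*y + c == 0 for every point (x, y) on the line."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def vanishing_point(seg_a, seg_b, eps=1e-9):
    """Intersect the lines through two segments (each a pair of points).

    Returns (x, y) in image coordinates, or None when the two lines are
    parallel in the image, i.e. the vanishing point lies at infinity.
    eps is an illustrative tolerance and depends on the coordinate scale.
    """
    a1, b1, c1 = line_through(*seg_a)
    a2, b2, c2 = line_through(*seg_b)
    # The intersection of two homogeneous lines is their cross product.
    x = b1 * c2 - b2 * c1
    y = c1 * a2 - c2 * a1
    w = a1 * b2 - a2 * b1
    if abs(w) < eps:
        return None
    return (x / w, y / w)


# Hypothetical road edges in a 100 x 100 image, converging toward the top.
left_edge = ((0, 100), (40, 20))
right_edge = ((100, 100), (60, 20))
vp = vanishing_point(left_edge, right_edge)  # the edges meet at x = 50, y = 0
```

In practice many line segments are detected and a single vanishing point is fitted to all of them, which is one reason the fully automatic case is harder than this two-line sketch suggests.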
Some efforts have tried to minimize the calculations involved, such as the “Automatic photo pop-up” technique proposed by Derek Hoiem in 2005. The Hoiem technique is modeled on a children's pop-up book, in which a picture pops up when the book is opened. According to the technique, a 2D photographic image is divided into three parts: a ground area, a vertical area, and a sky area. The boundaries between the ground area and the vertical area in the image are then estimated. Using the estimated boundaries as references, the objects forming the 2D photographic image are cut and folded, thereby generating a 3D model.
Other efforts aim to improve on these well-established methods. For instance, U.S. Pat. No. 8,254,667, titled “Method, Medium And System Implementing 3D Model Generation Based On 2D Photographic Images”, issued Aug. 28, 2012, describes a modeling system that identifies objects in an image so that they can be extracted.
One problem with many conventional 3D image creation methods is that it can be difficult to determine a vanishing point automatically, because a structure's perspective cannot be estimated for every scene. Furthermore, even when estimating a structure's perspective is feasible, it can be difficult to automatically compose a correct depth structure model that makes the image naturally viewable as a 3D object.
Another issue is that many 3D images created using the above-described techniques deliver a poor user experience for the consumer market, because they are not full images but rather 3D models with “holes” where textures have been extracted. Such holes detract from the experience by making the limitations of the 3D environment obvious.