A fundamental problem in computer vision is single view reconstruction (SVR). SVR deals with the problem of recovering a 3D real world scene from a single 2D image.
In a 2D image of an indoor or outdoor scene including man-made structures, e.g., buildings and rooms, the predominant features are 3D lines in three orthogonal directions. It is relatively easy to determine when the lines intersect in the image. However, this does not necessarily mean that the corresponding lines intersect in the 3D real world.
A trivial counterexample is a set of lines that share a common vanishing point in the image. Those lines appear to intersect at the vanishing point, but none intersect in the real world, where the lines are parallel to each other. Thus, identifying when apparent intersections in images correspond to real world intersections is difficult. There are several challenges in inferring the 3D structure of lines.
The biggest challenge is with occluding edges in the image that produce false intersections. Line detection methods in real images often miss important lines and produce spurious lines. Detected lines are often broken or cropped, which obliterates any evidence of intersections. In addition, real world scenes are particularly challenging due to clutter.
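The counterexample of parallel lines that share a vanishing point can be checked numerically. The following is a minimal sketch, assuming an ideal pinhole camera with unit focal length; the function names (`project`, `image_line`, `intersect`) and the sample coordinates are hypothetical and chosen only for illustration. It projects two 3D lines that are parallel, and therefore never meet, and shows that their images nevertheless intersect at the vanishing point of their shared direction.

```python
def project(point, focal=1.0):
    """Pinhole projection of a 3D point (X, Y, Z) with Z > 0 to image (x, y)."""
    X, Y, Z = point
    return (focal * X / Z, focal * Y / Z)


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def image_line(p, q):
    """Homogeneous coordinates of the image line through points p and q."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def intersect(l1, l2, eps=1e-12):
    """Intersection of two homogeneous image lines, or None if they are
    parallel in the image."""
    x, y, w = cross(l1, l2)
    if abs(w) < eps:
        return None
    return (x / w, y / w)


# Two 3D lines with the same direction d (assumed values): they are
# parallel in the scene and have no 3D intersection.
d = (1.0, 0.0, 1.0)
a0 = (0.0, 1.0, 2.0)    # a point on the first line
b0 = (0.0, -1.0, 2.0)   # a point on the second line


def along(p0, t):
    """Point at parameter t on the 3D line through p0 with direction d."""
    return tuple(p + t * di for p, di in zip(p0, d))


# Image of each 3D line, built from the projections of two of its points.
line_a = image_line(project(a0), project(along(a0, 1.0)))
line_b = image_line(project(b0), project(along(b0, 1.0)))

# The image lines intersect even though the 3D lines do not.
apparent = intersect(line_a, line_b)

# Vanishing point of direction d for unit focal length: (dx/dz, dy/dz).
vanishing_point = (d[0] / d[2], d[1] / d[2])

print("apparent intersection:", apparent)
print("vanishing point:      ", vanishing_point)
```

An image-space intersection test alone therefore cannot tell this case apart from a true 3D junction; additional cues, such as the direction label of each line, are needed.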
SVR is a distinctly unsolved problem in computer vision. The reconstruction can be geometric or semantic. The most common geometric reconstruction method is based on labeling lines as convex, concave or occluding lines. The line labeling problem is in general NP-hard. Several challenging line drawings have been studied and novel constraint satisfaction methods have been developed to solve the SVR problem. Those methods primarily operate on synthetic or computer generated line drawings, and are generally unsatisfactory for real world images. Most other geometrical single-view reconstruction methods that give good results for real images rely on some kind of user interaction.
There is a renewed interest in the SVR problem as more holistic approaches become available. For example, pixels in the image can be classified as sky, buildings, and ground. That classification, along with an estimation of surface orientations, can produce 3D models that are sufficient for several applications, such as synthesizing walkthroughs, stereoscopic content generation for movies, and 3D context for object detection and recognition. The methods used for such coarse modeling use several geometrical and image features.
One method estimates depth from a single image using several image features together with weak assumptions on coplanarity and collinearity. Another method approximates the room geometry with a cuboid, samples different hypotheses, and selects the best one based on several image and geometrical features. Clutter in indoor scenes has been modeled as cuboids and reconstructed in 3D.
Being a severely ill-posed problem, SVR has led to several solutions, such as the computation of orientation maps, inferring geometry from human activities, explicit use of boundary information, template 3D shapes and even physics-driven stability and mechanical constraints.
Performance can be significantly improved by using optimization strategies for exactly inferring layouts from a larger solution space. Constraints based on Manhattan assumptions have been used for modeling buildings from aerial photos.