Reconstructing 3D scenes from 2D images is a fundamental problem in computer vision and other imaging applications. Conventional 3D reconstruction methods typically use two or more images to obtain depths in the scene. However, depth recovery from a single 2D image is a severely ill-posed problem. Rather than reconstructing a 3D scene using geometric entities, such as points and polygons, one method uses a 3D reconstruction procedure that constructs a pop-up model. Using several image and geometric features, that method automatically classifies regions as ground, buildings, and sky. Another method infers absolute depth using image features and weak assumptions based on coplanarity and connectivity constraints.
For modeling indoor scenes, one method uses a cuboid model to approximate the geometry of a room. With that model, pixels in an image are classified as left wall, middle wall, right wall, floor, and ceiling. For indoor scenes, we refer to this classification, and to the underlying cuboid estimation problem, as indoor scene layout estimation, or simply layout estimation. To estimate the optimal layout, hundreds of cuboids are sampled, and each cuboid is assigned a score based on several image and geometric features. That method uses training images to classify texture, color, and line features to obtain the pixel-level classification.
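The sample-and-score procedure described above can be illustrated with a minimal sketch. It is not the cited method's implementation. It makes several simplifying assumptions: each cuboid hypothesis is reduced to a middle-wall rectangle seen in one-point perspective, with the vanishing point at the rectangle center; the per-pixel class log-probabilities are taken as given (in the cited method they would come from a classifier trained on texture, color, and line features); and the score is simply the sum of the log-probabilities of the labels a hypothesis assigns. All function names and the box parameterization are hypothetical.

```python
import math
import random

# Label indices used throughout this sketch.
LABELS = ("left_wall", "middle_wall", "right_wall", "floor", "ceiling")


def render_layout(box, width, height):
    """Label every pixel for one box hypothesis.

    box = (x0, y0, x1, y1) is the middle-wall rectangle in pixel coordinates
    (a hypothetical, simplified stand-in for a full cuboid hypothesis).
    Returns a height x width nested list of label indices.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w, half_h = (x1 - x0) / 2.0, (y1 - y0) / 2.0
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            dx = (x - cx) / half_w
            dy = (y - cy) / half_h
            if max(abs(dx), abs(dy)) <= 1.0:
                row.append(1)                    # inside rectangle: middle wall
            elif abs(dx) >= abs(dy):
                row.append(0 if dx < 0 else 2)   # left or right wall
            else:
                row.append(4 if dy < 0 else 3)   # image y grows downward
        labels.append(row)
    return labels


def score_layout(labels, log_probs):
    """Sum the log-probability of the label assigned to each pixel."""
    return sum(log_probs[y][x][lab]
               for y, row in enumerate(labels)
               for x, lab in enumerate(row))


def sample_box(width, height, rng):
    """Draw a random middle-wall rectangle at least two pixels wide and tall."""
    x0 = rng.randint(0, width - 3)
    x1 = rng.randint(x0 + 2, width - 1)
    y0 = rng.randint(0, height - 3)
    y1 = rng.randint(y0 + 2, height - 1)
    return (x0, y0, x1, y1)


def estimate_layout(log_probs, n_samples=200, seed=0):
    """Sample n_samples hypotheses and return the best (box, score) pair."""
    height, width = len(log_probs), len(log_probs[0])
    rng = random.Random(seed)
    best_box, best_score = None, -math.inf
    for _ in range(n_samples):
        box = sample_box(width, height, rng)
        score = score_layout(render_layout(box, width, height), log_probs)
        if score > best_score:
            best_box, best_score = box, score
    return best_box, best_score


def synthetic_log_probs(labels, p_correct=0.9):
    """Build toy per-pixel log-probabilities that favor a known labeling."""
    p_other = (1.0 - p_correct) / (len(LABELS) - 1)
    return [[[math.log(p_correct if k == lab else p_other)
              for k in range(len(LABELS))]
             for lab in row]
            for row in labels]


if __name__ == "__main__":
    truth = render_layout((5, 3, 10, 8), 16, 12)
    box, score = estimate_layout(synthetic_log_probs(truth))
    print("best box:", box, "score:", round(score, 2))
```

A fuller treatment would sample hypotheses from rays through estimated vanishing points and combine several image and geometric features in the score; the exhaustive scoring loop and the selection of the highest-scoring hypothesis would remain the same.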