Images are the projection of the three-dimensional (3-D) world to two dimensions. In this regard, the two-dimensional (2-D) images are generally not readily useful in generating or determining 3-D image features which makes inferring 3-D structure from an image difficult. An image might represent an infinite number of 3-D models. However, not all the possible 3-D structures that an image might represent are valid, and only a few are likely.
When viewing a typical three-dimensional 3-D image such as a photograph, a human can interpret 3-D structure represented by the image without significant loss of intended perspective. Generally, the environment that we live in is reasonably structured, and hence allows humans to infer 3-D structure based on prior experience. Humans use various monocular cues to infer the 3-D structure of the scene. Some of the cues are local properties of the image, such as texture variations and gradients, color, haze and defocus, yet local image cues alone are usually insufficient to infer the 3-D structure. Humans thus “integrate information” over space to understand the relation between different parts of an image, which is important to the human understanding of 3-D structure. Both the relation of monocular cues to 3-D structure, as well as relation between various parts of an image, is learned from prior experience. For example, humans remember that a structure of a particular shape is a building, sky is blue, grass is green, trees grow above the ground and have leaves on top of them, and so on.
For many computer vision systems, however, interpreting 3-D structure represented by a 2-D image is extremely challenging largely due to loss in depth perspectives. Ambiguities result, for example, relative to loss of details in local image structures relative to details in other structures and to general distortion of the 3-D structure. For computer vision systems, there are intrinsic ambiguities between the local image features and the 3-D location of the points used to depict projection of the depth.
Such issues have presented challenges to providing accurate image-based information as well as to providing accurate interpretation of image-based information.