One general problem in computer vision is how to determine the characteristics of a scene from images of that scene. Some specific problems follow. For motion estimation, the input is usually a temporally ordered sequence of images, e.g., a "video." The problem is how to estimate the projected velocities of the various things moving in the video, such as people, cars, balls, and the background. Another problem deals with recovering real-world three-dimensional (3D) structure from a 2D image, for example, how to recover the shape of an object from a line drawing, a photograph, or a stereo pair of photographs. Yet another problem is how to recover high-resolution scene details from a low-resolution image.
Humans make these types of estimates all the time, frequently subconsciously. There are many applications for machines that can do the same. These problems have been studied for many years by many workers, using different approaches and with varying success. The problem with most known approaches is that they lack machine learning methods that can exploit the power of modern processors within a general framework.
In the prior art, methods have been developed for interpreting blocks-world images. Other prior art work, using hand-labeled scenes, has analyzed local features of aerial images based on vector codes, and has developed rules to propagate scene interpretations. However, these solutions are for specific one-step classifications and therefore cannot be used for solving a general class of low-level vision problems. Methods to propagate probabilities have been used, but these methods have not been put in a general framework for solving vision problems.
Alternatively, optical flow can be estimated from images by using a quad-tree to propagate motion information across scale. There, a brightness constancy assumption is used, and beliefs about the velocity of the optical flow are represented as a Gaussian probability distribution.
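By way of illustration only, the two ingredients named above can be sketched for a single image patch. The sketch assumes the standard linearized form of brightness constancy, Ix*u + Iy*v + It = 0, solved by least squares over the patch, with the resulting estimate and its covariance taken as the mean and covariance of a Gaussian belief about the velocity (u, v). The function name `flow_belief` and the parameter `noise_var` are hypothetical and do not come from the prior art described here, and the quad-tree propagation across scale is not shown.

```python
import numpy as np


def flow_belief(frame0, frame1, noise_var=1e-4):
    """Return (mean, cov) of a Gaussian belief over the velocity (u, v) of one patch.

    Assumes linearized brightness constancy, Ix*u + Iy*v + It = 0, and
    independent Gaussian noise of variance noise_var (an assumed value)
    on the temporal difference.
    """
    f0 = np.asarray(frame0, dtype=float)
    f1 = np.asarray(frame1, dtype=float)

    # Spatial gradients of the time-averaged frame.
    # np.gradient returns derivatives along rows (y) first, then columns (x).
    Iy, Ix = np.gradient(0.5 * (f0 + f1))
    It = f1 - f0

    # Drop the border, where np.gradient falls back to one-sided differences.
    Ix, Iy, It = (g[1:-1, 1:-1].ravel() for g in (Ix, Iy, It))

    # One linear constraint per pixel: [Ix Iy] @ [u, v] = -It.
    A = np.stack([Ix, Iy], axis=1)
    AtA = A.T @ A

    mean = np.linalg.solve(AtA, A.T @ (-It))  # least-squares velocity
    cov = noise_var * np.linalg.inv(AtA)      # uncertainty of that velocity
    return mean, cov


if __name__ == "__main__":
    # Synthetic smooth pattern translated by a known sub-pixel velocity.
    ys, xs = np.mgrid[0:32, 0:32].astype(float)

    def pattern(x, y):
        return np.sin(0.3 * x) + np.cos(0.25 * y)

    u_true, v_true = 0.3, -0.2
    mean, cov = flow_belief(pattern(xs, ys), pattern(xs - u_true, ys - v_true))
    print("estimated velocity:", mean)
    print("covariance:\n", cov)
```

In this sketch a patch with weak or one-directional gradients yields a broad covariance along the poorly constrained direction, which is the kind of uncertainty a Gaussian belief is meant to carry when motion information is combined across scale.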