Many potential applications (e.g., robotics and gaming environments) may use automated visual capture and/or analysis to evaluate virtual and/or physical three-dimensional (3D) environments in various ways. Such applications may be limited by sensing equipment (e.g., a robot may have only a two-dimensional (2D) camera available), processing power, and/or other factors.
Existing algorithms for automated visual evaluation do not use information from 3D scenes and from images in combination. Some existing solutions use multiple cameras to construct a 3D representation of a scene in order to measure 3D features from multiple images. Other existing solutions use 2D images and associated 3D measurements (e.g., of a face) in order to create a model of a 3D feature (e.g., the face). Some existing systems utilize surfaces of an object for identification (e.g., facial recognition). Some existing algorithms estimate a shape from some other feature (e.g., motion or shading). In addition, some existing algorithms provide hierarchical feature selection. Some existing algorithms also utilize temporal slowness of features in an attempt to learn higher-order visual features without labeled data.
As can be seen, there is a need for a general-purpose way to evaluate sets of visual features by exploiting the relationship between images and scenes, one that can be applied to a variety of visual evaluation tasks.