In many computer applications it is necessary to determine the poses of sensors relative to sensed features in the environment in which the sensors are located. The pose of a sensor gives both the location and the orientation of the sensor. In three dimensions, the location is usually specified by three Cartesian coordinates (x, y, z), and the orientation by three angular coordinates (u, v, w).
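For illustration, the two coordinate triples can be packed into a single 4x4 homogeneous transform representing the six-DOF pose. The sketch below (Python with NumPy) assumes a roll-pitch-yaw interpretation of the angles (u, v, w), which is one common convention and not fixed by the text:

```python
import numpy as np

def rotation_from_angles(u, v, w):
    """Rotation matrix from three orientation angles, interpreted here
    as roll u, pitch v, yaw w (Z-Y-X convention; an assumption, since
    the text does not fix an angle convention)."""
    cu, su = np.cos(u), np.sin(u)
    cv, sv = np.cos(v), np.sin(v)
    cw, sw = np.cos(w), np.sin(w)
    Rx = np.array([[1, 0, 0], [0, cu, -su], [0, su, cu]])
    Ry = np.array([[cv, 0, sv], [0, 1, 0], [-sv, 0, cv]])
    Rz = np.array([[cw, -sw, 0], [sw, cw, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def pose_matrix(x, y, z, u, v, w):
    """4x4 homogeneous transform packing location (x, y, z) and
    orientation (u, v, w) into a single six-DOF pose."""
    T = np.eye(4)
    T[:3, :3] = rotation_from_angles(u, v, w)
    T[:3, 3] = [x, y, z]
    return T
```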
For example, many computer vision and graphics applications require that the poses of the cameras be known. These applications include three-dimensional reconstruction, modeling by structure-from-motion, photo-realistic rendering, image-based environment augmentation, simulation, and image registration, to name but a few.
In a simple application, signals in the form of stereo images of a scene are acquired by a pair of cameras. Features common to both images are identified and used to determine the extrinsic parameters (poses) of the cameras. In that application, the scene is localized, and the signals have a high degree of overlap, which yields a large set of common features. When two cameras acquire two overlapping images, it is straightforward to determine the relative poses of the cameras, up to an overall scale and orientation of the reference frame.
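The scale ambiguity noted above can be made concrete. For calibrated cameras, the two-view geometry is captured by the essential matrix E = [t]xR, and scaling the translation t scales E by the same factor; since image correspondences constrain E only up to scale, they cannot fix the magnitude of t. A minimal sketch (not part of any method described here):

```python
import numpy as np

def skew(t):
    """3x3 cross-product (skew-symmetric) matrix of vector t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential_matrix(R, t):
    """Essential matrix E = [t]x R relating two calibrated views
    with relative rotation R and translation t."""
    return skew(t) @ R
```

Because E is linear in t, doubling the baseline doubles E, leaving the epipolar constraint unchanged.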
The invention is concerned with determining the poses of sensors widely distributed in the environment. For example, signals in the form of images are acquired by security cameras at various locations in a city, or by a tourist wandering through the city, or by mobile telephone cameras as users move about in the city. In another application, the signals are acquired from a portion of the universe with radio telescopes that are thousands of kilometers apart. Arrays of microphones or seismographs scattered over the globe are other examples of large scale sensor networks.
Accordingly, the signals to be processed by the invention have two primary characteristics: the environment in which the signals are acquired is large, and the set of signals is sparse. Although some features in one signal overlap, at least in part, with features in another signal, most pairs of signals have very little in common.
Numerous techniques are known for determining the poses of sensors from acquired signals. Of special interest are methods that decouple the three translational degrees of freedom (DOFs), which give location, from the three rotational DOFs, which give orientation. In such methods, the degrees of freedom are factored to reduce the number of parameters that must be estimated simultaneously. Both interactive and automated methods are known. Interactive methods do not scale effectively and are vulnerable to operator error and numerical instability. Therefore, automated methods are preferred.
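One simple, well-known illustration of such decoupling is Kabsch/Procrustes alignment of corresponding 3-D points: the three rotational DOFs are estimated first on centered point sets, and the three translational DOFs are recovered afterwards from the centroids. This sketch is offered only as an example of factoring the DOFs, not as any particular prior-art method:

```python
import numpy as np

def decoupled_pose(A, B):
    """Kabsch alignment of point sets A and B (N x 3, row-wise
    correspondences, b_i = R a_i + t): the rotation is solved first
    on centered points via SVD, then the translation follows from
    the centroids -- two three-DOF problems instead of one six-DOF
    problem."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # proper rotation
    t = cb - R @ ca                    # translation from centroids
    return R, t
```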
Projective techniques can also be used to recover structure and pose, but only up to an arbitrary projective transformation. Other structure-from-motion methods use singular value decompositions or random searches. However, most prior art methods require a large set of corresponding features.
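As an illustration of the SVD-based structure-from-motion approach mentioned above, a Tomasi-Kanade-style factorization splits a centered 2F x P measurement matrix of tracked image points into motion and shape factors. The result is determined only up to a 3x3 affine ambiguity; resolving that ambiguity, and the orthographic camera model it assumes, are omitted from this sketch:

```python
import numpy as np

def factorize(W):
    """Rank-3 factorization of a centered measurement matrix W
    (2F x P: two image coordinates per frame, P tracked points)
    into a motion matrix M (2F x 3) and a shape matrix S (3 x P),
    up to an invertible 3x3 affine ambiguity."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.sqrt(s[:3])
    M = U[:, :3] * root        # split singular values between factors
    S = root[:, None] * Vt[:3]
    return M, S
```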
Antone et al. describe a method for recovering corresponding features (correspondences) from a sparse set of images in “Scalable extrinsic calibration of omni-directional image networks,” IJCV 49, pp. 143–174, 2002. The correspondences can be found by a random search, or from rough indicators of co-location, e.g., image histogram matches, edge detection, assignments to wireless communication cells, or GPS. Antone et al. also describe how to determine a global orientation of the set of partially overlapping images by analyzing the sparse correspondences.
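A histogram match of the kind listed among the rough co-location indicators can be as simple as histogram intersection of two grayscale images. The sketch below is a generic version of that cue, not the implementation of Antone et al.:

```python
import numpy as np

def histogram_similarity(img_a, img_b, bins=16):
    """Histogram intersection of two grayscale images with pixel
    values in [0, 255]. Returns a score in [0, 1]; a high score is
    only a rough hint of possible co-location, not proof of
    overlapping content."""
    ha, _ = np.histogram(img_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(img_b, bins=bins, range=(0, 256))
    ha = ha / ha.sum()          # normalize to probability mass
    hb = hb / hb.sum()
    return np.minimum(ha, hb).sum()
```

Pairs scoring above a threshold would then be passed to a more expensive correspondence search.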