It is often desirable to determine one's location, for example when driving. One way to do this is to use a global positioning system (GPS). Unfortunately, GPS has limitations because the signals are broadcast at 500 watts from satellites about 12,000 miles up. Signals from four satellites are required for normal operation. The signals can be obstructed by buildings and even by foliage. This is called the urban canyon problem. In mobile automotive GPS receivers, metallic features in windshields, such as defrosters, or window tinting films can act as a Faraday cage, further degrading reception.
Therefore, it is desirable to use computer vision techniques. In computer vision applications, images are analyzed to determine poses, i.e., location and orientation. Pose estimation, although extremely challenging due to several degeneracy problems and inaccurate feature matching, is well known in computer vision. However, most conventional solutions have only been proven on a small scale, typically in a well-controlled environment.
The following methods are known for inferring geolocation from images: Hays et al., “IM2GPS: estimating geographic information from a single image,” CVPR, 2008; Robertson et al., “An image-based system for urban navigation,” BMVC, 2004; Yeh et al., “Searching the web with mobile images for location recognition,” CVPR, 2004; and Zhang et al., “Image based localization in urban environments,” 3DPVT, 2006.
Another method uses an infrared camera and a 3D model generated from an aerial laser survey: Meguro et al., “Development of positioning technique using omni-directional IR camera and aerial survey data,” Advanced Intelligent Mechatronics, 2007.
That system requires an expensive infrared camera, which makes it impractical for large-scale deployment in consumer-oriented applications, such as vehicle-mounted or hand-held devices. Their camera is not a true omni-directional camera. To provide a partial 360° view, primary and secondary mirrors are placed directly in the optical path between the scene and the camera. The mirrors obscure a large central portion of the infrared images.
The method requires a high-resolution 3D digital surface model (DSM), which is used to construct “restoration images.” The DSM is represented in a global geographic coordinate system, converted into Earth-Centered, Earth-Fixed (ECEF) Cartesian coordinates, and then into East-North-Up (ENU) coordinates in which the survey position is the origin.
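The coordinate chain described above (geographic to ECEF to ENU) can be illustrated with a minimal sketch. It assumes the WGS84 ellipsoid, which the cited work is not stated to use, and the function names and sample coordinates are hypothetical. It is not the cited system's implementation, only the standard textbook conversion.

```python
import math

# WGS84 ellipsoid constants. ASSUMPTION: the source does not name a datum.
WGS84_A = 6378137.0                    # semi-major axis, metres
WGS84_F = 1.0 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2.0 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, height_m):
    """Convert geodetic latitude/longitude/height to ECEF (x, y, z) in metres."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Prime-vertical radius of curvature at this latitude.
    n = WGS84_A / math.sqrt(1.0 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + height_m) * math.cos(lat) * math.cos(lon)
    y = (n + height_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - WGS84_E2) + height_m) * math.sin(lat)
    return x, y, z


def ecef_to_enu(x, y, z, lat0_deg, lon0_deg, height0_m):
    """Express an ECEF point in East-North-Up axes centred on a reference position."""
    x0, y0, z0 = geodetic_to_ecef(lat0_deg, lon0_deg, height0_m)
    dx, dy, dz = x - x0, y - y0, z - z0
    lat0 = math.radians(lat0_deg)
    lon0 = math.radians(lon0_deg)
    sin_lat, cos_lat = math.sin(lat0), math.cos(lat0)
    sin_lon, cos_lon = math.sin(lon0), math.cos(lon0)
    # Rotate the ECEF offset into the local tangent-plane frame.
    east = -sin_lon * dx + cos_lon * dy
    north = -sin_lat * cos_lon * dx - sin_lat * sin_lon * dy + cos_lat * dz
    up = cos_lat * cos_lon * dx + cos_lat * sin_lon * dy + sin_lat * dz
    return east, north, up


if __name__ == "__main__":
    # Hypothetical survey origin and a DSM vertex 10 m above it.
    origin = (35.0, 139.0, 50.0)
    vertex = geodetic_to_ecef(35.0, 139.0, 60.0)
    print(ecef_to_enu(*vertex, *origin))
```

With the survey position as the ENU origin, every DSM vertex becomes a small local offset in metres, which is more convenient for rendering views than raw geographic coordinates.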
The method unwraps the infrared images into a rectangular panorama, from which edges are extracted to generate a linear profile of the surrounding buildings. They use an azimuth projection specialized for their camera design. The profile is then correlated with the profiles in the restoration images. Neither the unwrapped infrared profiles nor the profiles in the restoration images directly reflect the actual skyline. As a result, their approach can lead to inaccuracies when the camera is not vertically aligned. Their angle of view is restricted to 20 to 70 degrees, and the method provides only a 2D location and a 1D orientation (x, y, θ).
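The profile-correlation step can be illustrated with a simplified sketch. It assumes each profile is a list of building heights sampled at equal azimuth steps around a full 360° panorama, and it uses a brute-force circular cross-correlation to find the best rotational alignment. The function names and sample data are hypothetical, and the cited work's actual correlation procedure may differ.

```python
def best_circular_shift(observed, reference):
    """Find the shift s that maximises the mean-removed circular
    cross-correlation sum(observed[i] * reference[(i + s) % n]).

    Returns (shift_in_samples, correlation_score).
    """
    n = len(observed)
    if n == 0 or n != len(reference):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_obs = sum(observed) / n
    mean_ref = sum(reference) / n
    obs = [v - mean_obs for v in observed]
    ref = [v - mean_ref for v in reference]

    best_shift, best_score = 0, float("-inf")
    for shift in range(n):
        score = sum(obs[i] * ref[(i + shift) % n] for i in range(n))
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift, best_score


def shift_to_degrees(shift, n_samples):
    """Convert a sample shift into an azimuth offset, assuming uniform sampling."""
    return 360.0 * shift / n_samples


if __name__ == "__main__":
    # Hypothetical reference profile rendered from a 3D model (8 azimuth bins).
    reference = [0.0, 1.0, 3.0, 7.0, 2.0, 0.0, 0.0, 5.0]
    # Simulated observation: the same skyline seen with a 3-bin heading offset.
    observed = [reference[(i + 3) % 8] for i in range(8)]
    shift, _ = best_circular_shift(observed, reference)
    print(shift, shift_to_degrees(shift, 8))
```

Repeating this comparison against profiles rendered at many candidate positions, and keeping the position with the highest score, would yield the kind of (x, y, θ) estimate described above. The sketch covers only the orientation search for a single candidate.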
In addition, they require accurate intrinsic camera parameters and an accurate method of projection. With the IR camera, it is necessary to have infrared rays emitted in certain patterns. That makes it a challenge to capture images of the cyclical patterns used in camera calibration. Hence, a highly specialized calibration jig with thermal point sources arranged inside is required, which further stands in the way of mass deployment in a consumer market.