1. Field
The disclosed embodiments relate to a navigation method and apparatus and, more particularly, to a method and apparatus for combining machine vision and inertial measurements to provide accurate motion tracking, localization, and navigation.
2. Brief Description of Earlier Developments
A number of applications require precise motion tracking. For example, autonomous robot operation in an unconstrained environment requires that the robot keep track of its own position and attitude to be able to map the space and navigate through it. Coordination of an autonomous team of robots requires that the robots keep track of their own positions as well as the other robots' positions. Many applications use the Global Positioning System (GPS) to track the motion of vehicles. However, GPS is not available indoors or under heavy foliage. Furthermore, low-cost GPS units cannot accurately determine the attitude of the vehicle. Lunar and planetary exploration, either by humans or robots, also requires precise localization and navigation. Unless an infrastructure of beacons and/or navigation satellites has been established, these explorers must rely on self-contained navigation systems.
Several application areas require accurate tracking of a human or vehicle throughout a large space. An example is motion capture of both actors and mobile cameras for motion picture production. As a further example, on the Spirit and Opportunity Mars Exploration Rovers (MERs), Visual Odometry (VO) has been used to navigate in situations where wheel slip rendered wheel odometry useless for navigation. By monitoring wheel slip, VO played a key role in mission safety by preventing the rovers from bogging down or sliding down a slope. The MERs use a form of visual odometry based on tracking features in a sequence of stereo images. Because of limited onboard computing power, a number of approximations were required to produce an algorithm that could run in a reasonable amount of time. Even so, each VO cycle takes 2 to 3 seconds on the MER's 20 MHz RAD6000 CPU. When VO is used for navigation, the rover drive speed must be reduced by an order of magnitude because VO requires successive images to overlap significantly. Therefore, the MERs use VO only when wheel odometry is expected to be inaccurate due to wheel slip.
Like most so-called optical flow calculations, MER VO is actually implemented by tracking a relatively small number of features. A Harris corner detector is applied to each stereo image pair. To reduce computation, a grid of cells is superimposed on the left image, and in each cell the feature with the strongest corner response is selected for tracking. Pseudo-normalized correlation is used to determine the disparity of the feature locations between the two stereo images. The 3-D locations of the features are determined by projecting rays through the two camera models. When the next stereo pair is acquired, wheel odometry and the 3-D locations of the features are used to project the features into the new stereo pair. A correlation search then establishes new 2-D locations of the features in each image, and new 3-D locations are computed by stereo matching. Motion estimation is done by embedding a least-squares estimation algorithm within a Random Sample Consensus (RANSAC) algorithm. This approach was mandated by the relatively long time intervals between image acquisitions.
Several VO algorithms have been reported in the literature. Optical computing techniques have been used to produce an experimental device that determines optical flow at a relatively small number of locations in the image.
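By way of illustration only, the motion estimation step of the MER approach described above (a least-squares estimator embedded within RANSAC) can be sketched as follows. This is a minimal sketch and not the MER flight software: it assumes that matched 3-D feature locations before and after the motion are already available as N-by-3 arrays, it uses the SVD-based (Kabsch) solution as the least-squares estimator, and the function names, iteration count, and inlier tolerance are hypothetical choices made for the example.

```python
import numpy as np


def fit_rigid_motion(p, q):
    """Least-squares rotation r and translation t with q ~= p @ r.T + t.

    Uses the SVD-based (Kabsch) solution; p and q are (N, 3) arrays of
    matched 3-D points, N >= 3.
    """
    p_mean, q_mean = p.mean(axis=0), q.mean(axis=0)
    h = (p - p_mean).T @ (q - q_mean)          # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    # Guard against a reflection so that r is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = q_mean - r @ p_mean
    return r, t


def ransac_rigid_motion(p, q, n_iters=200, inlier_tol=0.05, seed=0):
    """Hypothetical RANSAC wrapper around fit_rigid_motion.

    Repeatedly fits a motion to 3 random matches, counts the matches whose
    residual is below inlier_tol (same units as the points), and refits on
    the largest inlier set found.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(p), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(p), size=3, replace=False)
        r, t = fit_rigid_motion(p[idx], q[idx])
        residuals = np.linalg.norm(q - (p @ r.T + t), axis=1)
        inliers = residuals < inlier_tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("too few inliers to estimate motion")
    r, t = fit_rigid_motion(p[best_inliers], q[best_inliers])
    return r, t, best_inliers
```

The random sampling allows mismatched features to be rejected before the final least-squares fit, which is the reason a consensus step is typically placed around the estimator when image-to-image correspondences are unreliable.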
Other methods have been based on feature tracking. Nister et al. described a VO algorithm formulated to identify features by Harris corner detection, track them using normalized correlation, and determine ego motion by minimizing image reprojection error using a preemptive RANSAC algorithm. They have reported position errors of 1-2% over outdoor courses up to 365 meters. However, they did not measure attitude errors. Campbell et al. described a VO system using commercial off-the-shelf (COTS) hardware and software from the OpenCV library. As with the other algorithms, features were identified using the Harris corner algorithm. Then an efficient form of the Lucas-Kanade algorithm, available in the OpenCV library, was applied to compute the optical flow. The ego motion was estimated based on the assumption that the lower portion of the image was “ground” and the upper portion was “sky.” The angular motion was estimated from the sky portion, and the linear motion was estimated from the ground portion, assuming features were predominantly on the same ground plane. This worked in the examples they provided because the “sky” contained distant objects that could provide an angular reference. Konolige et al. have developed an accurate VO algorithm using multi-scale feature tracking, bundle adjustment, and IMU integration. They quote an accuracy of 0.1% over a 9 km trajectory on a vehicle running at 5 m/s. This method differs from the Bayesian VO proposed here, as described further herein. A research group at INRIA, Sophia-Antipolis, France, has developed what they call a visual SLAM algorithm. As with Bayesian VO, this algorithm estimates the camera location directly from changes in intensities between images. However, the visual SLAM algorithm uses a second-order optimization procedure and does not quantify uncertainties. Recently, there has been considerable interest in combining vision and inertial systems. However, none of this research has explored a fully Bayesian approach.
Accordingly, there is a need for an efficient automated means of precisely tracking location and attitude.