1. Field of the Invention
This invention generally relates to visual odometry and, more particularly, to a system and method for using visual odometry to autonomously navigate a vehicle.
2. Description of the Related Art
In navigation, odometry is the use of data from the movement of actuators, obtained through devices such as rotary encoders that measure wheel rotations, to estimate change in position over time. Visual odometry is the process of determining equivalent odometry information using sequential camera images to estimate the distance traveled. Many existing approaches to visual odometry are based on the following stages:
1) Acquire input images: using either single, stereo, or omnidirectional cameras.
2) Image correction: apply image processing techniques for lens distortion removal, etc.
3) Feature detection: define interest operators, match features across frames, and construct an optical flow field.
4) Correlation: use correlation to establish the correspondence between two images, then extract and correlate features.
5) Check flow field vectors for potential tracking errors and remove outliers.
6) Estimate the camera motion from the optical flow.
7) Find the geometric and 3D properties of the features that minimize a cost function based on the re-projection error between two adjacent images.
8) Periodically repopulate trackpoints to maintain coverage across the image.
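As an illustration of stage 5 only, the following is a minimal sketch of one possible outlier check, assuming features have already been matched between two frames. The function names, the median-based test, and the `k` and `min_tol_px` parameters are assumptions chosen for illustration; they are not taken from any particular visual odometry system.

```python
from statistics import median


def flow_vectors(prev_pts, next_pts):
    """Per-feature displacement (dx, dy) between two frames of matched points."""
    return [(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(prev_pts, next_pts)]


def inlier_indices(flows, k=3.0, min_tol_px=0.5):
    """Indices of flow vectors that lie close to the median flow.

    A vector is kept when its distance from the median flow is at most
    max(k * MAD, min_tol_px), where MAD is the median of those distances.
    Both k and min_tol_px are illustrative tuning values.
    """
    med_dx = median(dx for dx, _ in flows)
    med_dy = median(dy for _, dy in flows)
    # Euclidean distance of each flow vector from the median flow vector.
    dev = [((dx - med_dx) ** 2 + (dy - med_dy) ** 2) ** 0.5
           for dx, dy in flows]
    threshold = max(k * median(dev), min_tol_px)
    return [i for i, d in enumerate(dev) if d <= threshold]


# Four features move 2 pixels to the right; the last one is a bad match.
prev_pts = [(10, 10), (20, 15), (30, 40), (50, 5), (60, 60)]
next_pts = [(12, 10), (22, 15), (32, 40), (52, 5), (75, 51)]
flows = flow_vectors(prev_pts, next_pts)
kept = inlier_indices(flows)
```

In this example the first four vectors agree with the median flow and are kept, while the fifth is discarded as a tracking error. A full pipeline would pass only the kept vectors to the motion estimation stage.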
Egomotion is defined as the 3D motion of a camera within an environment. In the field of computer vision, egomotion refers to estimating a camera's motion relative to a rigid scene. An example of egomotion estimation would be estimating a car's moving position relative to lines on the road or street signs being observed from the car itself. The goal of estimating the egomotion of a camera is to determine the 3D motion of that camera within the environment using a sequence of images taken by the camera. The process of estimating a camera's motion within an environment involves the use of visual odometry techniques on a sequence of images captured by the moving camera. As noted above, this may be done using feature detection to construct an optical flow from two image frames in a sequence generated from either single cameras or stereo cameras. Using stereo image pairs for each frame helps reduce error and provides additional depth and scale information.
In conventional stereo vision, two cameras, displaced horizontally from one another, are used to obtain two differing views of a scene, in a manner similar to human binocular vision. By comparing these two images, the relative depth information can be obtained in the form of disparities, which are inversely proportional to the distance to the objects. In conventional camera systems, several pre-processing steps are required:
1) Distortions, such as barrel distortion, must first be removed from the image to ensure that the observed image is purely projectional.
2) The images must be projected onto a common plane to allow comparison of the image pair, a step known as image rectification.
3) An information measure which compares the two images is minimized. This gives the best estimate of the position of features in the two images, and creates a disparity map.
4) Optionally, the disparity as observed by the common projection is converted back to the depth map by inversion. Utilizing the correct proportionality constant, the depth map can be calibrated to provide exact distances.
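The inversion in step 4 can be made explicit with a short worked equation. The symbols are assumptions introduced here for illustration: two rectified pinhole cameras with a common focal length f (in pixels), separated by a baseline B, observing a point at lateral position X and depth Z in the left camera frame, with image coordinates measured from each camera's principal point.

```latex
x_L = \frac{f X}{Z}, \qquad
x_R = \frac{f (X - B)}{Z}
\quad\Longrightarrow\quad
d = x_L - x_R = \frac{f B}{Z}
\quad\Longrightarrow\quad
Z = \frac{f B}{d}
```

Under these assumptions the proportionality constant is the product f B, so a calibrated focal length and baseline convert a disparity d into a metric depth Z.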
Stereo vision uses triangulation based on epipolar geometry to determine the distance to an object. More specifically, binocular disparity relates the depth of an object to its change in position when viewed from a different camera, given that the relative position of the cameras is known. With multiple cameras, it can be difficult to find the point in one camera's image that corresponds to a point viewed by the other camera. In most camera configurations, finding correspondences requires a search in two dimensions. However, if the two cameras are aligned correctly to be coplanar, the search is simplified to one dimension: a horizontal line parallel to the line between the cameras. Furthermore, if the location of a point in the left image is known, it can be searched for in the right image by searching left of this location along the line, and vice versa.
The disparity of features between two stereo images is usually computed as a shift to the left of an image feature when viewed in the right image. For example, a single point that appears at the x coordinate t (measured in pixels) in the left image may be present at the x coordinate t−3 in the right image. In this case, the disparity at that location in the right image would be 3 pixels.
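The one-dimensional search and the resulting disparity can be sketched as follows, assuming a rectified image pair so that a feature stays on the same row in both images. The function names, the sum-of-absolute-differences (SAD) cost, and the window and search-range parameters are illustrative assumptions, not a prescribed matching method.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length pixel windows."""
    return sum(abs(p - q) for p, q in zip(a, b))


def scanline_disparity(left_row, right_row, x, half_window=1, max_disparity=16):
    """Disparity (in pixels) of the feature centred at column x of left_row.

    The window around x in the left row is compared with windows shifted
    d = 0, 1, 2, ... pixels to the left in the right row; the shift with
    the lowest SAD cost is returned.
    """
    lo, hi = x - half_window, x + half_window + 1
    if lo < 0 or hi > len(left_row):
        raise ValueError("window does not fit inside the row")
    patch = left_row[lo:hi]
    best_d, best_cost = 0, None
    for d in range(max_disparity + 1):
        start = lo - d
        if start < 0:  # shifted window would leave the image
            break
        cost = sad(patch, right_row[start:start + len(patch)])
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# One image row from each camera; the bright feature sits 3 pixels
# further left in the right image than in the left image.
left_row = [10, 10, 10, 10, 10, 80, 200, 90, 10, 10, 10, 10]
right_row = [10, 10, 80, 200, 90, 10, 10, 10, 10, 10, 10, 10]
disparity = scanline_disparity(left_row, right_row, x=6)
```

Here the feature centred at column 6 of the left row is found at column 3 of the right row, giving a disparity of 3 pixels, which matches the t and t−3 example above.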
Knowledge of disparity can be used in further extraction of information from stereo images. One case in which disparity is most useful is for depth/distance calculation. Disparity and distance from the cameras are inversely related. As the distance from the cameras increases, the disparity decreases. This allows for depth perception in stereo images. Using geometry and algebra, the points that appear in the 2D stereo images can be mapped as coordinates in 3D space. This concept is particularly useful for navigation.
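A minimal sketch of this 2D-to-3D mapping is given below, assuming a rectified pinhole stereo pair. The function name and the calibration values (focal length in pixels, baseline in metres, principal point `cx`, `cy`) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def triangulate(x_left, x_right, y, focal_px, baseline_m, cx, cy):
    """Map a matched stereo pixel pair to (X, Y, Z) in the left camera frame.

    x_left and x_right are the columns of the same feature in the left and
    right images, y is its common row, and (cx, cy) is the principal point.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_m / disparity   # depth is inversely related to disparity
    x = (x_left - cx) * z / focal_px        # lateral offset from the optical axis
    y_cam = (y - cy) * z / focal_px         # vertical offset from the optical axis
    return x, y_cam, z


# Hypothetical calibration: 600 px focal length, 0.1 m baseline, 640x480 image.
near = triangulate(340, 337, 240, focal_px=600, baseline_m=0.1, cx=320, cy=240)
far = triangulate(340, 339, 240, focal_px=600, baseline_m=0.1, cx=320, cy=240)
```

With these assumed values a disparity of 3 pixels corresponds to a depth of 20 m, and a disparity of 1 pixel to 60 m, which shows how quickly depth resolution degrades as disparity shrinks.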
In mathematics, a rigid transformation (isometry) of a vector space preserves the distance between every pair of points. Rigid transformations of a plane R2, space R3, or real n-dimensional space Rn are termed Euclidean transformations because they form the basis of Euclidean geometry. Rigid transformations include rotations, translations, reflections, and their combinations. In general, any proper rigid transformation, that is, one that involves no reflection, can be decomposed as a rotation followed by a translation. Objects keep the same shape and size after a proper rigid transformation.
In Euclidean geometry, a translation is a function that moves every point a constant distance in a specified direction. A translation can be described as a rigid motion: other rigid motions include rotations and reflections. A translation can also be interpreted as the addition of a constant vector to every point, or as shifting the origin of the coordinate system. A rotation is a motion of a certain space that preserves at least one point. It can describe, for example, the motion of a rigid body around a fixed point. A rotation is different from a translation, which has no fixed points.
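The decomposition of a proper rigid transformation into a rotation followed by a translation can be illustrated in the plane. This is a minimal sketch; the function name and the example angle and offset are assumptions chosen for illustration.

```python
import math


def rigid_transform_2d(point, theta, translation):
    """Rotate a 2D point about the origin by theta radians, then translate it."""
    x, y = point
    tx, ty = translation
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty)


# Two points, both moved by a 90 degree rotation followed by a shift of (3, -1).
p, q = (1.0, 0.0), (0.0, 2.0)
theta, shift = math.pi / 2, (3.0, -1.0)
p2 = rigid_transform_2d(p, theta, shift)
q2 = rigid_transform_2d(q, theta, shift)

# The distance between the points is the same before and after the motion.
before = math.dist(p, q)
after = math.dist(p2, q2)
```

Both points change position, but the distance between them is unchanged, which is the defining property of a rigid transformation. With a zero translation the origin stays fixed, as expected for a pure rotation; with a zero angle and a nonzero translation no point stays fixed.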
Currently, most autonomous robot navigation systems are implemented based on high-end sensors, such as a laser scanner, high accuracy GPS receiver, or orientation sensor (inertial measurement unit (IMU)). These sensors add to the cost of the robots, making them unaffordable for household applications.
It would be advantageous if a low cost alternative for robot auto navigation existed that used visual odometry to aid with obstacle avoidance.